Overview

Activity 19 begins our discussion of hypothesis testing, a framework for using statistics to make decisions. Today we will explore how a hypothesis test is affected by the claimed null value’s relationship to the truth. Put another way, we are exploring what happens when the truth differs from the status quo (the null hypothesis).


Needed Packages

The following loads the packages that are needed for this activity.

# load packages
library(tidyverse)
library(skimr)
library(TeachingDemos)


When conducting a hypothesis test you make a decision to either reject the null or fail to reject the null based on a pre-specified significance level (often 0.05). There are 3 possible outcomes of your decision:

1. You made the correct decision.
2. You rejected the null when the null turns out to be true (Type I error).
3. You failed to reject the null when the null turns out to be false (Type II error).

Question 1

Let’s revisit our ride share data (ride_data), which we’ve been using as our population data to demonstrate how we assess statistical estimators like the sample mean \(\bar{x}\) (think unbiasedness and precision, i.e., standard error). Load the data and calculate the mean, \(\mu\), and standard deviation, \(\sigma\), for the variable price. Note that these are PARAMETER values.

# load data
ride_data <- read_rds("data/ride_data.rds")
# parameter values
ride_data %>%
  summarize(mean = mean(price))
## # A tibble: 1 × 1
##    mean
##   <dbl>
## 1  19.5
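The prompt also asks for the standard deviation, \(\sigma\), which the chunk above does not compute. A sketch of the full calculation (output omitted here, since only the population mean of 19.5 is shown in this document):

```r
# parameter values: mean and standard deviation of price
ride_data %>%
  summarize(mean = mean(price),
            sd   = sd(price))
```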


We will want to refer back to these values later and see how our calculations are affected. In real life we won’t be able to do this, but through this process we should be able to build a better understanding of hypothesis tests.

Caution: In general, the data should only be used to perform one hypothesis test. If you want to test a new hypothesis about the same parameter, then you should collect new data to test the new hypothesis.


Q1: Hypothesis Test 1

Suppose we are an analyst for a large investment firm and we are considering investing in this ride share company. They claim the mean price of a ride is $19.50.

We think something is a little fishy with this claim. We don’t believe that the mean price of a ride is $19.50. We think it is something different. We decide to conduct a hypothesis test. We begin by specifying our null and alternative hypotheses:

\[H_0: \mu_{price} = 19.50\] \[H_A: \mu_{price} \ne 19.50\]

Next we need to pre-specify the threshold we will use to make our decision. We are setting a threshold for deciding whether the p-value is small enough (i.e., whether the data provide sufficient evidence to reject the null hypothesis).

This threshold, \(\alpha\), is also the probability of making what is called a type I error. Therefore, when setting this threshold we are deciding how much tolerance we have for making this type of error: the probability of rejecting the null hypothesis when it is in fact true.

For this problem let’s set the threshold at \(\alpha = 0.05\).

Next we will need to collect some sample data.

Replace NETID with your Net ID (e.g., ABC1234). This will make your random sample the same every time you knit the document. This is important so that when you knit, your results and conclusions remain the same and your data doesn’t change on you.

set.seed(char2seed("ABC123"))
# get a random sample of 100 from the ride_data
my_sample <- ride_data %>% 
  sample_n(100)

# print/inspect data
my_sample
## # A tibble: 100 × 3
##    price duration wait_time
##    <dbl>    <dbl>     <dbl>
##  1  17.1     28.8      3.68
##  2  26.6     45.1      5.12
##  3  16.6     22.3     13.1 
##  4  17.8     24.3      8.42
##  5  16.6     25.5     11.4 
##  6  23.3     38.7      4.19
##  7  18.4     28.5      7.57
##  8  19.0     29.1      4.81
##  9  14.2     17.3      1.51
## 10  26.1     46.8      1.66
## # … with 90 more rows
# sample statistics
my_sample %>% 
  skim_without_charts(price)
Data summary
Name Piped data
Number of rows 100
Number of columns 3
_______________________
Column type frequency:
numeric 1
________________________
Group variables None

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100
price 0 1 19.21 4.42 7.83 16.37 18.78 22.7 30.3


Now that we have the sample data we can calculate a p-value and compare it to our threshold to make a decision. Calculate the p-value for your data using t.test.

# calculate p-value
t.test(x = my_sample$price, mu = 19.5)
## 
##  One Sample t-test
## 
## data:  my_sample$price
## t = -0.64788, df = 99, p-value = 0.5186
## alternative hypothesis: true mean is not equal to 19.5
## 95 percent confidence interval:
##  18.33606 20.09094
## sample estimates:
## mean of x 
##   19.2135
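To demystify what t.test is doing, here is a sketch of the same calculation by hand, using the sample statistics reported above (sample mean 19.2135, sd ≈ 4.42, n = 100); because the sd is rounded, the results match only approximately:

```r
# one-sample t-test computed by hand
xbar <- 19.2135   # sample mean (from the t.test output)
s    <- 4.42      # sample standard deviation (rounded, from the skim output)
n    <- 100       # sample size
mu0  <- 19.5      # claimed null value

t_stat <- (xbar - mu0) / (s / sqrt(n))      # test statistic
p_val  <- 2 * pt(-abs(t_stat), df = n - 1)  # two-sided p-value
t_stat  # approximately -0.65
p_val   # approximately 0.52
```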


Make your decision and interpret it in the context of our problem.

p-value ≈ 0.52

p-value >= 0.05, therefore fail to reject the null hypothesis.

At a significance level of 0.05, we fail to reject the null hypothesis since our p-value of 0.52 is greater than 0.05. We have insufficient evidence to conclude that the mean price of rides for this company differs from $19.50.


Your decision may or may not agree with your neighbor. Why is that the case?

We each have a RANDOM sample. That is, we have different data. Therefore we have different p-values.


Let’s take a quick poll of the room to see how many had enough evidence to reject the null. That is, how many made a type I error.

What do you observe?

Only a few people rejected the null when they shouldn’t have (made a type I error).


We know that the company is telling the truth and that the mean is really $19.50. Knowing that, and given that there are about 100 students in the class, about how many students do we expect to make a type I error?

We would expect about 5 out of 100 to make a type I error \((\alpha = 0.05)\).
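We can check this expectation with a quick simulation. Since we can’t re-poll the class here, this sketch assumes prices are roughly normal with mean 19.50 and standard deviation 4.4 (an assumption; the real population need not be normal):

```r
set.seed(1)

# 1000 "students" each take a sample of 100 and test H0: mu = 19.5
# the null is TRUE here, so every rejection is a type I error
rejections <- replicate(1000, {
  fake_prices <- rnorm(100, mean = 19.5, sd = 4.4)
  t.test(fake_prices, mu = 19.5)$p.value < 0.05
})
mean(rejections)  # close to alpha = 0.05
```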


Q2: Hypothesis Test 2

Let’s go back in time and slightly change this scenario. The only thing we will alter is what the company is claiming about the mean price of their rides.

Suppose we are an analyst for a large investment firm and we are considering investing in this ride share company. They claim the mean price of a ride is $21.

We think something is a little fishy with this claim. We don’t believe that the mean price of a ride is $21. We think it is something different. We decide to conduct a hypothesis test. We begin by specifying our null and alternative hypotheses:

\[H_0: \mu_{price} = 21\] \[H_A: \mu_{price} \ne 21\]

We pre-specify our threshold to be 0.05. We would then need to collect our data (we already loaded it in scenario 1, so no need to load it again).

Now calculate a p-value and compare it to our threshold to make a decision. Calculate the p-value for your data.


# calculate p-value
t.test(x = my_sample$price, mu = 21)
## 
##  One Sample t-test
## 
## data:  my_sample$price
## t = -4.0399, df = 99, p-value = 0.0001057
## alternative hypothesis: true mean is not equal to 21
## 95 percent confidence interval:
##  18.33606 20.09094
## sample estimates:
## mean of x 
##   19.2135

Make your decision and interpret it in the context of our problem.

p-value ≈ 0.0001

p-value < 0.05, therefore reject the null hypothesis.

At a significance level of 0.05, we have sufficient evidence to conclude that the mean price of rides for this company is not $21.

We know that they are lying to us because the alternative hypothesis is true in this case (the population mean is $19.50, not $21). Were you able to detect that they were lying to you? That is, did you have significant evidence to reject the null hypothesis? Or did you make a type II error?

We rejected the null when in fact the alternative was true. We were able to detect that they were lying. (Remember in a real life situation we will not know the truth!)


Let’s take a quick poll of the room to see how many had enough evidence to reject the null in this case or alternatively how many made a type II error.

What do you observe?

About 90% of the students correctly rejected the null. The other 10% made a type II error (a false negative).


Q3: Hypothesis Test 3

Let’s go back in time and slightly change scenario 1 again. The only thing we will alter is what the company is claiming about the mean price of their rides.

Let’s up their claim to $24. Therefore we have

\[H_0: \mu_{price} = 24\] \[H_A: \mu_{price} \ne 24\]

Use a pre-specified threshold of 0.05 again. Collect the data (we already loaded it in scenario 1, so no need to load it again). Calculate the p-value and make your decision.

# calculate p-value
t.test(x = my_sample$price, mu = 24)
## 
##  One Sample t-test
## 
## data:  my_sample$price
## t = -10.824, df = 99, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 24
## 95 percent confidence interval:
##  18.33606 20.09094
## sample estimates:
## mean of x 
##   19.2135


Make your decision and interpret it in the context of our problem.

p-value < 2.2e-16

p-value < 0.05, therefore reject the null hypothesis

At a significance level of 0.05, we have sufficient evidence to suggest that the mean price of rides for this company is not $24. (Here we are making a correct decision by rejecting.)


We know that they are lying to us. We know that the alternative hypothesis is true. Were you able to detect that they were lying to you? That is, did you have significant evidence to reject the null hypothesis? Or did you make a type II error?


Yes, I was able to detect the lie.

Let’s take a quick poll of the room to see how many had enough evidence to reject the null in this scenario or alternatively how many made a type II error.

What do you observe?

Nearly all of the students correctly rejected the null. Those that failed to reject made a type II error (a false negative).


Considering this trend what would happen if the lie kept getting bigger (farther away from the truth)?

A higher percentage of tests will be able to detect the lie, meaning the type II error rate will decrease.


In general what does this tell you about hypothesis tests? Specifically, as the truth gets farther away from the status quo (null hypothesis) is it easier or harder to reject the null hypothesis (their claim)?

It becomes easier to reject the null as the null value gets farther away from the truth. The test becomes more powerful.
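The built-in power.t.test function can quantify this. A sketch comparing the power of our test (n = 100, sd ≈ 4.42 from our sample, \(\alpha = 0.05\)) against the $21 claim versus the $24 claim, given the truth of $19.50:

```r
# power to detect each lie when the truth is mu = 19.50
power_21 <- power.t.test(n = 100, delta = 21 - 19.5, sd = 4.42,
                         sig.level = 0.05, type = "one.sample")$power
power_24 <- power.t.test(n = 100, delta = 24 - 19.5, sd = 4.42,
                         sig.level = 0.05, type = "one.sample")$power
power_21  # roughly 0.9, in line with our class poll
power_24  # essentially 1
```

The farther the null value sits from the truth, the larger delta is, and the higher the power of the test.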

Further Discussion/Notes

Consider sample size. Suppose we had the same sample mean and standard deviation and all that was different was a smaller sample size; this would result in a larger p-value. That is, the smaller the sample size, the harder it is to detect a statistically significant difference of any size. Conversely, as the sample size gets larger it becomes easier to detect smaller and smaller differences as statistically significant. Remember: just because something is statistically significant doesn’t mean it is practically significant.
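As a sketch of this point, hold the sample statistics from hypothesis test 2 fixed (mean 19.21, sd 4.42, null value $21; rounded from our output) and vary only the sample size:

```r
xbar <- 19.21   # sample mean (rounded)
s    <- 4.42    # sample standard deviation
mu0  <- 21      # claimed null value

# two-sided p-value for the same observed difference at different sample sizes
p_for_n <- function(n) {
  t_stat <- (xbar - mu0) / (s / sqrt(n))
  2 * pt(-abs(t_stat), df = n - 1)
}

p_for_n(25)   # just above 0.05: not significant
p_for_n(100)  # about 0.0001, as in hypothesis test 2
p_for_n(400)  # far below 0.05
```

The observed difference is identical in all three cases; only the amount of data changes the conclusion.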