Sampling Distributions

Activity 16

Author

Solutions

Note

This activity is to be completed using this app. The Data Overview provides a background of the dataset you are working with. The activity will use the Population and Simulation tabs.

It is recommended that you complete this worksheet in Visual mode since there are several tables to fill in.

Sampling distribution and sample size

In general, how should a random sampling distribution relate to the population distribution?

The distribution for a random sample of observations from a population should be “representative” of the population distribution. That is, the random sample distribution should reflect or be similar to the population distribution.

Suppose we had a choice of taking a sample of size 10 or 300. Which one would you choose? Why?

Choose the sample of size 300: more data gives us a better picture of what is going on in the population, so we can be more certain about our conclusions.


Duration

Provide a description for the distribution of ‘duration’ population.

The population distribution is unimodal and symmetric, centered at a mean of about 30.11 with a standard deviation of 9.88.

Let’s explore what happens to the sampling distribution of \(\bar{x}\) as we change our sample size. The number of repetitions will be 10,000. The images below were simulated using a sample size of either 5, 30 or 100. Label the images with the appropriate sample size and report your simulated mean and standard errors.

These are random samples so your mean and SE may vary.

n                100     5       30
mean             30.12   30.09   30.11
standard error   0.98    4.41    1.81

How would the distribution of 10,000 repetitions of size 500 compare to the simulations above? (You should not need to simulate this to answer.)

The mean would be close to the population mean of 30.11 and the standard error would be smaller than any of the simulations above.
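The size of that reduction can be estimated without simulating, using the standard error formula \(SE = \sigma/\sqrt{n}\) from the Central Limit Theorem section below. The following is a minimal sketch in R, assuming the population standard deviation of 9.88 reported above; the helper name `theoretical_se` is hypothetical and not part of the app.

```r
# Hypothetical helper: theoretical standard error of the sample mean
theoretical_se <- function(sigma, n) sigma / sqrt(n)

sigma_duration <- 9.88  # population SD of 'duration' reported in this activity

# Standard errors for the three simulated sample sizes plus n = 500
se_by_n <- theoretical_se(sigma_duration, c(5, 30, 100, 500))
names(se_by_n) <- c("n=5", "n=30", "n=100", "n=500")
round(se_by_n, 2)
```

Under these assumptions the standard error for n = 500 comes out at roughly 0.44, less than half of the value for n = 100.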


Wait time

Provide a description for the distribution of ‘wait time’ population.

The population distribution is unimodal and right skewed, centered at a mean of about 4.51 with a standard deviation of 3.02.

Let’s explore what happens to the sampling distribution of \(\bar{x}\) as we change our sample size. The number of repetitions will be 10,000. The images below were simulated using a sample size of either 5, 30 or 100. Label the images with the appropriate sample size and report your simulated mean and standard errors.

These are random samples so your mean and SE may vary.

n                5       100     30
mean             4.5     4.51    4.51
standard error   1.34    0.3     0.55

How would the distribution of 10,000 repetitions of size 1 compare to the simulations above? (You should not need to simulate this to answer.)

Each sample of size 1 has a sample mean equal to its single observation, so 10,000 repetitions of size 1 are equivalent to taking a single random sample of size 10,000 from the population. Assuming the sampling is random, the resulting distribution would be representative of the population distribution (right skewed), and its spread would be larger than that of the sampling distribution for samples of size 5.
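As a short worked check, plugging n = 1 into the standard error formula from the Central Limit Theorem section below (with the population standard deviation of 3.02 reported above) gives back the population standard deviation itself:

\[ SE = \frac{\sigma_x}{\sqrt{n}} = \frac{3.02}{\sqrt{1}} = 3.02 \]

This is more than twice the simulated standard error of 1.34 for samples of size 5. The formula only describes the spread. With n = 1 the shape is not expected to look normal, since it simply mirrors the population.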


Central Limit Theorem (CLT)

The CLT tells us that, for a sufficiently large sample size, the sampling distribution of the sample mean will be approximately normal even if the population distribution is not normal! More specifically, for a variable X with mean \(\mu_x\) and standard deviation \(\sigma_x\), we have \(\bar{x} \sim N(\mu_{\bar{x}} = \mu_x, SE = \frac{\sigma_x}{\sqrt{n}})\).
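The statement can be illustrated with a small simulation. The sketch below is not the app's code and does not use the activity's dataset: it assumes a stand-in right-skewed population drawn from a gamma distribution, and the names `population` and `simulate_means` are hypothetical. It repeats the sampling 10,000 times for each sample size and compares the simulated standard error with \(\sigma_x/\sqrt{n}\).

```r
set.seed(16)  # arbitrary seed so the sketch is reproducible

# ASSUMPTION: a stand-in right-skewed population, not the app's data
population <- rgamma(100000, shape = 2, rate = 0.5)
mu_x    <- mean(population)
sigma_x <- sqrt(mean((population - mu_x)^2))  # population SD (divides by N)

# Draw `reps` random samples of size n and return their sample means
simulate_means <- function(n, reps = 10000) {
  replicate(reps, mean(sample(population, size = n, replace = TRUE)))
}

# One row per sample size: center and spread of the sampling distribution
results <- t(sapply(c(5, 30, 100), function(n) {
  xbar <- simulate_means(n)
  c(n = n, mean = mean(xbar), simulated_se = sd(xbar),
    theoretical_se = sigma_x / sqrt(n))
}))
round(results, 3)
```

Calling `hist(simulate_means(5))` and `hist(simulate_means(100))` should show the skew fading as n grows, which is the behavior the CLT describes.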

Sampling Mean Check

What do you observe for the sampling distribution of the mean for ‘wait time’ as the sample size gets larger? What about for ‘duration’? Does the CLT hold true?

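For both variables, the sampling distribution stays centered at the population mean, becomes narrower, and looks more normal as the sample size gets larger. Yes, the CLT holds for the sample mean no matter the shape of the population distribution, provided the sample size is large enough.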

Standard Error Check

Using the σ’s calculated and the standard error formula from the CLT (see Table 9.6 as well), calculate the theoretical standard errors and compare them to the simulated standard errors (s) for the three sampling distributions for each variable. Are the theoretical and simulated values close?

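# Duration theoretical standard errors
9.88/sqrt(5)
#> [1] 4.41847
9.88/sqrt(30)
#> [1] 1.803833
9.88/sqrt(100)
#> [1] 0.988
# Wait time theoretical standard errors
3.02/sqrt(5)
#> [1] 1.350585
3.02/sqrt(30)
#> [1] 0.551374
3.02/sqrt(100)
#> [1] 0.302

Yes. The theoretical standard errors (4.42, 1.80 and 0.99 for duration; 1.35, 0.55 and 0.30 for wait time) are very close to the simulated values (4.41, 1.81 and 0.98 for duration; 1.34, 0.55 and 0.3 for wait time).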
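The same comparison can be laid out side by side. This is a minimal sketch that copies the population standard deviations and the simulated standard errors from the tables in this activity; the data frame and column names are hypothetical, and your own simulated values may differ slightly.

```r
# Values copied from this activity: population SDs and simulated SEs
comparison <- data.frame(
  variable     = rep(c("duration", "wait time"), each = 3),
  n            = rep(c(5, 30, 100), times = 2),
  sigma        = rep(c(9.88, 3.02), each = 3),
  simulated_se = c(4.41, 1.81, 0.98, 1.34, 0.55, 0.30)
)

# Theoretical SE from the CLT formula and its gap to the simulated SE
comparison$theoretical_se <- comparison$sigma / sqrt(comparison$n)
comparison$difference     <- comparison$simulated_se - comparison$theoretical_se
print(comparison, digits = 3)
```

Every difference is within about 0.01 in absolute value, which is consistent with ordinary simulation noise at 10,000 repetitions.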

Big Picture

Explain how the sampling distribution is related to a single sample.

A single sample is represented by a single data point in the sampling distribution.

For example, a single sample produces one sample mean which is one data point/observation in the sampling distribution (of means).

Explain how the sampling distribution is related to the population distribution.

The sampling distribution provides a way to evaluate an estimator (estimation procedure) for a population parameter (a numerical property of the population distribution). For example, consider the population mean, \(\mu\), which is a measure of center for the population distribution. The sample mean, \(\bar{x}\), is an estimator for the population mean, \(\mu\). The sampling distribution allows us to determine whether \(\bar{x}\) is unbiased and to measure how precise it is. If the mean of the sampling distribution is equal to the population mean, then the estimator is unbiased (on target). The standard error, which is the standard deviation of the sampling distribution, measures precision: how close the estimator's values tend to be to its mean (the target).


Wait time under 5 minutes

The central limit theorem also applies to proportions. What is the population proportion (p) for ‘wait time under5’?

p = 0.69

The images below were simulated using a sample size of either 5, 30 or 100. Label the images with the sample size, mean and standard errors.

These are random samples so your mean and SE may vary.

n                30      100     5
mean             0.69    0.69    0.69
standard error   0.08    0.05    0.21

Using the population proportion and the standard error formula from Table 9.6, calculate the theoretical standard errors and compare them to the simulated standard errors (se) for each sampling distribution.

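# Proportion theoretical standard errors
sqrt(0.69*(1-0.69)/5)
#> [1] 0.2068333
sqrt(0.69*(1-0.69)/30)
#> [1] 0.08443933
sqrt(0.69*(1-0.69)/100)
#> [1] 0.04624932

Rounded to two decimal places, the theoretical standard errors (0.21, 0.08 and 0.05) match the simulated values.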
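The simulation for proportions can be sketched in a few lines of R. This is not the app's code: it assumes each sample's count of wait times under 5 minutes can be modeled as Binomial(n, p) with p = 0.69, which amounts to independent draws from a very large population. The name `prop_results` is hypothetical.

```r
set.seed(16)  # arbitrary seed so the sketch is reproducible

p    <- 0.69    # population proportion from this activity
reps <- 10000   # number of repetitions

# ASSUMPTION: the count of "under 5" waits in a sample is Binomial(n, p)
prop_results <- t(sapply(c(5, 30, 100), function(n) {
  p_hat <- rbinom(reps, size = n, prob = p) / n  # one sample proportion per repetition
  c(n = n, mean = mean(p_hat), simulated_se = sd(p_hat),
    theoretical_se = sqrt(p * (1 - p) / n))
}))
round(prop_results, 3)
```

Plotting `p_hat` for n = 5 would show only six possible values (0, 0.2, ..., 1), which is a reminder that the normal approximation for proportions needs a reasonably large sample.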