Populations and Generalizability

Activity 15

Author

Solution

Note

This activity is to be completed using this app. The Data Overview and Population tabs are informational only, to give you a better idea of the data you are working with. The activity itself uses the Simulation tab.

It may be easier to complete this worksheet in Visual mode since there are several tables to fill in.

Simulation

Population vs My Sample

For “Method” choose “Simulation”.

Set the seed to your Net ID. This will allow for consistent results every time you click run.
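Why a seed gives consistent results can be illustrated with a small sketch. This is not the app's code: the function name `draw_sample`, the integer seeds, and the normal model for prices are all hypothetical stand-ins chosen for illustration.

```python
import random

def draw_sample(seed, n=15):
    """Draw n hypothetical ride prices from a generator started at `seed`."""
    rng = random.Random(seed)  # the seed fixes the whole sequence of random draws
    # Assumed model for illustration only: prices roughly normal around 19.50
    return [rng.gauss(19.50, 4.56) for _ in range(n)]

first = draw_sample(seed=12345)
second = draw_sample(seed=12345)   # same seed as `first`
other = draw_sample(seed=54321)    # different seed

print(first == second)  # the same seed reproduces the same sample
print(first == other)   # a different seed gives a different sample
```

Using your own Net ID as the seed plays the same role: your results repeat on every run, while differing from those of classmates who use other seeds.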

Run 1 repetition with a sample size of 15. Report the population parameters and your sample statistics by switching the Variable and Sample Statistic of interest. (This “histogram” may look odd because it plots only one observation.)

| | Price mean | Price variance | Duration mean | Duration variance | Wait time mean | Wait time variance | Slope | Intercept |
|---|---|---|---|---|---|---|---|---|
| Population | 19.50 | 20.81 | 30.11 | 97.59 | 4.51 | 9.10 | 0.45 | 5.94 |
| Sample | 20.63 | 20.34 | 30.87 | 92.80 | 4.81 | 6.36 | 0.46 | 5.95 |

Sample values will be different for everyone.

Are your sample values close to the parameter values?

The means are all relatively close to the population parameters. In this particular sample the variances all fall below the population values, most noticeably for wait time (6.36 vs. 9.10).
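For readers who want to see how the statistics in the table could be computed from a single sample of 15 rides, here is a minimal sketch. It does not use the app's data: the sample is generated from a hypothetical model, the helper `least_squares` is an illustrative name, and it assumes the slope and intercept come from regressing price on duration (which is at least consistent with the table, since 5.94 + 0.45 × 30.11 ≈ 19.49).

```python
import random
import statistics

rng = random.Random(12345)  # hypothetical seed standing in for a Net ID

# Hypothetical sample of 15 rides (not the app's data).
# Assumed model: price = 5.94 + 0.45 * duration + noise.
durations = [rng.gauss(30.11, 9.88) for _ in range(15)]
prices = [5.94 + 0.45 * d + rng.gauss(0, 1.02) for d in durations]

def least_squares(x, y):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

slope, intercept = least_squares(durations, prices)

print("price mean:", round(statistics.mean(prices), 2))
print("price variance:", round(statistics.variance(prices), 2))  # divides by n - 1
print("slope:", round(slope, 2), "intercept:", round(intercept, 2))
```

Changing the seed changes every number printed, which is the point of the exercise: each sample gives a different estimate of the same fixed parameters.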


Class Data Sample

For “Method” choose “Class Data” and enter your Net ID.

  • Choose a sample size of 15.
  • Report the overall mean for each sample statistic (i.e., our class sample average) by switching the Variable and Sample Statistic of interest.

| | Price mean | Price variance | Duration mean | Duration variance | Wait time mean | Wait time variance | Slope | Intercept |
|---|---|---|---|---|---|---|---|---|
| Class sample | 19.55 | 20.92 | 30.24 | 98.96 | 4.42 | 8.67 | 0.45 | 5.98 |

Sample values here may not match yours, as this solution is likely from a previous quarter. Your values will match those of your classmates.

How many observations are plotted in the histogram?

Check the number of entries at the bottom of the table (mine is 148, but yours may differ, as this solution is likely from a previous quarter).

What do you notice about the means of these sample statistics (Hint: compare them to the parameter values from your first table)?

All the means of the sample statistics are fairly close to their respective parameters.

Inspect the histogram of the following variables and statistics. What do you notice about these distributions? Where are they centering? What about their shape? Did your sample happen to fall close to the truth (population) or far?

  • sample means of price:

    • The distribution appears to be centering around the respective parameter value 19.50.

    • The distribution is unimodal and symmetric.

    • My sample mean (20.63) was somewhat above the parameter, so it was an overestimate.

  • sample variances of price:

    • The distribution appears to be centering around the respective parameter value 20.8.

    • The distribution is unimodal and right skewed.

    • My sample variance (20.34) happened to be very close to the population value.

  • sample estimates of slope coefficient

    • The distribution appears to be centering around the respective parameter value 0.45.

    • The distribution is unimodal and symmetric.

    • My sample slope (0.46) happened to be quite close to the population value.

Simulate MANY samples

Under “Method” choose “Simulation”, set the seed to your Net ID.

Run 10,000 repetitions with a sample size of 15.

Report the sample statistics by switching the Variable and Sample Statistic of interest.

| | Price mean | Price variance | Duration mean | Duration variance | Wait time mean | Wait time variance | Slope | Intercept |
|---|---|---|---|---|---|---|---|---|
| Sample | 19.50 | 20.84 | 30.10 | 97.70 | 4.51 | 9.18 | 0.45 | 5.93 |

What do you notice about the means of these sample statistics (Hint: compare them to the parameter values from your first table)?

The values are VERY close to their respective parameter values.
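The same pattern can be reproduced outside the app with a short simulation. This is a sketch under stated assumptions, not the app's implementation: the population below is a hypothetical set of 5,000 normally distributed prices rather than the real census data, and the seed 2024 is an arbitrary stand-in for a Net ID.

```python
import random
import statistics

rng = random.Random(2024)  # arbitrary seed standing in for a Net ID

# Hypothetical population of 5,000 ride prices (the real census data is not used here)
population = [rng.gauss(19.50, 4.56) for _ in range(5000)]
mu = statistics.mean(population)           # population mean (parameter)
sigma2 = statistics.pvariance(population)  # population variance (parameter)

reps, n = 10_000, 15
means, variances = [], []
for _ in range(reps):
    sample = rng.sample(population, n)              # one random sample of 15 rides
    means.append(statistics.mean(sample))           # its sample mean
    variances.append(statistics.variance(sample))   # its sample variance (n - 1 divisor)

print("population mean:", round(mu, 2),
      "| average of sample means:", round(statistics.mean(means), 2))
print("population variance:", round(sigma2, 2),
      "| average of sample variances:", round(statistics.mean(variances), 2))
```

Any single sample can miss the parameter by a fair amount, but averaged over 10,000 repetitions the sample means and sample variances land very near the population values.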

Inspect the histogram of the following variables and statistics. What do you notice about these distributions? Where are they centering? What about their shape?

Similar to what we saw with the class sample statistics, each distribution appears to be centering around its respective parameter value. Shape is easier to examine in this case because we have 10,000 data points instead of only ~150 data points.

  • sample means of price: The sampling distribution of the sample means of price looks to be centering around $19.50. The distribution is unimodal and symmetric. It resembles a NORMAL distribution.

  • sample variances of price: The sampling distribution of the sample variance of price looks to be centering around 20.80 (in squared dollars). The distribution is unimodal and right skewed. It resembles a CHI-SQUARE distribution.

  • sample estimates of slope coefficient: The sampling distribution of the sample slope estimates looks to be centering around 0.45. The distribution is unimodal and symmetric. It resembles a NORMAL distribution.
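The chi-square shape noted for the sample variances has a standard theoretical basis, stated here under an assumption the activity does not verify: that the population of prices is approximately normal. For a sample of size $n$ from a normal population with variance $\sigma^2$, the sample variance $S^2$ satisfies

```latex
\frac{(n-1)\,S^2}{\sigma^2} \sim \chi^2_{n-1},
\qquad
\mathbb{E}\left[S^2\right] = \sigma^2 .
```

With $n = 15$, the sample variance behaves like a chi-square variable with 14 degrees of freedom rescaled by $\sigma^2/14$. That distribution is right skewed with mean $\sigma^2$, which matches a histogram centering near 20.8 with a longer right tail. If the population is not normal, the result holds only approximately.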

Population

Click on the Population tab. Look at the distribution for the variable price (this is census data for ride share price). How does this population histogram compare to the histogram of sample means of price (i.e., the sampling distribution with 10,000 repeated samples)? Can you explain the difference? (It might help to flip back and forth between the Simulation tab with 10,000 repetitions of sample size 15 and the Population tab.)

The distributions have the same shape (unimodal, no skew) and center ($19.50). The DIFFERENCE is that the spread of the sampling distribution is less than the spread of the distribution of price.

Each data point/observation in the distribution of price represents ONE ride.

Each data point/observation in the sampling distribution represents a mean of 15 randomly selected ride prices.

When we take the mean of several values, the extremes tend to cancel each other out, so means vary less than individual values do.
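The size of this reduction in spread can be worked out from the first table, using the standard result that the standard deviation of a mean of n independent draws is the population standard deviation divided by √n. The sketch below assumes the population is much larger than the sample, so that the 15 rides behave like independent draws; the variable names are illustrative.

```python
import math

pop_variance = 20.81   # population variance of price, from the first table
n = 15                 # sample size used in the activity

pop_sd = math.sqrt(pop_variance)        # spread of individual ride prices
se_mean = math.sqrt(pop_variance / n)   # spread of means of n ride prices

print(round(pop_sd, 2))   # 4.56
print(round(se_mean, 2))  # 1.18
```

Under these assumptions, individual prices typically sit about $4.56 from $19.50, while means of 15 prices typically sit only about $1.18 away, a reduction by a factor of √15 ≈ 3.87.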