Distributions
Section 9.1

The Standard Normal Distribution

Properties

The area under the curve is 1 (or 100%)
The mean of the distribution is 0
The standard deviation of the distribution is 1

Empirical Rule

Around 68% of values are within one standard deviation of the mean.
Around 95% of values are within two standard deviations of the mean.
Around 99.7% of values are within three standard deviations of the mean.
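
The three percentages can be checked numerically. The sketch below uses R's pnorm() on the standard normal distribution; the variable names are only illustrative.

```r
# Area within k standard deviations of the mean for a standard normal:
# P(-k < Z < k) = pnorm(k) - pnorm(-k)
k <- 1:3
within_k <- pnorm(k) - pnorm(-k)
round(within_k, 4)
# approximately 0.6827, 0.9545, 0.9973
```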

Why is this useful? If we know the mean and standard deviation of a variable that follows the normal distribution, we can calculate the probability of an event occurring.

How is this applicable? We can transform any normally distributed variable onto the standard normal scale by standardizing it.

\[\text{STAT} = \frac{X - \text{mean}(X)}{\text{sd}(X)}\]
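
As a minimal sketch of the formula, the code below standardizes a small vector. The values in x are made up for illustration; after standardizing, the result should have mean 0 and standard deviation 1.

```r
# Hypothetical sample values, made up for illustration
x <- c(12, 15, 9, 20, 14)

# Subtract the mean, then divide by the standard deviation
stat <- (x - mean(x)) / sd(x)

mean(stat)  # essentially 0 (up to floating-point error)
sd(stat)    # 1
```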

The Normal Distribution

Changing $\mu$ (mu, the mean) changes where the center (peak) of the distribution is located.
Changing $\sigma$ (sigma, the standard deviation) changes the spread of the distribution.
The shape of the distribution never changes. It is always unimodal and symmetric (no skew).
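
The effect of each parameter can be seen by evaluating the density with dnorm(), the density counterpart of pnorm() and qnorm() in base R. The parameter values below (a mean of 3, a standard deviation of 2) are arbitrary choices for illustration.

```r
# Evaluate three normal densities on a common grid
x <- seq(-10, 10, by = 0.25)
d_std    <- dnorm(x, mean = 0, sd = 1)  # standard normal
d_shift  <- dnorm(x, mean = 3, sd = 1)  # larger mu: same shape, moved right
d_spread <- dnorm(x, mean = 0, sd = 2)  # larger sigma: wider and flatter

x[which.max(d_std)]         # peak at 0
x[which.max(d_shift)]       # peak at 3
max(d_spread) < max(d_std)  # TRUE: more spread means a lower peak
```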

The T-Distribution

The t-distribution is similar in shape to a normal distribution.

It is completely characterized by its degrees of freedom ($df$), a parameter determined by the sample size $n$.
As the degrees of freedom ($df$) increase, the t-distribution gets closer to the standard normal distribution.
The normal distribution assumes that you know the population standard deviation, $\sigma$. The t-distribution is used if you only know the sample standard deviation, $s$ (i.e., $\sigma$ is unknown).
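
The convergence toward the standard normal can be illustrated by comparing cutoffs. This sketch uses qt() and qnorm(); the degrees of freedom chosen (5, 30, 1000) are arbitrary.

```r
# Upper 2.5% cutoff of the t-distribution for increasing degrees of freedom
df <- c(5, 30, 1000)
t_cut <- qt(p = 0.975, df = df)
z_cut <- qnorm(p = 0.975)

round(t_cut, 2)  # 2.57 2.04 1.96
round(z_cut, 2)  # 1.96
```

With few degrees of freedom the t cutoff is noticeably larger, reflecting heavier tails; by 1000 degrees of freedom it is nearly the same as the normal cutoff.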

Normal Distribution in R

pnorm() calculates the probability to the left of a quantile

# Calculate the probability of being to the left of quantile -1
pnorm(q = -1, mean = 0, sd = 1)

qnorm() calculates the quantile with proportion p of the data to the left

# Calculate the quantile, with 20% of data to the left
# default is mean = 0 and sd = 1
qnorm(p = .2)

Setting lower.tail = FALSE calculates/uses area to the right

# lower.tail = FALSE changes it to data/area to the right
# specify mean and sd if you are not using a standard normal distribution
qnorm(p = .025, mean = 10, sd = 2, lower.tail = FALSE)
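
As a quick check of what lower.tail = FALSE does, the sketch below compares it with the complement rule and with the symmetry of the normal distribution. The variable names are illustrative.

```r
# Area to the right of 1 on a standard normal, two equivalent ways
right_direct     <- pnorm(q = 1, lower.tail = FALSE)
right_complement <- 1 - pnorm(q = 1)
all.equal(right_direct, right_complement)  # TRUE

# By symmetry, the upper 2.5% cutoff mirrors the lower 2.5% cutoff about the mean
upper <- qnorm(p = .025, mean = 10, sd = 2, lower.tail = FALSE)
lower <- qnorm(p = .025, mean = 10, sd = 2)
all.equal(upper - 10, 10 - lower)  # TRUE
```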

Example 1

For a standard normal distribution, what is the probability of a value falling more than one standard deviation below the mean (that is, below -1)?

pnorm(q = -1)

[1] 0.1586553

Example 2

For a standard normal distribution, find the STAT (CV) that cuts off the highest 30% of values.

qnorm(p = 0.3, lower.tail = FALSE)

[1] 0.5244005

# OR

qnorm(p = 0.7)

[1] 0.5244005

Example 3

The amount of money spent buying weekly groceries follows a normal distribution. We are lucky enough to know the population mean is $150 and the population standard deviation is $20. Find the probability an individual spent less than $120.

Using data

# Using data: supply the mean and sd on the original scale
pnorm(q = 120, mean = 150, sd = 20)

[1] 0.0668072

Standardize

# Standardize first: (120 - 150) / 20 = -1.5
pnorm(q = -1.5)

[1] 0.0668072

You can transform any normally distributed variable onto a standardized scale.
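
The transformation also works in reverse. As a sketch using the mean and standard deviation from Example 3, a cutoff found on the standardized scale can be converted back to dollars; the 90th percentile here is an arbitrary choice for illustration.

```r
# The 90th percentile is an arbitrary choice for illustration
z <- qnorm(p = 0.9)                                # cutoff on the standardized scale
from_z    <- 150 + z * 20                          # undo standardization: X = mean + STAT * sd
from_data <- qnorm(p = 0.9, mean = 150, sd = 20)   # same cutoff on the original scale
all.equal(from_z, from_data)                       # TRUE
```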

T-distribution in R

VERY similar to normal distribution but…

pt() calculates the probability to the left of a quantile for a t-distribution

qt() calculates the quantile with proportion p of the data to the left for a t-distribution

# no mean and sd! Now we use df
# these examples are using 9 degrees of freedom
pt(q = -1, df = 9)
qt(p = .025, df = 9)
qt(p = .025, df = 9, lower.tail = FALSE)
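
The relationship between these calls can be sketched as follows, again with 9 degrees of freedom: pt() and qt() are inverses of each other, and the t-distribution is symmetric about 0.

```r
# pt() and qt() undo each other for a fixed df
q <- qt(p = .025, df = 9)
p_back <- pt(q = q, df = 9)
p_back  # 0.025

# Symmetry about 0: the upper and lower cutoffs differ only in sign
q_upper <- qt(p = .025, df = 9, lower.tail = FALSE)
all.equal(q_upper, -q)  # TRUE
```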
