Around 68% of values are within one standard deviation from the mean.
Around 95% of values are within two standard deviations from the mean.
Around 99.7% of values are within three standard deviations from the mean.
Why is this useful? If we know the mean and standard deviation of a variable that follows the normal distribution, we can calculate the probability of an event occurring.
How is this applicable? We can transform any normally distributed variable into a standard normal distribution with standardization.
\[\hbox{STAT} = \frac{X - mean(X)}{sd(X)}\]
The Normal Distribution
Changing \(\mu\) (mu, the mean), changes where the center (peak) of the distribution is located.
Changing \(\sigma\) (sigma, the standard deviation), changes the spread of the distribution.
The shape of the distribution never changes. It is always unimodal and symmetric (no skew).
The T-Distribution
The t-distribution is similar in shape to a normal distribution.
Completely characterized by it’s degrees of freedom (df). A parameter defined based on the sample size \(n\).
As we make the degrees of freedom (\(df\)) larger the t-distribution is getting closer to the standard normal distribution.
The normal distribution assumes that you know the population standard deviation, \(\sigma\). The t-distribution is used if you only know the sample standard deviation, \(s\) (ie: \(\sigma\) unknown).
Normal Distribution in R
pnorm() calculates the probability to the left of a quantile
# Calculate the probability of being to the left of quantile -1pnorm(q =-1, mean =0, sd =1)
qnorm() calculates the quantile with p% of data to the left
# Calculate the quantile, with 20% of data to the left# default is mean = 0 and sd = 1qnorm(p = .2)
Setting lower.tail = FALSE calculates/uses area to the right
# lower.tail = FALSE changes it to data/area to the right# specify mean and sd if you are not using a standard normal distributionqnorm(p = .025, mean =10, sd =2, lower.tail =FALSE)
Example 1
For a standard normal distribution, what is the probability of being less than one standard deviation below the mean?
pnorm(q =-1)
[1] 0.1586553
Example 2
For a standard normal distribution, find the STAT (CV) for being in the highest 30%.
qnorm(p =0.3, lower.tail =FALSE)
[1] 0.5244005
# ORqnorm(p =0.7)
[1] 0.5244005
Example 3
The amount of money spent buying weekly groceries follows a normal distribution. We are lucky enough to know the population mean is $150 and the population standard deviation is $20. Find the probability an individual spent less than $120.
You can transform any normally distributed variable onto a standardized scale.
T-distribution in R
VERY similar to normal distribution but…
pt() calculates the probability to the left of a quantile for a t-distribution
qt() calculates the quantile with p% of data to the left for a t-distribution
# no mean and sd! Now we use df# these examples are using 9 degrees or freedompt(q =-1, df =9)qt(p = .025, df =9)qt(p = .025, df =9, lower.tail =FALSE)