ggplot(data = my_data, mapping = aes(x = var1, y = var2)) +
geom_line()
Artwork by @allison_horst
Linegraphs show the relationship between 2 numerical variables.
The explanatory (x-axis) variable must have a sequential ordering, such as time.
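As a concrete illustration of the template above, here is a minimal sketch that uses the `economics` dataset bundled with ggplot2. The choice of dataset and of the `date` and `unemploy` columns is an assumption made for illustration, not part of the original slides.

```r
library(ggplot2)

# `economics` ships with ggplot2.
# `date` is sequential (monthly dates) and `unemploy` is numeric,
# so the pair satisfies the linegraph requirements described above.
p_line <- ggplot(data = economics, mapping = aes(x = date, y = unemploy)) +
  geom_line()

p_line
```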
When describing linegraphs…
geom_histogram()
because we are investigating a single variable.
Histogram syntax in R:
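A minimal sketch of what a histogram call might look like, assuming the `penguins` data from the palmerpenguins package and its `body_mass_g` column (both are assumptions; substitute your own data frame and variable). Only `x` is mapped, and `bins = 20` is an arbitrary starting value.

```r
library(ggplot2)
library(palmerpenguins)  # assumed source of the `penguins` data frame

# Only the x aesthetic is mapped: a histogram describes one variable.
# bins = 20 is an arbitrary starting point, not a recommendation.
# na.rm = TRUE silently drops rows with missing body mass.
p_hist <- ggplot(data = penguins, mapping = aes(x = body_mass_g)) +
  geom_histogram(bins = 20, color = "white", na.rm = TRUE)

p_hist
```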
There are 3 things we look for and describe when inspecting a histogram:
shape (skew and modality)
center (mean or median)
spread (range, IQR, or standard deviation)
Not all distributions have a simple recognizable shape!
Type colors()
in the console to view all possible colors.
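The full list is long, so it can help to store and filter it. The sketch below uses only base R; searching for "blue" is just an example pattern.

```r
# colors() returns a character vector of R's built-in color names.
all_colors <- colors()

length(all_colors)  # how many names are available
head(all_colors)    # the first few names

# Keep only the names that contain "blue".
blues <- grep("blue", all_colors, value = TRUE)
head(blues)
```

Any of these names can be passed as a quoted string, for example to the `fill` or `color` argument of `geom_histogram()`.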
Which bin size is most appropriate? Describe the distribution of penguin body mass.
https://northwestern-university.shinyapps.io/lec03_histogram/
Which bin size is most appropriate? Describe the distribution of penguin flipper length.
Faceting is used to make the same plot for different subgroups of the dataset.
This is useful for comparing the same variable across different subgroups in the dataset.
facet_wrap(~var)
can be added to ANY plot type (scatterplot, linegraph, histogram, boxplot, barplot).
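For instance, a faceted histogram could be sketched as follows, again assuming the palmerpenguins `penguins` data with its `flipper_length_mm` and `species` columns (assumptions for illustration). Each species gets its own panel with the same axes, which makes the distributions easy to compare.

```r
library(ggplot2)
library(palmerpenguins)  # assumed source of the `penguins` data frame

# The same histogram is drawn once per species, one panel each.
p_facet <- ggplot(data = penguins, mapping = aes(x = flipper_length_mm)) +
  geom_histogram(bins = 20, color = "white", na.rm = TRUE) +
  facet_wrap(~species)

p_facet
```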
Which of the following are correct?
a)
b)
c)
d)
Helpful guidelines:
A larger number of observations generally calls for a larger number of bins.
You will generally need to test several different numbers of bins to learn about the data and find an appropriate value.
Sturges' rule of thumb for unimodal symmetric distributions: bins = 1 + 3.322*log10(n)
Sturges' rule does not work well if the data are severely skewed or multimodal, or if there is an extremely large number of observations. It can still give you a starting point; from there, increase the number of bins until you can properly see the shape.
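The rule is easy to compute in base R. In the sketch below, `sturges_bins()` is a hypothetical helper name and n = 342 is an illustrative sample size. Rounding up with `ceiling()` is a choice made here to match base R's `nclass.Sturges()`; the rule as stated above does not specify rounding. Note that 1 + 3.322*log10(n) is the same as 1 + log2(n).

```r
# Hypothetical helper: Sturges' rule, rounded up to a whole number of bins.
# 1 + log2(n) is equivalent to 1 + 3.322 * log10(n).
sturges_bins <- function(n) {
  ceiling(1 + log2(n))
}

n <- 342  # illustrative sample size
sturges_bins(n)

# Base R applies the same rule to a data vector via nclass.Sturges().
nclass.Sturges(seq_len(n))
```

The result can be passed as the `bins` argument of `geom_histogram()` as a first attempt, then adjusted upward if the shape is still hard to see.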