In this worksheet, we’ll be looking at some erroneous plots and fixing them.
You might not have these two packages installed yet. If not, run:
install.packages("ggridges")
install.packages("scales")
library(tidyverse)
library(gapminder)
library(ggridges)
library(scales)
After fixing the error, fix the overplotting problem in the following plot (attribution: “R for Data Science”).
ggplot(mpg, aes(cty, hwy)) +
geom_jitter(alpha = 0.5, size = 1) +
geom_smooth(method = "lm") +
theme_bw()
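Jittering adds random noise, so the plot above comes out slightly different every time it is rendered. The following sketch assumes ggplot2 3.0.0 or later, where `position_jitter()` accepts a `seed` argument. It fixes the noise so the jittered positions are reproducible. The width and height of 0.3 are illustrative choices, not values from the worksheet.

```r
library(tidyverse)

# Jitter is random; fixing `seed` makes the jittered positions identical
# on every render. width/height = 0.3 are illustrative choices.
p_jitter <- ggplot(mpg, aes(cty, hwy)) +
  geom_point(
    position = position_jitter(width = 0.3, height = 0.3, seed = 1),
    alpha = 0.5, size = 1
  ) +
  geom_smooth(method = "lm") +
  theme_bw()
p_jitter
```

Keeping the jitter small relative to the spacing of the data (cty and hwy are whole numbers) avoids moving points into a neighbouring value's position.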
Fix this plot so that it shows life expectancy over time for each country. Notice that ggplot2 ignores the grouping of a tibble!
gapminder %>%
#group_by(country) %>%  # no effect: ggplot2 ignores tibble groups, so use the group aesthetic
ggplot(aes(year, lifeExp, group = country, colour = country == "Rwanda")) +
geom_line(alpha = 0.5) +
scale_colour_discrete("", labels = c("Other", "Rwanda"))
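The following plot uses scales::comma_format() to label the log-scaled x-axis with thousands separators, and lets each facet have its own y-axis range:
ggplot(gapminder, aes(gdpPercap, lifeExp)) +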
geom_point(alpha = 0.2) +
scale_x_log10(labels = scales::comma_format()) +
facet_wrap(~ continent, scales = "free_y")
The next plot maps point size to population, so it adds a scale_size_area() layer too (you could also try scale_radius()). It uses shape = 21 to distinguish between fill (interior) and colour (exterior).
gapminder %>%
filter(continent != "Oceania") %>%
ggplot(aes(gdpPercap, lifeExp, size = pop, fill = continent)) +
facet_wrap(~ continent, nrow = 1) +
geom_point(alpha = 1, shape = 21) +
scale_x_log10(labels = scales::comma_format()) +
scale_size_area()
A list of shapes can be found at the bottom of the scale_shape documentation.
Instead of alpha transparency, suppose you want to fix the overplotting issue by plotting small points. Why is this not working? Fix it.
ggplot(gapminder) +
geom_point(aes(gdpPercap, lifeExp, size = 0.1)) +
scale_x_log10(labels = scales::dollar_format())
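One possible fix is sketched below. Inside `aes()`, the value 0.1 is treated as a data value and mapped through the size scale. That is why the plot gets a "0.1" legend entry and default-sized points. Setting `size` as a fixed parameter outside `aes()` draws every point at that literal size.

```r
library(tidyverse)
library(gapminder)

# `size` is set outside aes(), so it is a constant, not a mapping:
# every point is drawn at size 0.1 and no size legend is created.
p_small <- ggplot(gapminder) +
  geom_point(aes(gdpPercap, lifeExp), size = 0.1) +
  scale_x_log10(labels = scales::dollar_format())
p_small
```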
The following mock data set marks the (x,y) position of a caribou at four time points.
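Fix the plot below so that it traces the caribou’s path in time order, adds an arrow with arrow = arrow() to show the direction of travel, and labels each point with its time using geom_text().
tribble(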
~time, ~x, ~y,
1, 0.3, 0.3,
2, 0.8, 0.7,
3, 0.5, 0.9,
4, 0.4, 0.5
) %>%
ggplot(aes(x, y)) +
geom_line()
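A sketch of one solution follows. The key point is that `geom_line()` sorts the points by x before joining them, whereas `geom_path()` joins them in row order. Arranging by `time` first therefore draws the actual route. The `nudge_y` offset for the labels is an illustrative choice.

```r
library(tidyverse)

caribou <- tribble(
  ~time, ~x, ~y,
  1, 0.3, 0.3,
  2, 0.8, 0.7,
  3, 0.5, 0.9,
  4, 0.4, 0.5
)

# geom_path() connects points in row order, so arrange by time first;
# arrow() marks the direction of travel at the end of the path.
# nudge_y (an illustrative value) lifts each label off its point.
p_caribou <- caribou %>%
  arrange(time) %>%
  ggplot(aes(x, y)) +
  geom_path(arrow = arrow()) +
  geom_point() +
  geom_text(aes(label = time), nudge_y = 0.03)
p_caribou
```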
Fix the plot so that you can actually see the data points. Be sure to solve the problem of overlapping text, without rotating the text.
gapminder %>%
filter(continent == "Americas") %>%
ggplot(aes(country, lifeExp)) +
geom_point() +
geom_boxplot()
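One possible fix is sketched below. It assumes ggplot2 3.3.0 or later, which detects the horizontal orientation automatically when the categorical variable is on the y-axis. Moving `country` to the y-axis gives each name its own row, so the labels no longer collide and do not need rotating. Drawing the boxplot first lets the points sit on top of it. Setting `outlier.shape = NA` avoids drawing outliers twice, since every observation is already shown as a point.

```r
library(tidyverse)
library(gapminder)

# country on the y-axis -> one readable label per row.
# Boxplot first, then points, so the points are drawn on top.
# outlier.shape = NA: outliers are already shown by geom_point().
p_box <- gapminder %>%
  filter(continent == "Americas") %>%
  ggplot(aes(lifeExp, country)) +
  geom_boxplot(outlier.shape = NA) +
  geom_point(alpha = 0.3)
p_box
```

An alternative with the original `aes(country, lifeExp)` is to add `coord_flip()`.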
We’re starting with the same plot as above, but instead of the points plus boxplot, try a ridge plot using ggridges::geom_density_ridges(), and adjust the bandwidth.
gapminder %>%
filter(continent == "Americas") %>%
ggplot(aes(country, lifeExp)) +
geom_point() +
geom_boxplot()
gapminder %>%
filter(continent == "Americas") %>%
ggplot(aes(lifeExp, country)) +
ggridges::geom_density_ridges()
## Picking joint bandwidth of 3.63
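The plot above still uses the automatically chosen bandwidth, as the message shows. The sketch below sets it explicitly. `bandwidth` is passed through to `ggridges::stat_density_ridges()`. The value 1 is only an illustrative choice, smaller than the 3.63 picked automatically, so each ridge shows more detail. It is worth trying a few values and comparing them.

```r
library(tidyverse)
library(gapminder)
library(ggridges)

# An explicit bandwidth replaces the automatically picked one (3.63);
# smaller values give bumpier ridges, larger values smoother ones.
p_ridges <- gapminder %>%
  filter(continent == "Americas") %>%
  ggplot(aes(lifeExp, country)) +
  geom_density_ridges(bandwidth = 1)
p_ridges
```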
mtcars %>%
mutate(transmission = if_else(am == 0, "automatic", "manual")) %>%
ggplot(aes(cyl)) +
geom_bar(aes(colour = transmission))
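The task text for this plot did not survive. The sketch below assumes the goal is bars coloured by transmission. For bars, `colour` only sets the outline, and `fill` sets the interior. Because `cyl` is numeric, wrapping it in `factor()` gives a discrete axis with one bar per cylinder count.

```r
library(tidyverse)

# fill colours the bar interiors (colour only draws outlines);
# factor(cyl) makes the x-axis discrete: one bar per cylinder count.
p_bar <- mtcars %>%
  mutate(transmission = if_else(am == 0, "automatic", "manual")) %>%
  ggplot(aes(factor(cyl))) +
  geom_bar(aes(fill = transmission)) +
  labs(x = "cyl")
p_bar
```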
Here’s the number of people having a certain hair colour from a sample of 592 people:
(hair <- as_tibble(HairEyeColor) %>%
count(Hair, wt = n))
## # A tibble: 4 x 2
## Hair n
## <chr> <dbl>
## 1 Black 108
## 2 Blond 127
## 3 Brown 286
## 4 Red 71
Fix the following bar plot so that it shows these counts.
ggplot(hair, aes(Hair, n)) +
geom_bar()
## Error: stat_count() must not be used with a y aesthetic.
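One possible fix is sketched below. `geom_bar()` counts rows itself (`stat_count()`), so it refuses a y aesthetic. `geom_col()` instead uses the y values as bar heights, which is what pre-computed counts need. `geom_bar(stat = "identity")` is an equivalent spelling.

```r
library(tidyverse)

hair <- as_tibble(HairEyeColor) %>%
  count(Hair, wt = n)

# geom_col() draws bars with heights taken directly from `n`.
p_hair <- ggplot(hair, aes(Hair, n)) +
  geom_col()
p_hair
```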
Here’s the number of people having a certain hair and eye colour from a sample of 592 people:
(hair_eye <- as_tibble(HairEyeColor) %>%
count(Hair, Eye, wt = n))
## # A tibble: 16 x 3
## Hair Eye n
## <chr> <chr> <dbl>
## 1 Black Blue 20
## 2 Black Brown 68
## 3 Black Green 5
## 4 Black Hazel 15
## 5 Blond Blue 94
## 6 Blond Brown 7
## 7 Blond Green 16
## 8 Blond Hazel 10
## 9 Brown Blue 84
## 10 Brown Brown 119
## 11 Brown Green 29
## 12 Brown Hazel 54
## 13 Red Blue 17
## 14 Red Brown 26
## 15 Red Green 14
## 16 Red Hazel 14
Fix the following plot so that it shows a filled-in square for each combination.
ggplot(hair_eye, aes(Hair, Eye)) +
geom_point(aes(colour = n))
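A sketch of one solution follows. `geom_tile()` draws one filled rectangle per (Hair, Eye) cell, with `fill` (not `colour`) mapped to the count. `coord_equal()` is an added assumption that keeps the tiles square.

```r
library(tidyverse)

hair_eye <- as_tibble(HairEyeColor) %>%
  count(Hair, Eye, wt = n)

# One tile per Hair/Eye combination, filled according to the count.
# coord_equal() keeps each tile square.
p_tile <- ggplot(hair_eye, aes(Hair, Eye)) +
  geom_tile(aes(fill = n)) +
  coord_equal()
p_tile
```

Staying with points is also possible: `geom_point(aes(colour = n), shape = 15, size = 10)` draws filled squares, but their size has to be tuned by hand.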
By the way, geom_count() is like geom_bar(): it counts the number of overlapping points, and it maps that count to point size.
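As a small illustration, `geom_count()` can be applied to the overplotted `mpg` data from the first exercise. It uses `stat_sum()`, so each distinct (cty, hwy) pair becomes a single point whose size reflects how many cars share that position.

```r
library(tidyverse)

# stat_sum() collapses identical (cty, hwy) pairs into one point and
# stores the number of rows at that position in the computed variable n.
p_count <- ggplot(mpg, aes(cty, hwy)) +
  geom_count()
p_count
```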
If you’d like some practice, give these exercises a try:
Exercise 1: Make a plot of year (x) vs lifeExp (y), with points coloured by continent. Then, to that same plot, fit a straight regression line to each continent, without the error bars. If you can, try piping the data frame into the ggplot() function.
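One possible sketch for Exercise 1 is shown below. Because `colour = continent` is set in `ggplot()`, both layers inherit it. As a result, `geom_smooth()` fits one line per continent, and `se = FALSE` removes the error bands.

```r
library(tidyverse)
library(gapminder)

# colour = continent is inherited by geom_smooth(), giving one
# straight (method = "lm") line per continent; se = FALSE hides the bands.
p_ex1 <- gapminder %>%
  ggplot(aes(year, lifeExp, colour = continent)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)
p_ex1
```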
Exercise 2: Repeat Exercise 1, but swap the order of the regression line and geom_point() layers. How is this plot different from that of Exercise 1?
Exercise 3: Omit the geom_point() layer from either of the above two plots (it doesn’t matter which). Does the line still show up, even though the data aren’t shown? Why or why not?
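A sketch for checking Exercises 2 and 3 follows. It rests on two facts about ggplot2. First, layers are drawn in the order they are added. Second, each layer computes its own statistic from the plot data rather than from another layer. The object name `base` is just an illustrative helper.

```r
library(tidyverse)
library(gapminder)

# Shared starting point (illustrative helper object).
base <- ggplot(gapminder, aes(year, lifeExp, colour = continent))

# Exercise 2: lines first, points second, so the points are drawn
# on top of the regression lines.
p_ex2 <- base +
  geom_smooth(method = "lm", se = FALSE) +
  geom_point()

# Exercise 3: no point layer; geom_smooth() still fits its lines
# from the data given to ggplot().
p_ex3 <- base +
  geom_smooth(method = "lm", se = FALSE)

p_ex2
p_ex3
```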
Exercise 4: Make a plot of year (x) vs lifeExp (y), facetted by continent. Then, fit a smoother through the data for each continent, without the error bars. Choose a span that you feel is appropriate.
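One possible sketch for Exercise 4 is shown below. It uses a loess smoother, where `span` controls how much of the data each local fit uses. Larger spans give smoother curves, and smaller spans follow the data more closely. The value 0.5 and the point transparency are illustrative choices.

```r
library(tidyverse)
library(gapminder)

# One panel per continent, each with its own loess curve.
# span = 0.5 is an illustrative choice; try e.g. 0.3 and 1 to compare.
p_ex4 <- ggplot(gapminder, aes(year, lifeExp)) +
  facet_wrap(~ continent) +
  geom_point(alpha = 0.2) +
  geom_smooth(method = "loess", span = 0.5, se = FALSE)
p_ex4
```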
Exercise 5: Plot the population over time (year) using lines, so that each country has its own line. Colour by gdpPercap. Add alpha transparency to your liking.
Exercise 6: Add points to the plot in Exercise 5.
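A possible sketch for Exercises 5 and 6 follows. `group = country` gives each country its own line, and the continuous colour changes along each line as gdpPercap changes. The alpha value of 0.3 is just one choice. Exercise 6 then adds a point layer to the same plot object.

```r
library(tidyverse)
library(gapminder)

# Exercise 5: one line per country, coloured by gdpPercap.
p_ex5 <- ggplot(gapminder, aes(year, pop, group = country, colour = gdpPercap)) +
  geom_line(alpha = 0.3)

# Exercise 6: add a point layer on top of the lines.
p_ex6 <- p_ex5 + geom_point(alpha = 0.3)
p_ex6
```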