Mid-term evaluation (voluntary, anonymous, ~ 10 min)
3 / 12
will try to post the lab before Thursday noon
Friday office hours are open for any questions (lab, HW, lectures, projects)
HW2 will be posted soon
HW1 will be graded on Questions 1 and 2
rm(list = ls()) # clean up workspace
library(tidyverse)
## ── Attaching packages ───────────────────────────────────────────────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.2 ✓ purrr 0.3.4
## ✓ tibble 3.0.3 ✓ dplyr 1.0.2
## ✓ tidyr 1.1.1 ✓ stringr 1.4.0
## ✓ readr 1.3.1 ✓ forcats 0.5.0
## ── Conflicts ──────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
Figure title should be descriptive:
ggplot(mpg, aes(x = displ, y = hwy)) +
geom_point(aes(color = class)) +
geom_smooth(se = FALSE) +
labs(title = "Fuel efficiency generally decreases with engine size")
subtitle adds additional detail in a smaller font beneath the title.
caption adds text at the bottom right of the plot, often used to describe the source of the data.
ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(color = class)) +
geom_smooth(se = FALSE) +
labs(
  title = "Fuel efficiency generally decreases with engine size",
  subtitle = "Two seaters (sports cars) are an exception because of their light weight",
  caption = "Data from fueleconomy.gov"
)
ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(colour = class)) +
geom_smooth(se = FALSE) +
labs(
  x = "Engine displacement (L)",
  y = "Highway fuel economy (mpg)"
)
Labels can contain mathematical expressions: wrap them in quote() and read about the available options in ?plotmath
df <- tibble(x = runif(10), y = runif(10))
ggplot(df, aes(x, y)) + geom_point() +
labs(
  x = quote(sum(x[i] ^ 2, i == 1, n)),
  y = quote(alpha + beta + frac(delta, theta))
)
Find the most fuel efficient car in each car class:
best_in_class <- mpg %>%
group_by(class) %>%
filter(row_number(desc(hwy)) == 1)
# equivalent to
# best_in_class <- filter(group_by(mpg, class), row_number(desc(hwy)) == 1)
## # A tibble: 7 x 11
## # Groups: class [7]
## manufacturer model displ year cyl trans drv cty hwy fl class
## <chr> <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
## 1 chevrolet corvette 5.7 1999 8 manua… r 16 26 p 2seat…
## 2 dodge caravan … 2.4 1999 4 auto(… f 18 24 r miniv…
## 3 nissan altima 2.5 2008 4 manua… f 23 32 r midsi…
## 4 subaru forester… 2.5 2008 4 manua… 4 20 27 r suv
## 5 toyota toyota t… 2.7 2008 4 manua… 4 17 22 r pickup
## 6 volkswagen jetta 1.9 1999 4 manua… f 33 44 d compa…
## 7 volkswagen new beet… 1.9 1999 4 manua… f 35 44 d subco…
desc() function transforms a vector into a format that will be sorted in descending order
filter() function subsets a data frame, retaining all rows that satisfy your conditions
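To see why the filter condition keeps the top car, it may help to run the ranking helpers on a small vector. This is a minimal sketch: the vector `hwy_demo` and its values are made up for illustration and are not taken from `mpg`.

```r
library(dplyr)

# Hypothetical highway mileages for three cars (illustrative values only)
hwy_demo <- c(26, 44, 32)

# desc() flips the ordering so that larger values sort first
flipped <- desc(hwy_demo)

# row_number() ranks the flipped values; rank 1 marks the largest original value
ranks <- row_number(flipped)

# Position of the most fuel-efficient car in hwy_demo
best <- which(ranks == 1)
best
```

After `group_by()`, the same ranking is computed separately within each group, which is why exactly one row per class survives the filter.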
geom_label() draws a rectangle behind the text:
ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(colour = class)) +
geom_label(aes(label = model), data = best_in_class, nudge_y = 2, alpha = 0.5)
The ggrepel package automatically adjusts labels so that they don’t overlap:
ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(colour = class)) +
geom_point(size = 3, shape = 1, data = best_in_class) +
ggrepel::geom_label_repel(aes(label = model), data = best_in_class)
ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(colour = class))
ggplot2 automatically adds default scales, so the plot above is equivalent to:
ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(colour = class)) +
scale_x_continuous() +
scale_y_continuous() +
scale_colour_discrete()
ggplot(mpg, aes(displ, hwy)) +
geom_point() +
scale_y_continuous(breaks = seq(15, 40, by = 5))
ggplot(mpg, aes(displ, hwy)) +
geom_point() +
scale_x_continuous(labels = NULL) +
scale_y_continuous(labels = NULL)
Plot the y-axis on a log scale:
ggplot(mpg, aes(x = displ, y = hwy)) +
geom_point() +
scale_y_log10()
Plot the x-axis in reverse order:
ggplot(mpg, aes(x = displ, y = hwy)) +
geom_point() +
scale_x_reverse()
ColorBrewer scales are documented online at http://colorbrewer2.org/
Available via RColorBrewer package
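A ColorBrewer palette can be applied to a discrete colour aesthetic with `scale_colour_brewer()`. The sketch below is only one possible illustration; the choice of the `drv` variable and the "Set1" palette is an assumption, not something fixed by these notes.

```r
library(ggplot2)

# Map drive train to colour using the ColorBrewer "Set1" palette
# (palette and variable chosen for illustration only)
p_brewer <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = drv)) +
  scale_colour_brewer(palette = "Set1")

p_brewer
```

Running `RColorBrewer::display.brewer.all()` should list the available palette names.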
Use scale_colour_manual() to set a predefined mapping between values and colours:
presidential %>%
mutate(id = 33 + row_number()) %>%
ggplot(aes(start, id, colour = party)) +
geom_point() +
geom_segment(aes(xend = end, yend = id)) +
scale_colour_manual(values = c(Republican = "red", Democratic = "blue"))
Use scale_colour_gradient() or scale_fill_gradient() for continuous colour
df <- tibble(
  x = rnorm(10000),
  y = rnorm(10000)
)
ggplot(df, aes(x, y)) +
geom_hex()
ggplot(df, aes(x, y)) +
geom_hex() +
viridis::scale_fill_viridis()
All colour scales come in two varieties:
scale_colour_*() for the colour aesthetic
scale_fill_*() for the fill aesthetic
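The two varieties can be compared side by side. In this sketch the datasets (`faithfuld` and `mpg`, both shipped with ggplot2) and the low/high colours are arbitrary choices made for illustration.

```r
library(ggplot2)

# Continuous variable mapped to fill: use the scale_fill_* variant
p_fill <- ggplot(faithfuld, aes(waiting, eruptions, fill = density)) +
  geom_raster() +
  scale_fill_gradient(low = "white", high = "steelblue")

# Continuous variable mapped to colour: use the scale_colour_* variant
p_colour <- ggplot(mpg, aes(displ, hwy, colour = cty)) +
  geom_point() +
  scale_colour_gradient(low = "white", high = "steelblue")

p_fill
p_colour
```

Using the variant that does not match the mapped aesthetic generally has no visible effect, because no layer uses that aesthetic.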
Set legend position: "left", "right", "top", "bottom", "none"
ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(colour = class)) +
theme(legend.position = "left")
See following link for more details on how to change title, labels, … of a legend.
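As a rough sketch of the kind of legend adjustments meant here, the legend title can be set through `labs()` and the layout through `guides()`. The title text "Car type" and the key size are assumptions chosen for illustration.

```r
library(ggplot2)

p_legend <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = class)) +
  labs(colour = "Car type") +          # legend title (illustrative text)
  theme(legend.position = "bottom") +  # move the legend below the plot
  guides(colour = guide_legend(
    nrow = 1,                          # lay the keys out in a single row
    override.aes = list(size = 4)      # enlarge the points shown in the legend
  ))

p_legend
```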
Without clipping (zooms in; unseen data points are kept and still used by geom_smooth())
ggplot(mpg, mapping = aes(displ, hwy)) +
geom_point(aes(color = class)) +
geom_smooth() +
coord_cartesian(xlim = c(5, 7), ylim = c(10, 30))
With clipping (removes unseen data points)
ggplot(mpg, mapping = aes(displ, hwy)) +
geom_point(aes(color = class)) +
geom_smooth() +
xlim(5, 7) + ylim(10, 30)
same as
mpg %>%
filter(displ >= 5, displ <= 7, hwy >= 10, hwy <= 30) %>%
ggplot(aes(displ, hwy)) +
geom_point(aes(color = class)) +
geom_smooth()
ggplot(mpg, mapping = aes(displ, hwy)) +
geom_point(aes(color = class)) +
geom_smooth() +
scale_x_continuous(limits = c(5, 7)) +
scale_y_continuous(limits = c(10, 30))
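The practical difference between zooming and clipping is how many rows reach the smoother. A small sketch, using the same limits as above, counts the rows on each side; the variable names are hypothetical.

```r
library(ggplot2)  # provides the mpg data

# Rows that fall inside the window used above
inside <- subset(mpg, displ >= 5 & displ <= 7 & hwy >= 10 & hwy <= 30)

n_all <- nrow(mpg)        # rows still available when zooming with coord_cartesian()
n_inside <- nrow(inside)  # rows left after xlim()/ylim(), scale limits, or filter()

cat("all rows:", n_all, " rows inside limits:", n_inside, "\n")
```

Because `geom_smooth()` is fitted only to the rows it receives, the fitted curve in the clipped plots can differ from the one in the zoomed plot.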
ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(color = class)) +
geom_smooth(se = FALSE)
ggplot(mpg, aes(displ, hwy)) + geom_point()
## Saving 7 x 5 in image
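The "Saving 7 x 5 in image" message looks like what `ggsave()` prints when no size is given and it falls back to the size of the current graphics device. A minimal sketch with an explicit size follows; the file name is hypothetical and a temporary directory is used so that nothing is written to the project folder.

```r
library(ggplot2)

p <- ggplot(mpg, aes(displ, hwy)) + geom_point()

# Hypothetical output path inside the session's temporary directory
out_file <- file.path(tempdir(), "mpg-scatter.pdf")

# Explicit width and height (inches by default) make the figure size reproducible
ggsave(out_file, plot = p, width = 7, height = 5)
```

Passing `plot = p` explicitly avoids relying on whichever plot was displayed last.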
The RStudio cheat sheet is extremely helpful.