rm(list = ls()) # clean-up workspace
library("tidyverse")
## ── Attaching packages ───────────────────────────────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.2 ✓ purrr 0.3.4
## ✓ tibble 3.0.3 ✓ dplyr 1.0.2
## ✓ tidyr 1.1.1 ✓ stringr 1.4.0
## ✓ readr 1.3.1 ✓ forcats 0.5.0
## ── Conflicts ──────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
“The simple graph has brought more information to the data analyst’s mind than any other device.”
John Tukey
mpg
datampg
data is available from the ggplot2
package:
mpg %>% print(width = Inf)
## # A tibble: 234 x 11
## manufacturer model displ year cyl trans drv cty hwy fl
## <chr> <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr>
## 1 audi a4 1.8 1999 4 auto(l5) f 18 29 p
## 2 audi a4 1.8 1999 4 manual(m5) f 21 29 p
## 3 audi a4 2 2008 4 manual(m6) f 20 31 p
## 4 audi a4 2 2008 4 auto(av) f 21 30 p
## 5 audi a4 2.8 1999 6 auto(l5) f 16 26 p
## 6 audi a4 2.8 1999 6 manual(m5) f 18 26 p
## 7 audi a4 3.1 2008 6 auto(av) f 18 27 p
## 8 audi a4 quattro 1.8 1999 4 manual(m5) 4 18 26 p
## 9 audi a4 quattro 1.8 1999 4 auto(l5) 4 16 25 p
## 10 audi a4 quattro 2 2008 4 manual(m6) 4 20 28 p
## class
## <chr>
## 1 compact
## 2 compact
## 3 compact
## 4 compact
## 5 compact
## 6 compact
## 7 compact
## 8 compact
## 9 compact
## 10 compact
## # … with 224 more rows
Tibbles are a generalized form of data frames, which are extensively used in tidyverse.
displ
: engine displacement, in litres.
hwy
: highway fuel efficiency, in mile per gallen (mpg).
hwy
vs displ
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy))
An aesthetic maps data to a specifc feature of plot.
Check available aesthetics for a geometric object by ?geom_point
.
Color points according to class
:
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, color = class))
Assign different sizes to points according to class
:
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, size = class))
Assign different transparency levels to points according to class
:
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, alpha = class))
## Warning: Using alpha for a discrete variable is not advised.
Assign different shapes to points according to class
:
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, shape = class))
Maximum of 6 shapes at a time. By default, additional groups will go unplotted.
Set the color of all points to be blue:
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy), color = "blue")
Facets divide a plot into subplots based on the values of one or more discrete variables.
A subplot for each car type:
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy)) +
facet_wrap(~ class, nrow = 2)
A subplot for each car type and drive:
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy)) +
facet_grid(drv ~ class)
geom_smooth()
: smooth linehwy
vs displ
line:
ggplot(data = mpg) +
geom_smooth(mapping = aes(x = displ, y = hwy))
Different line types according to drv
:
ggplot(data = mpg) +
geom_smooth(mapping = aes(x = displ, y = hwy, linetype = drv))
Different line colors according to drv
:
ggplot(data = mpg) +
geom_smooth(mapping = aes(x = displ, y = hwy, color = drv))
Lines overlaid over scatter plot:
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy)) +
geom_smooth(mapping = aes(x = displ, y = hwy))
Same as
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_point() + geom_smooth()
Different aesthetics in different layers:
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_point(mapping = aes(color = class)) +
geom_smooth(data = filter(mpg, class == "subcompact"), se = FALSE)
diamonds
datadiamonds
data:
diamonds
## # A tibble: 53,940 x 10
## carat cut color clarity depth table price x y z
## <dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1 0.23 Ideal E SI2 61.5 55 326 3.95 3.98 2.43
## 2 0.21 Premium E SI1 59.8 61 326 3.89 3.84 2.31
## 3 0.23 Good E VS1 56.9 65 327 4.05 4.07 2.31
## 4 0.290 Premium I VS2 62.4 58 334 4.2 4.23 2.63
## 5 0.31 Good J SI2 63.3 58 335 4.34 4.35 2.75
## 6 0.24 Very Good J VVS2 62.8 57 336 3.94 3.96 2.48
## 7 0.24 Very Good I VVS1 62.3 57 336 3.95 3.98 2.47
## 8 0.26 Very Good H SI1 61.9 55 337 4.07 4.11 2.53
## 9 0.22 Fair E VS2 65.1 61 337 3.87 3.78 2.49
## 10 0.23 Very Good H VS1 59.4 61 338 4 4.05 2.39
## # … with 53,930 more rows
geom_bar()
creates bar chart:
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut))
Bar charts, like histograms, frequency polygons, smoothers, and boxplots, plot some computed variables instead of raw data.
Check available computed variables for a geometric object via help:
?geom_bar
Use stat_count()
directly:
ggplot(data = diamonds) +
stat_count(mapping = aes(x = cut))
stat_count()
has a default geom geom_bar()
.
Display frequency instead of counts:
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, y = stat(prop), group = 1))
Note the aesthetics mapping group=1
overwrites the default grouping (by cut
) by considering all observations as a group. Without this we get
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, y = stat(prop)))
geom_bar()
vs geom_col()
geom_bar()
makes the height of the bar proportional to the number of cases in each group (or if the weight aesthetic is supplied, the sum of the weights).
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut))
The height of bar is the number of diamonds in each cut category.
geom_col()
makes the heights of the bars to represent values in the data.
ggplot(data = diamonds) +
geom_col(mapping = aes(x = cut, y = carat))
The height of bar is total carat in each cut category.
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, weight = carat))
Color bar:
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, colour = cut))
Fill color:
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, fill = cut))
Fill color according to another variable:
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, fill = clarity))
The stacking is performed automatically by the position adjustment specified by the position
argument.
If you don’t want a stacked bar chart, you can use one of three other options:
"identity"
"dodge"
"fill"
"stack"
(default)
position = "identity"
will place each object exactly where it falls in the context of the graph.
ggplot(data = diamonds, mapping = aes(x = cut, fill = clarity)) +
geom_bar(alpha = 1/5, position = "identity")
ggplot(data = diamonds, mapping = aes(x = cut, colour = clarity)) +
geom_bar(fill = NA, position = "identity")
setting alpha
to a small value makes the bars slightly transparent
identity
position adjustment is more useful (default) for 2d geoms
position = "fill"
works like stacking, but makes each set of stacked bars the same height.
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, fill = clarity), position = "fill")
position = "dodge"
places overlapping objects directly beside one another.
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, fill = clarity), position = "dodge")
position_gitter()
add random noise to X and Y position of each element to avoid overplotting:
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy), position = "jitter")
geom_jitter()
is similar:
ggplot(data = mpg) +
geom_jitter(mapping = aes(x = displ, y = hwy))
Recall the mpg data:
mpg
## # A tibble: 234 x 11
## manufacturer model displ year cyl trans drv cty hwy fl class
## <chr> <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
## 1 audi a4 1.8 1999 4 auto(l… f 18 29 p comp…
## 2 audi a4 1.8 1999 4 manual… f 21 29 p comp…
## 3 audi a4 2 2008 4 manual… f 20 31 p comp…
## 4 audi a4 2 2008 4 auto(a… f 21 30 p comp…
## 5 audi a4 2.8 1999 6 auto(l… f 16 26 p comp…
## 6 audi a4 2.8 1999 6 manual… f 18 26 p comp…
## 7 audi a4 3.1 2008 6 auto(a… f 18 27 p comp…
## 8 audi a4 quat… 1.8 1999 4 manual… 4 18 26 p comp…
## 9 audi a4 quat… 1.8 1999 4 auto(l… 4 16 25 p comp…
## 10 audi a4 quat… 2 2008 4 manual… 4 20 28 p comp…
## # … with 224 more rows
Boxplots (grouped by class):
ggplot(data = mpg, mapping = aes(x = class, y = hwy)) +
geom_boxplot()
coord_cartesian()
is the default cartesian coordinate system:
ggplot(data = mpg, mapping = aes(x = class, y = hwy)) +
geom_boxplot() +
coord_cartesian(xlim = c(0, 5))
coord_fixed()
specifies aspect ratio (x / y):
ggplot(data = mpg, mapping = aes(x = class, y = hwy)) +
geom_boxplot() +
coord_fixed(ratio = 1/2)
coord_flip()
flips x- and y- axis:
ggplot(data = mpg, mapping = aes(x = class, y = hwy)) +
geom_boxplot() +
coord_flip()
Pie chart:
bar <- ggplot(data = diamonds) +
geom_bar(
mapping = aes(x = cut, fill = cut),
show.legend = FALSE,
width = 1
) +
theme(aspect.ratio = 1) +
labs(x = NULL, y = NULL)
bar + coord_flip()
bar + coord_polar()
A map:
library("maps")
nz <- map_data("nz")
head(nz, 20)
## long lat group order region subregion
## 1 172.7433 -34.44215 1 1 North.Island <NA>
## 2 172.7983 -34.45562 1 2 North.Island <NA>
## 3 172.8528 -34.44846 1 3 North.Island <NA>
## 4 172.8986 -34.41786 1 4 North.Island <NA>
## 5 172.9593 -34.42503 1 5 North.Island <NA>
## 6 173.0184 -34.39895 1 6 North.Island <NA>
## 7 173.0229 -34.44662 1 7 North.Island <NA>
## 8 173.0184 -34.49343 1 8 North.Island <NA>
## 9 172.9616 -34.50426 1 9 North.Island <NA>
## 10 172.9181 -34.47367 1 10 North.Island <NA>
## 11 172.9353 -34.52225 1 11 North.Island <NA>
## 12 172.8808 -34.51504 1 12 North.Island <NA>
## 13 172.9049 -34.55646 1 13 North.Island <NA>
## 14 172.9553 -34.53303 1 14 North.Island <NA>
## 15 172.9376 -34.57806 1 15 North.Island <NA>
## 16 172.9760 -34.61227 1 16 North.Island <NA>
## 17 172.9926 -34.56723 1 17 North.Island <NA>
## 18 173.0218 -34.61404 1 18 North.Island <NA>
## 19 173.0396 -34.65902 1 19 North.Island <NA>
## 20 173.0676 -34.70044 1 20 North.Island <NA>
ggplot(nz, aes(x = long, y = lat, group = group)) +
geom_polygon(fill = "white", colour = "black")
coord_quickmap()
puts maps in scale:
ggplot(nz, aes(long, lat, group = group)) +
geom_polygon(fill = "white", colour = "black") +
coord_quickmap()