rm(list = ls()) # clean-up workspace
library("tidyverse")
## ── Attaching packages ───────────────────────────────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.2     ✓ purrr   0.3.4
## ✓ tibble  3.0.3     ✓ dplyr   1.0.2
## ✓ tidyr   1.1.1     ✓ stringr 1.4.0
## ✓ readr   1.3.1     ✓ forcats 0.5.0
## ── Conflicts ──────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

Acknowledgement

Dr. Hua Zhou’s slides

A typical data science project:

Data visualization

“The simple graph has brought more information to the data analyst’s mind than any other device.”

John Tukey

mpg data

Aesthetic mappings | r4ds chapter 3.3

Scatter plot

  • hwy vs displ

    ggplot(data = mpg) + 
      geom_point(mapping = aes(x = displ, y = hwy))

  • An aesthetic maps a variable in the data to a specific visual feature of the plot (position, color, size, shape, and so on).

  • Check available aesthetics for a geometric object by ?geom_point.
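Besides the help page, the aesthetics a geom understands can also be inspected from R itself. A minimal sketch, assuming ggplot2 is loaded; `GeomPoint` is the ggproto object that `geom_point()` draws with:

```r
library(ggplot2)

# Aesthetics that geom_point() cannot do without
GeomPoint$required_aes

# Optional aesthetics; these have defaults that apply when they are
# neither mapped inside aes() nor set manually
names(GeomPoint$default_aes)
```

Note that ggplot2 stores the color aesthetic under its British spelling, `colour`.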

Color of points

  • Color points according to class:

    ggplot(data = mpg) + 
      geom_point(mapping = aes(x = displ, y = hwy, color = class))

Size of points

  • Assign different sizes to points according to class:

    ggplot(data = mpg) + 
      geom_point(mapping = aes(x = displ, y = hwy, size = class))

Transparency of points

  • Assign different transparency levels to points according to class:

    ggplot(data = mpg) + 
      geom_point(mapping = aes(x = displ, y = hwy, alpha = class))
    ## Warning: Using alpha for a discrete variable is not advised.

Shape of points

  • Assign different shapes to points according to class:

    ggplot(data = mpg) + 
      geom_point(mapping = aes(x = displ, y = hwy, shape = class))

  • By default, ggplot2 uses at most 6 shapes at a time, and additional groups go unplotted. Since mpg has 7 classes, the last one (suv) is dropped from the plot.
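One possible way around the 6-shape limit is to supply the shapes by hand. The sketch below assumes that the shape codes 0 to 6 are acceptable for the 7 classes; that choice is purely illustrative and not part of the original slides:

```r
library(ggplot2)

# mpg has 7 classes, one more than the default shape palette provides.
# Supplying 7 shape codes manually lets every class be plotted.
p_shapes <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, shape = class)) +
  scale_shape_manual(values = 0:6)
p_shapes
```

With this many groups, shapes become hard to tell apart, so color or facets may still be the better choice.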

Manual setting of an aesthetic

  • Set the color of all points to be blue:

    ggplot(data = mpg) + 
      geom_point(mapping = aes(x = displ, y = hwy), color = "blue")
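It matters whether `color = "blue"` is placed inside or outside `aes()`. The following sketch contrasts the two; the object names `p_mapped` and `p_set` are only illustrative:

```r
library(ggplot2)

# Inside aes(): "blue" is treated as a constant variable with one level,
# so the points get the default discrete color and a legend appears.
p_mapped <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = "blue"))

# Outside aes(): the color is set directly, and the points are blue.
p_set <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), color = "blue")
```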

Facets | r4ds chapter 3.5

Facets

  • Facets divide a plot into subplots based on the values of one or more discrete variables.

  • A subplot for each car type:

    ggplot(data = mpg) + 
      geom_point(mapping = aes(x = displ, y = hwy)) + 
      facet_wrap(~ class, nrow = 2)


  • A subplot for each car type and drive:

    ggplot(data = mpg) + 
      geom_point(mapping = aes(x = displ, y = hwy)) + 
      facet_grid(drv ~ class)
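`facet_grid()` can also facet along a single dimension by putting `.` on the other side of the formula. A short sketch, using cyl as the faceting variable (an illustrative choice):

```r
library(ggplot2)

# Facet by columns only: "." leaves the row dimension unfaceted,
# giving one panel per value of cyl in a single row.
p_cols <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_grid(. ~ cyl)
p_cols
```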

Geometric objects | r4ds chapter 3.6

geom_smooth(): smooth line

  • hwy vs displ line:

    ggplot(data = mpg) + 
      geom_smooth(mapping = aes(x = displ, y = hwy))

Different line types

  • Different line types according to drv:

    ggplot(data = mpg) + 
      geom_smooth(mapping = aes(x = displ, y = hwy, linetype = drv))

Different line colors

  • Different line colors according to drv:

    ggplot(data = mpg) + 
      geom_smooth(mapping = aes(x = displ, y = hwy, color = drv))

Points and lines

  • Smooth line overlaid on the scatter plot:

    ggplot(data = mpg) + 
      geom_point(mapping = aes(x = displ, y = hwy)) + 
      geom_smooth(mapping = aes(x = displ, y = hwy))


  • Equivalently, specify the mapping once in ggplot() so that every layer inherits it:

    ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) + 
      geom_point() + geom_smooth()

Aesthetics for each geometric object

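  • Different aesthetics and data in different layers. A mapping or data argument given inside a geom applies to that layer only; here the smooth line is fitted to the subcompact cars alone: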

    ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) + 
      geom_point(mapping = aes(color = class)) + 
      geom_smooth(data = filter(mpg, class == "subcompact"), se = FALSE)

Bar plots | r4ds chapter 3.7

diamonds data

  • diamonds data:

    diamonds
    ## # A tibble: 53,940 x 10
    ##    carat cut       color clarity depth table price     x     y     z
    ##    <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
    ##  1 0.23  Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43
    ##  2 0.21  Premium   E     SI1      59.8    61   326  3.89  3.84  2.31
    ##  3 0.23  Good      E     VS1      56.9    65   327  4.05  4.07  2.31
    ##  4 0.290 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63
    ##  5 0.31  Good      J     SI2      63.3    58   335  4.34  4.35  2.75
    ##  6 0.24  Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48
    ##  7 0.24  Very Good I     VVS1     62.3    57   336  3.95  3.98  2.47
    ##  8 0.26  Very Good H     SI1      61.9    55   337  4.07  4.11  2.53
    ##  9 0.22  Fair      E     VS2      65.1    61   337  3.87  3.78  2.49
    ## 10 0.23  Very Good H     VS1      59.4    61   338  4     4.05  2.39
    ## # … with 53,930 more rows

Bar plot

  • geom_bar() creates a bar chart:

    ggplot(data = diamonds) + 
      geom_bar(mapping = aes(x = cut))


  • Bar charts, like histograms, frequency polygons, smoothers, and boxplots, plot some computed variables instead of raw data.

  • Check available computed variables for a geometric object via help:

    ?geom_bar

  • Use stat_count() directly:

    ggplot(data = diamonds) + 
      stat_count(mapping = aes(x = cut))

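  • Every stat has a default geom, and every geom has a default stat: stat_count() draws with geom_bar() by default, and geom_bar() computes with stat_count().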


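  • Display proportions (relative frequencies) instead of counts:

    ggplot(data = diamonds) +
      geom_bar(mapping = aes(x = cut, y = after_stat(prop), group = 1))

    Note that the aesthetic mapping group = 1 overrides the default grouping (by cut) and treats all observations as a single group, so each proportion is relative to the whole data set. Without it, each proportion is computed within its own cut category, and every bar has height 1:

    ggplot(data = diamonds) +
      geom_bar(mapping = aes(x = cut, y = after_stat(prop)))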
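To make the computed variables concrete, the counts and proportions can be reproduced by hand with dplyr. This is only a sketch: `cut_summary` and `p_ungrouped` are illustrative names, and `after_stat()` assumes ggplot2 3.3.0 or later.

```r
library(ggplot2)
library(dplyr)

# Reproduce by hand what stat_count() computes over the whole data set:
# a count per cut, and each count's share of all diamonds.
cut_summary <- diamonds %>%
  count(cut) %>%
  mutate(prop = n / sum(n))
cut_summary

# Without group = 1, each cut forms its own group, so every proportion
# is computed within a single category and equals 1.
p_ungrouped <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, y = after_stat(prop)))
ggplot_build(p_ungrouped)$data[[1]]$prop
```

`ggplot_build()` returns the data each layer actually draws, which is a convenient way to check what a stat has computed.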

geom_bar() vs geom_col()

  • geom_bar() makes the height of the bar proportional to the number of cases in each group (or if the weight aesthetic is supplied, the sum of the weights).

    ggplot(data = diamonds) + 
      geom_bar(mapping = aes(x = cut))

    The height of each bar is the number of diamonds in that cut category.

  • geom_col() makes the heights of the bars represent values in the data.

    ggplot(data = diamonds) + 
      geom_col(mapping = aes(x = cut, y = carat))

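    The height of each bar is the total carat in that cut category, because geom_col() stacks the individual values. The same plot can be produced with geom_bar() and the weight aesthetic: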

    ggplot(data = diamonds) + 
      geom_bar(mapping = aes(x = cut, weight = carat))
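A common alternative is to compute the totals first and pass the summary to `geom_col()`. The sketch below assumes dplyr is loaded; `carat_by_cut` and `total_carat` are illustrative names:

```r
library(ggplot2)
library(dplyr)

# Total carat per cut, computed explicitly.
carat_by_cut <- diamonds %>%
  group_by(cut) %>%
  summarise(total_carat = sum(carat))
carat_by_cut

# One bar per cut, with height equal to the precomputed total.
ggplot(data = carat_by_cut) +
  geom_col(mapping = aes(x = cut, y = total_carat))
```

Summarising first draws five bars instead of stacking one rectangle per diamond, and the totals are available for inspection.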

Positional adjustments | r4ds chapter 3.8




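  1. When the fill aesthetic is mapped to a second variable (for example, clarity), the bars are stacked automatically. The stacking is performed by the position adjustment specified by the position argument, which defaults to "stack".

  2. If you don’t want a stacked bar chart, you can use one of three other options:

    • "identity": place each bar exactly where it falls, so bars overlap

    • "dodge": place overlapping bars side by side

    • "fill": stack the bars and rescale each stack to the same height, which makes proportions easier to compare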
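The position options listed above can be compared with a small sketch. It assumes that fill is mapped to clarity, as in r4ds chapter 3.8; the object names are illustrative:

```r
library(ggplot2)

# Base plot: fill mapped to clarity, so each cut has several bar segments.
base <- ggplot(data = diamonds, mapping = aes(x = cut, fill = clarity))

# Default: segments are stacked on top of each other.
p_stack <- base + geom_bar()

# Segments overlap, all starting from zero; transparency makes them visible.
p_identity <- base + geom_bar(position = "identity", alpha = 0.2)

# Segments are placed side by side within each cut.
p_dodge <- base + geom_bar(position = "dodge")

# Segments are stacked and each stack is rescaled to height 1.
p_fill <- base + geom_bar(position = "fill")
```

Printing each object in turn (for example, `p_dodge`) shows the effect of the corresponding adjustment.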






Coordinate systems | r4ds chapter 3.9







# nz holds the outline of New Zealand; map_data() requires the maps package
nz <- map_data("nz")
ggplot(nz, aes(x = long, y = lat, group = group)) +
  geom_polygon(fill = "white", colour = "black")