rm(list = ls()) # clean-up workspace

R’s data structures

      Homogeneous      Heterogeneous
1d    Atomic vector    List
2d    Matrix           Data frame
nd    Array

Vectors

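Note: is.vector() does not test whether an object is a vector. It returns TRUE only if the object is a vector with no attributes other than names. Use is.atomic(x) || is.list(x) to test whether x is actually a vector.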
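
A minimal sketch of the difference. The vector v and its "unit" attribute are made up for illustration:

```r
v <- 1:3
attr(v, "unit") <- "cm"       # hypothetical extra attribute

is.vector(v)                  # v has an attribute other than names
## [1] FALSE
is.atomic(v) || is.list(v)    # v is still an atomic vector
## [1] TRUE
is.vector(1:3)                # no attributes at all
## [1] TRUE
```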

Atomic vectors

  • There are four common types of atomic vectors (remember Lab 2?)

    • logical

    • integer

    • numeric (actually double)

    • character
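
As a quick check, typeof() reports which of the four types a value has. The values below are arbitrary examples:

```r
types <- c(typeof(TRUE), typeof(7L), typeof(7), typeof("seven"))
types[1]   # logical
## [1] "logical"
types[2]   # the L suffix makes an integer
## [1] "integer"
types[3]   # a plain number is stored as a double
## [1] "double"
types[4]
## [1] "character"

# is.numeric() is TRUE for both integer and double vectors
is.numeric(7L)
## [1] TRUE
```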

Many commands in R generate a vector of output, rather than a single number.

The c() command combines its arguments into a single vector.

Example 1

c(7, 3, 6, 0)
## [1] 7 3 6 0
c(73:60)
##  [1] 73 72 71 70 69 68 67 66 65 64 63 62 61 60
c(7:3, 6:0)
##  [1] 7 6 5 4 3 6 5 4 3 2 1 0
c(rep(7:3, 6), 0)
##  [1] 7 6 5 4 3 7 6 5 4 3 7 6 5 4 3 7 6 5 4 3 7 6 5 4 3 7 6 5 4 3 0

Example 2 The command seq() creates a sequence of numbers.

seq(7)
## [1] 1 2 3 4 5 6 7
seq(3, 70, by = 6)
##  [1]  3  9 15 21 27 33 39 45 51 57 63 69
seq(3, 70, length.out = 6)
## [1]  3.0 16.4 29.8 43.2 56.6 70.0
  • Atomic vectors are always flat, even if you nest c()’s:

Example 3

c(1, c(2, c(3, 4)))
## [1] 1 2 3 4

Lists

  • Elements can be of any type, including lists.

  • Construct a list with list() instead of c().

x <- list(1:3, "a", c(TRUE, FALSE, TRUE), c(2.3, 5.9))
str(x)
## List of 4
##  $ : int [1:3] 1 2 3
##  $ : chr "a"
##  $ : logi [1:3] TRUE FALSE TRUE
##  $ : num [1:2] 2.3 5.9
  • Elements can be named, and named elements can be accessed with $.
x.named <- list(vector = 1:3, name = "a", logical = c(TRUE, FALSE, TRUE), range = c(2.3, 5.9))
str(x.named)
## List of 4
##  $ vector : int [1:3] 1 2 3
##  $ name   : chr "a"
##  $ logical: logi [1:3] TRUE FALSE TRUE
##  $ range  : num [1:2] 2.3 5.9
x.named$vector
## [1] 1 2 3
x.named$range
## [1] 2.3 5.9
  • Lists are used to build up many of the more complicated data structures in R.

  • For example, both data frames (another data structure in R) and linear model objects (as produced by lm()) are lists.
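
The last point can be checked directly. This is a small sketch; the data frame df and its values are made up for illustration:

```r
df <- data.frame(x = 1:5, y = c(2.1, 3.9, 6.2, 7.8, 10.1))  # made-up data
fit <- lm(y ~ x, data = df)

is.list(df)
## [1] TRUE
is.list(fit)
## [1] TRUE

# Because fit is a named list, its pieces are accessed with $
names(fit)[1:2]
coefs <- fit$coefficients
```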

Attributes

  • All objects can have arbitrary additional attributes to store metadata about the object.

  • Attributes can be thought of as a named list.

  • Use attr() to access an individual attribute, or attributes() to access all attributes as a list.

  • By default, most attributes are lost when modifying a vector. Only the most important ones stay:

    • Names, a character vector giving each element a name.

    • Dimensions, used to turn vectors into matrices and arrays.

    • Class, used to implement the S3 object system.

y <- 1:10
attr(y, "my_attribute") <- "This is a vector"
attr(y, "my_attribute")
## [1] "This is a vector"
str(y)
##  int [1:10] 1 2 3 4 5 6 7 8 9 10
##  - attr(*, "my_attribute")= chr "This is a vector"
str(attributes(y))
## List of 1
##  $ my_attribute: chr "This is a vector"
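
A short sketch of attributes being lost on modification. It rebuilds y from the example above and adds names so that one of the retained attributes is visible:

```r
y <- 1:10
attr(y, "my_attribute") <- "This is a vector"
names(y) <- letters[1:10]

# Subsetting keeps names but drops the custom attribute
attributes(y[1:2])
## $names
## [1] "a" "b"

# Summarising drops everything
attributes(sum(y))
## NULL
```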

Factors

  • A factor is a vector that can contain only predefined values and is used to store categorical data.

  • Built upon integer vectors using two attributes:

    • the class, “factor”: makes them behave differently from regular integer vectors

    • the levels: defines the set of allowed values

  • Sometimes when a data frame is read directly from a file, a column that should be numeric comes in as a factor because it contains a non-numeric value (e.g. a missing value with a special encoding). Since R 4.0.0 the default is stringsAsFactors = FALSE, so such a column arrives as a character vector instead.

    • Possible remedy: coerce the vector from a factor to a character vector, and then from a character to a double vector

    • Better: use the na.strings argument of read.csv() so the special value is read as NA
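
The points above can be sketched as follows. The level names, the "-" missing-value marker, and the inline CSV text are all made up for illustration:

```r
# A factor is an integer vector with class and levels attributes
f <- factor(c("low", "high", "low"), levels = c("low", "medium", "high"))
typeof(f)
## [1] "integer"
as.integer(f)        # positions in levels(f)
## [1] 1 3 1

# Remedy 1: factor -> character -> double ("-" becomes NA, with a warning)
z <- factor(c("3.5", "7", "-", "1.25"))
z_num <- suppressWarnings(as.numeric(as.character(z)))
# Note: as.numeric(z) alone would return the integer level codes instead

# Remedy 2: declare the missing-value marker when reading the data
csv_text <- "id,value\n1,3.5\n2,-\n3,1.25"
d <- read.csv(text = csv_text, na.strings = "-")
class(d$value)
## [1] "numeric"
is.na(d$value)
## [1] FALSE  TRUE FALSE
```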

Operations on vectors

Use brackets to select elements of a vector. A negative index drops the given elements.

x <- 73:60
x[2]
## [1] 72
x[2:5]
## [1] 72 71 70 69
x[-(2:5)]
##  [1] 73 68 67 66 65 64 63 62 61 60

Elements can also be accessed by name, which stays correct even if the order of elements (or of rows/columns) changes.

y <- 1:3
names(y) <- c("do", "re", "mi")
y[3]
## mi 
##  3
y["mi"]
## mi 
##  3

Matrices and arrays

  • Adding a dim attribute to an atomic vector makes it behave like a multi-dimensional array.

  • A matrix is a special case of an array, with exactly two dimensions.

  • The matrix() command creates a matrix from a given set of values.

# Two scalar arguments to specify rows and columns
a <- matrix(1:6, ncol = 3, nrow = 2)
# One vector argument to describe all dimensions
b <- array(1:12, c(2, 3, 2))

# You can also modify an object in place by setting dim()
c <- 1:6
dim(c) <- c(3, 2)
c
##      [,1] [,2]
## [1,]    1    4
## [2,]    2    5
## [3,]    3    6
dim(c) <- c(2, 3)
c
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6
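
A small sketch showing that the dim attribute is the only thing separating a plain vector from a matrix. The object name m is chosen for illustration:

```r
m <- 1:6
attributes(m)        # a plain vector has no attributes
## NULL
is.matrix(m)
## [1] FALSE

dim(m) <- c(2, 3)    # add a dim attribute
is.matrix(m)
## [1] TRUE
is.array(m)          # a matrix is also an array
## [1] TRUE

# Values fill column by column, so row 2, column 3 holds the last element
m[2, 3]
## [1] 6
```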

R commands on vector/matrix

command         usage
sum()           sum of all elements of a vector/matrix
mean()          average of all elements of a vector/matrix
sort()          sort all elements of a vector/matrix
min(), max()    smallest and largest values of a vector/matrix
length()        number of elements in a vector/matrix
summary()       min, Q1, median, mean, Q3, and max of a vector
dim()           dimensions of a matrix
cbind()         combine vectors, matrices, or data frames by columns
rbind()         combine vectors, matrices, or data frames by rows
names()         get or set the names of an object
colnames()      get or set the column names of a matrix-like object
rownames()      get or set the row names of a matrix-like object
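
A few of these commands in use, as a minimal sketch that reuses the vector from Example 1. The names by_col and by_row are chosen for illustration:

```r
v <- c(7, 3, 6, 0)
sum(v)
## [1] 16
mean(v)
## [1] 4
sort(v)
## [1] 0 3 6 7
length(v)
## [1] 4

by_col <- cbind(v, v * 2)    # two columns, one row per element of v
dim(by_col)
## [1] 4 2
by_row <- rbind(v, v * 2)    # two rows, one column per element of v
dim(by_row)
## [1] 2 4

colnames(by_col) <- c("x", "double_x")   # set column names
```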

Exercise Write a command to generate a random permutation of the numbers between 1 and 5 and save it to an object.

Data frames

  • Most common way of storing data in R

  • A list of equal-length vectors

  • 2-dimensional structure, shares properties of both matrix and list

    • supports names(), colnames(), and rownames(); for a data frame, names() and colnames() return the same thing

    • length() of a data frame is the length of the underlying list, same as ncol()

  • More in this week’s lab session
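
A minimal sketch of the list-like and matrix-like behaviour described above. The data frame and its column names are made up for illustration:

```r
df <- data.frame(id = 1:3, score = c(90.5, 72, 88))  # made-up data

# List-like: one element per column
is.list(df)
## [1] TRUE
length(df)           # same as ncol(df)
## [1] 2
df$score             # access a column by name, as with any named list
## [1] 90.5 72.0 88.0

# Matrix-like: two dimensions, indexed by row and column
dim(df)
## [1] 3 2
df[2, "score"]
## [1] 72

identical(names(df), colnames(df))
## [1] TRUE
```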