rm(list = ls()) # clean-up workspace
Lab session
HW1 posted, due Sep 11
1-page report of course project due this week
UConn Sports Analytics Symposium, Saturday October 10, $5 registration
Dimension | Homogeneous | Heterogeneous |
---|---|---|
1d | Atomic vector | List |
2d | Matrix | Data frame |
nd | Array | |
Homogeneous: all contents must be of the same type
Heterogeneous: the contents can be of different types
The basic data structure in R.
Two flavors: atomic vectors and lists
Three common properties:
Type, typeof(), what it is.
Length, length(), how many elements it contains.
Attributes, attributes(), additional arbitrary metadata.
No scalars in R. They are length 1 vectors.
Note: is.vector() does not test if an object is a vector; it returns TRUE only if the object is a vector with no attributes other than names. Use is.atomic() or is.list() to test.
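The three properties and the is.vector() caveat can be checked at the console. This is a minimal sketch; the object names v and w and the attribute name "note" are made up for illustration.

```r
v <- c(a = 1.5, b = 2.5)     # a double vector with names
typeof(v)                    # "double"
length(v)                    # 2
attributes(v)                # a list holding only $names

w <- 1:3
attr(w, "note") <- "extra metadata"  # hypothetical attribute
is.vector(w)                 # FALSE: w has an attribute other than names
is.atomic(w)                 # TRUE: w is still an atomic vector
is.atomic(w) || is.list(w)   # TRUE: a more reliable vector test
```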
There are four common types of atomic vectors (remember Lab 2?)
logical
integer
numeric (actually double)
character
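A quick check of the four types with typeof(), as a small sketch. The L suffix is how an integer literal is written; without it, a number is stored as a double.

```r
typeof(TRUE)     # "logical"
typeof(1L)       # "integer"
typeof(1)        # "double"
typeof("a")      # "character"

# is.numeric() is TRUE for both integer and double vectors
is.numeric(1L)   # TRUE
is.numeric(1)    # TRUE
```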
Many commands in R generate a vector of output, rather than a single number.
The c() command creates a vector from the elements it is given.
Example 1
c(7, 3, 6, 0)
## [1] 7 3 6 0
c(73:60)
## [1] 73 72 71 70 69 68 67 66 65 64 63 62 61 60
c(7:3, 6:0)
## [1] 7 6 5 4 3 6 5 4 3 2 1 0
c(rep(7:3, 6), 0)
## [1] 7 6 5 4 3 7 6 5 4 3 7 6 5 4 3 7 6 5 4 3 7 6 5 4 3 7 6 5 4 3 0
Example 2 The command seq() creates a sequence of numbers.
seq(7)
## [1] 1 2 3 4 5 6 7
seq(3, 70, by = 6)
## [1] 3 9 15 21 27 33 39 45 51 57 63 69
seq(3, 70, length = 6)
## [1] 3.0 16.4 29.8 43.2 56.6 70.0
Example 3 Atomic vectors are always flat, even if you nest c()’s:
c(1, c(2, c(3, 4)))
## [1] 1 2 3 4
List elements can be of any type, including lists.
Construct a list by using list() instead of c().
x <- list(1:3, "a", c(TRUE, FALSE, TRUE), c(2.3, 5.9))
str(x)
## List of 4
## $ : int [1:3] 1 2 3
## $ : chr "a"
## $ : logi [1:3] TRUE FALSE TRUE
## $ : num [1:2] 2.3 5.9
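Because an element can itself be a list, list() keeps nesting intact, unlike c() on atomic vectors. A minimal sketch; the object name nested is hypothetical, and [[ ]] is used here to extract a single element.

```r
nested <- list(1, list("a", list(TRUE)))  # hypothetical example object
str(nested)                # shows the list-within-list structure
length(nested)             # 2: the inner lists are not flattened
is.list(nested[[2]])       # TRUE: the second element is itself a list
length(c(1, c(2, c(3))))   # 3: nested c() calls are flattened
```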
List elements can be named and then accessed with $.
x.named <- list(vector = 1:3, name = "a", logical = c(TRUE, FALSE, TRUE), range = c(2.3, 5.9))
str(x.named)
## List of 4
## $ vector : int [1:3] 1 2 3
## $ name : chr "a"
## $ logical: logi [1:3] TRUE FALSE TRUE
## $ range : num [1:2] 2.3 5.9
x.named$vector
## [1] 1 2 3
x.named$range
## [1] 2.3 5.9
Lists are used to build up many of the more complicated data structures in R.
For example, both data frames (another data structure in R) and linear model objects (as produced by lm()) are lists.
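This can be verified with is.list(). The sketch below assumes the built-in mtcars data set and an arbitrary regression of mpg on wt, chosen only for illustration.

```r
is.list(mtcars)                     # TRUE: a data frame is a list of columns

fit <- lm(mpg ~ wt, data = mtcars)  # illustrative model on a built-in data set
is.list(fit)                        # TRUE
typeof(fit)                         # "list"
names(fit)                          # component names, including "coefficients"
fit$coefficients                    # components are accessed with $ as in any named list
```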
All objects can have arbitrary additional attributes to store metadata about the object.
Attributes can be thought of as a named list.
Use attr() to access an individual attribute or attributes() to access all attributes as a list.
By default, most attributes are lost when modifying a vector. Only the most important ones stay:
Names, a character vector giving each element a name.
Dimensions, used to turn vectors into matrices and arrays.
Class, used to implement the S3 object system.
y <- 1:10
attr(y, "my_attribute") <- "This is a vector"
attr(y, "my_attribute")
## [1] "This is a vector"
str(y)
## int [1:10] 1 2 3 4 5 6 7 8 9 10
## - attr(*, "my_attribute")= chr "This is a vector"
str(attributes(y))
## List of 1
## $ my_attribute: chr "This is a vector"
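The loss of attributes is easy to see with subsetting. A minimal sketch; the vector z and its attribute value are invented for illustration.

```r
y <- 1:10
attr(y, "my_attribute") <- "This is a vector"
attributes(y[1:3])   # NULL: subsetting dropped my_attribute

z <- c(do = 1, re = 2, mi = 3)
attr(z, "my_attribute") <- "a named vector"  # hypothetical attribute value
attributes(z[1:2])   # only $names survives ("do" "re")
```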
A factor is a vector that can contain only predefined values and is used to store categorical data.
Built upon integer vectors using two attributes:
the class, “factor”: makes them behave differently from regular integer vectors
the levels: defines the set of allowed values
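The two attributes and the underlying integer codes can be inspected directly. A minimal sketch with made-up level names:

```r
f <- factor(c("low", "high", "low"), levels = c("low", "mid", "high"))
typeof(f)       # "integer": the underlying storage
class(f)        # "factor"
levels(f)       # "low" "mid" "high"
as.integer(f)   # 1 3 1: positions in levels(f)

# Assigning a value outside the levels gives NA (with a warning)
g <- f
g[2] <- "huge"
is.na(g[2])     # TRUE
```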
Sometimes when a data frame is read directly from a file, a column comes in as a factor instead of numeric because it contains a non-numeric value (e.g. a missing value encoded specially).
Possible remedy: coerce the vector from a factor to a character vector, and then from a character to a double vector.
Better: use the na.strings argument of the read.csv() function.
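Both approaches are sketched below, assuming "." is the special missing-value code. The CSV text is a made-up stand-in for a file. Note that since R 4.0.0 read.csv() no longer converts strings to factors by default, so such a column would arrive as character rather than factor; the first remedy is shown on an explicitly built factor.

```r
# A factor whose labels look like numbers; "." marks a missing value
f <- factor(c("10", "20", ".", "40"))
as.numeric(f)   # the integer codes, NOT the values 10, 20, 40

# Remedy: factor -> character -> double ("." becomes NA, normally with a warning)
x_num <- suppressWarnings(as.numeric(as.character(f)))
x_num           # 10 20 NA 40

# Better: declare the missing-value code when reading the data
csv_text <- "id,score\n1,10\n2,.\n3,40"   # hypothetical file contents
d <- read.csv(text = csv_text, na.strings = ".")
str(d)          # score is read as a numeric column with one NA
```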
Use brackets to select elements of a vector.
x <- 73:60
x[2]
## [1] 72
x[2:5]
## [1] 72 71 70 69
x[-(2:5)]
## [1] 73 68 67 66 65 64 63 62 61 60
Elements can also be accessed by “name” (safe when the column/row order changes)
y <- 1:3
names(y) <- c("do", "re", "mi")
y[3]
## mi
## 3
y["mi"]
## mi
## 3
Adding a dim attribute to an atomic vector allows it to behave like a multi-dimensional array.
A matrix is a special case of an array.
The matrix() command creates a matrix from the given set of values.
# Two scalar arguments to specify rows and columns
a <- matrix(1:6, ncol = 3, nrow = 2)
# One vector argument to describe all dimensions
b <- array(1:12, c(2, 3, 2))
# You can also modify an object in place by setting dim()
c <- 1:6
dim(c) <- c(3, 2)
c
##      [,1] [,2]
## [1,]    1    4
## [2,]    2    5
## [3,]    3    6
dim(c) <- c(2, 3)
c
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6
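The relationship between vectors, matrices and arrays can be checked on the objects above. A small sketch that recreates a and b so it runs on its own:

```r
a <- matrix(1:6, ncol = 3, nrow = 2)
b <- array(1:12, c(2, 3, 2))

dim(a)          # 2 3
dim(b)          # 2 3 2
is.array(a)     # TRUE: a matrix is an array with two dimensions
is.matrix(b)    # FALSE: b has three dimensions
attributes(a)   # $dim is the only attribute
a[2, 3]         # 6: row 2, column 3 (values fill column by column)
```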
command | usage |
---|---|
sum() | sum over elements in vector/matrix |
mean() | compute average value |
sort() | sort all elements in a vector/matrix |
min(), max() | min and max values of a vector/matrix |
length() | length of a vector/matrix |
summary() | returns the min, Q1, median, mean, Q3, and max values of a vector |
dim() | dimension of a matrix |
cbind() | combine vector, matrix or data-frame arguments by columns |
rbind() | combine vector, matrix or data-frame arguments by rows |
names() | get or set names of an object |
colnames() | get or set column names of a matrix-like object |
rownames() | get or set row names of a matrix-like object |
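A few of these commands applied to the vector from Example 1, as a brief sketch. The column names "x" and "twice_x" are arbitrary.

```r
v <- c(7, 3, 6, 0)
sum(v)                   # 16
mean(v)                  # 4
sort(v)                  # 0 3 6 7
min(v)                   # 0
max(v)                   # 7
summary(v)               # min, Q1, median, mean, Q3, max

m <- cbind(v, 2 * v)     # bind as columns: a 4 x 2 matrix
colnames(m) <- c("x", "twice_x")
dim(m)                   # 4 2
length(m)                # 8: the total number of elements

r <- rbind(v, 2 * v)     # bind as rows: a 2 x 4 matrix
dim(r)                   # 2 4
```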
Exercise Write a command to generate a random permutation of the numbers between 1 and 5 and save it to an object.
Most common way of storing data in R
A list of equal-length vectors
2-dimensional structure, shares properties of both matrix and list
has attributes names(), colnames() and rownames()
length() of a data frame is the length of the underlying list, same as ncol()
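These properties can be seen on a small data frame. A minimal sketch; the data frame df and its columns are made up, and stringsAsFactors = FALSE is set explicitly so the result is the same across R versions.

```r
df <- data.frame(id = 1:3, label = c("a", "b", "c"),
                 stringsAsFactors = FALSE)

is.list(df)       # TRUE: a list of equal-length vectors
length(df)        # 2: the number of columns
ncol(df)          # 2
nrow(df)          # 3
names(df)         # "id" "label"
colnames(df)      # "id" "label": same as names() for a data frame
rownames(df)      # "1" "2" "3"

df$label          # list-style access to a column
df[2, "label"]    # matrix-style access: "b"
```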
More in this week’s lab session