Introduction

Course logistics

Giant’s shoulders

Statistics and data science

A data scientist is someone who is better at statistics than any software engineer and better at software engineering than any statistician.

Big data in the 1990s

@Huber94HugeData; @Huber96MassiveData

| Data Size | Bytes | Storage Mode |
|-----------|-------|--------------|
| tiny | \(10^2\) | piece of paper |
| small | \(10^4\) | a few pieces of paper |
| medium | \(10^6\) (MB) | a floppy disk |
| large | \(10^8\) | hard disk |
| huge | \(10^9\) (GB) | hard disk(s) |
| massive | \(10^{12}\) (TB) | hard disk(s); RAID storage |

Big data in the 21st century

Four V’s of big data: volume, velocity, variety, and veracity.

Source: IBM.

A typical data scientist on LinkedIn

A random online cartoon about data scientists

Course description