Units
The lectures are broken into 12 units, as shown below. These pages are also reachable from the calendar.
- Unit 1: Welcome back (Classes 1–3)
  Course overview, Data science pipeline, Python review
- Unit 2: Command line (Classes 4–7)
  Files and directories, bash commands, Piping and redirection
- Unit 3: Regular expressions (Classes 8–11)
  Regex syntax, Python re, Command-line tools
- Unit 4: Error handling (Classes 12–14)
  try/except, return codes in bash
- Unit 5: Versions and packaging (Classes 15–16)
  git, pip, mamba
- Unit 6: Data cleaning (Classes 17–19)
  Missing data, Outliers, Preprocessing
- Unit 7: Hardware and OS (Classes 20–22)
  CPU, Memory hierarchy, Filesystems, Role of the operating system
- Unit 8: Concurrency (Classes 23–26)
  Multithreading, Python GIL, Multiprocessing, pickle, shell job control
- Unit 9: Data Ethics (Classes 27–28)
  Principles, Case studies
- Unit 10: OOP in Python (Classes 29–32)
  Operator overloading, Inheritance, Naming conventions, Generators
- Unit 11: Typing (Classes 33–34)
  Type hints, Linters, Static vs run-time checks
- Unit 12: Machine learning with sklearn (Classes 35–38)
  Statistical data types, Reading documentation, Classification, Regression