Introduction to Data Science is being offered for a second time in Fall 2013. One major change this year is that I will be co-teaching with Dr. Kayur Patel, a computer scientist at Google. Here’s the syllabus:
Introduction to Data Science Syllabus, Fall 2013
Statistics W4242, Columbia University
Staff
Professors: Dr. Kayur Patel (kp2566@columbia.edu) and Dr. Rachel Schutt (rrs2117@columbia.edu)
Lab Instructor: Jared Lander (jpl2135@columbia.edu)
Teaching Assistant: Haolei Weng (hw2375@columbia.edu)
Project Coordinator: Anna Hurley (anna.c.hurley@gmail.com)
Location and Time
Lectures: Mondays and Wednesdays, 6:10-7:25pm @ 428 Pupin Laboratories
Labs: Tuesdays, 6:10-7:25pm @ 312 Math OR 7:40-8:55pm @ 417 Math
TA Office Hours: Tuesdays and Fridays, 2:00-4:00pm @ 10th-floor lounge, SSW (School of Social Work), and by appointment
Course Description
This course serves as an introduction to the interdisciplinary and emerging field of data science. Students will learn to combine tools and techniques from statistics, computer science, data visualization, and the social sciences to solve problems using data. Central threads include: (1) the data science process, from data collection to product, (2) tools for working with both big and small datasets, (3) statistical modeling and machine learning, and (4) real-world topics and case studies. The course consists of: (1) core lectures by the instructors, (2) guest lectures from data scientists who are experts in their fields, and (3) a course-long project. Topics and tools will include data wrangling and munging, machine learning algorithms, statistical models, data visualization, data journalism, R, ethics, MapReduce, and data pipelines.
Goals of the course
1) Learn about what it’s like to be a data scientist
2) Be able to do some of what a data scientist does
Schedule and course structure
The course is organized into two sections. The first is devoted to the data science process; lectures during this period correspond to the stages of that process and are meant to build students' skill sets and understanding. The second covers special topics and case studies in data science and will include guest lectures that demonstrate the data science process in context, as well as deeper dives into different classes of data, including text, images, and graphs.
Date | Topic
9/4/2013 | Canceled [Rosh Hashanah]
9/9/2013 | Introduction, Syllabus, Data Science Process
9/11/2013 | Data Science Process, Intro to Algorithms
9/16/2013 | Scoping Projects, Asking Good Questions [Drew Conway, DataKind]
9/18/2013 | Data: Unstructured vs. Structured Data, Databases
9/23/2013 | Sampling and Exploratory Data Analysis
9/25/2013 | Statistical Modeling and Inference
9/30/2013 | HCI and Data Science
10/2/2013 | Feature Selection, Kaggle Competition [Will Cukierski, Kaggle]
10/7/2013 | Machine Learning Overview: Classification, Regression, Clustering
10/9/2013 | Machine Learning: Specific Algorithms
10/14/2013 | Visualization: Charts, Graphs, Preattentive Features
10/16/2013 | Visualization: Interactive Visualizations, Infographics
10/21/2013 | Data & Journalism
10/23/2013 | Data & Journalism [Steve Lohr & Andy Lehren, The New York Times]
10/28/2013 | Working at Scale: Memory, Parallelization, MapReduce [Aaron Kimball]
10/30/2013 | Midterm Project Presentations (Ignite Talks)
11/4/2013 | Academic Holiday
11/6/2013-12/2/2013 | Special topics and case studies, which may include natural language processing, machine translation, crowdsourcing, Mechanical Turk, and social network data. Guest lecturers will most likely come from Facebook, Google, Foursquare, and Microsoft Research.
12/4/2013, 12/9/2013 | Project Presentations