Author Archives: Rachel Schutt

Doing Data Science & Ada Lovelace Day

My book (with Cathy O’Neil), Doing Data Science, is now available as an ebook, and the print version will be available next week! The book is based on last year’s Introduction to Data Science class. In honor of Ada Lovelace Day, O’Reilly (our publisher) is offering 50% off books by women, and so because we’re women, […]

Announcing the Columbia Data Science Society

There is a new student group on campus called the Columbia Data Science Society. They’ve asked me to pass along the following information: Introducing Columbia Data Science Society! Columbia Data Science Society, CDSS, is an interdisciplinary society that promotes data science across Columbia University and the New York City community. Our goal is to understand […]

Introduction to Data Science Version 2.0

I’m teaching Introduction to Data Science for the second year. We just started last week, and here are some of the significant differences between this year and last year: (1) Added another professor: I am team teaching this year with Dr. Kayur Patel, who is a computer scientist at Google. Crudely speaking, we can think […]

Philosophy of Data Science: Embrace the Practical and the Profound

This is my last blog post for Statistics 4242, Introduction to Data Science at Columbia University. All final projects have been turned in; grades have been given; the semester is over. I reserve the right to start blogging again at a later date. Dear Students, From the beginning, this course viewed Data Science simultaneously in […]

Kaggle Visualization Competition in Our Honor!

Dear Students, There is a new Kaggle Visualization Competition in our honor! I encourage you all to enter it! I received this email from Will Cukierski from Kaggle. This email was sent to me and Chris Mulligan. (See the p.s. for the Legend of Chris Mulligan.) Yours, Rachel Chris and Rachel, Thanks to your blog […]

Kaggle Competition Final Results!

Congratulations to Maura Fitzgerald for taking first place in our in-class Kaggle competition! First, a couple of comments; the final results are below. Were these Kaggle-competitive scores? The top scores were in the ballpark of the winning scores in the external version of this competition. The students in the class were given slightly different […]

My Strata Talk: Next-Gen Data Scientists

Dear Students, I’ll be giving a talk at Strata in February about this course and our experiences together: http://strataconf.com/strata2013/public/schedule/detail/27529 I’m bringing it up now, even though it’s more than two months off, because I plan to stop blogging about the class when the semester is finished. Here’s the abstract: Data Science is an emerging field […]

Class of 2013 hackNY Fellows

The following is from Chris Wiggins, a professor in the Department of Applied Physics and Applied Mathematics at Columbia. Chris’s name has come up multiple times throughout the semester, including on the very first day (What is Data Science?) and on the last day, during the student presentations. Dear Rachel: I’m emailing to ask your help getting […]

Week 14: Student Presentations, Synthesis of Semester

Each week Cathy O’Neil blogs about the class. Cross-posted from mathbabe.org. Thank you, Cathy, for doing such a wonderful job this semester capturing the course in this way, and also for being a respected voice in the classroom, a question-asker, and a role model for the students. Here’s our class photo, and Cathy’s blog post follows. Cathy’s post covers the presentation given by a subset of students; it represented a collaboration among many, perhaps most, of the students in this course, as part of their work on a think piece. More on this to come at a later date. It also captures my synthesis of the semester.

[Class photo]
In the final week of Rachel Schutt’s Columbia Data Science course, we heard from two groups of students as well as from Rachel herself. [...]

The Stars of Data Science

[VizStars visualization]
This is another part of the students’ final project. A small group designed a survey to assess the skills of a data scientist along several dimensions, and administered it to their classmates. The questions were of the form “Do you know what ___ means?” or “Have you ever implemented ____?”. The students were well aware of potential biases in their questions, the limitations of self-reporting, etc. The survey was a great first pass.

This is an innovative way of describing and visualizing data scientists. It captures the variability among data scientists, and it suggests how effective Data Science teams could be built by forming “constellations” of these stars, or by overlaying the stars on top of each other to create “complete” data science teams. The visualization and survey were an improvement over the data science profiles I gave the students at the beginning of the semester. This was a collaborative effort among many students, including Adam Obeng, Eurry Kim, Christina Gutierrez, Kaz Sakamoto, and Vaibhav Bhandari. A full report of the last lecture is still to come.
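To make the “overlaying” idea concrete, here is a minimal sketch under stated assumptions; it is not the students’ actual method. It assumes each star is stored as a mapping from skill dimension to a self-reported score between 0 and 1, and that overlaying stars means keeping the strongest team member’s score on each axis. A team then counts as “complete” when every axis clears a threshold. The dimension names, the 0.7 threshold, and the helper names `overlay`, `is_complete`, and `gaps` are all hypothetical.

```python
from typing import Dict, List

# Hypothetical skill dimensions (an assumption); the students' survey may have used others.
DIMENSIONS = ["computer science", "math", "statistics", "machine learning",
              "domain expertise", "communication", "visualization"]

# One person's "star": dimension -> self-reported score on an assumed 0-to-1 scale.
Profile = Dict[str, float]


def overlay(profiles: List[Profile]) -> Profile:
    """Overlay several stars by keeping, on each dimension, the highest score in the group.

    A dimension nobody reported counts as 0.0.
    """
    return {d: max((p.get(d, 0.0) for p in profiles), default=0.0) for d in DIMENSIONS}


def gaps(team: List[Profile], threshold: float = 0.7) -> List[str]:
    """List the dimensions where even the team's strongest member falls below the threshold."""
    combined = overlay(team)
    return [d for d in DIMENSIONS if combined[d] < threshold]


def is_complete(team: List[Profile], threshold: float = 0.7) -> bool:
    """Call a team 'complete' if its overlaid star clears the threshold on every dimension."""
    return not gaps(team, threshold)
```

For example, with two toy profiles invented purely for illustration (not survey results), `gaps([{"math": 0.9, "statistics": 0.9}, {"computer science": 0.9, "machine learning": 0.8}])` would flag domain expertise, communication, and visualization as the axes a third team member should cover. Taking the maximum is only one choice; averaging scores or requiring two people per axis would give different notions of a “complete” team.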
