Category Archives: Defining Data Science

The Case for Data Science

Dear Students, Data Science is an emerging field in industry, yet not well-defined as an academic subject. This is the first course at Columbia that has the term “Data Science” in the title. So recently, Allen Bernard, a freelance journalist working on an article for CIO.com about the emerging role of the data scientist, asked [...]

Week 4: The Data Science Process, k-means, Classifiers, Logistic Regression and Evaluation

Each week Cathy O’Neil blogs about the class. Cross-posted from mathbabe.org. This week our guest lecturer for the Columbia Data Science class was Brian Dalessandro. Brian works at Media6Degrees as a VP of Data Science, and he’s super active in the research community. He’s also served as co-chair of the KDD competition. Before Brian started, [...]

The Data Science Process

Dear Students, Now that we’ve had our first guest lecture, I’d like to revisit the general framework I proposed for thinking about the data science process on the first day of class (when I generalized the example from Google Plus), and show how Jake’s lecture fits within this framework. Throughout the semester we’ll see that [...]

Curse of dimensionality

This is a guest post by Professor Matthew Jones, from Columbia’s History department, who has been attending the course. I invited him to give his perspective on the course thus far. Few things lurk as much a challenge and instigation in data mining (or machine learning or the data sciences) as the “curse of dimensionality.” [...]

Visualizing Bill Cleveland’s original Data Science Proposal

I described the origins and short history of Data Science in week 1. The origins include a 2001 action plan by William Cleveland, a statistician, written when he was at Bell Labs, to propose Data Science as a new academic discipline. A student in our class, Eurry Kim (with permission), created the following: [...]

Week 1 Report: Current View of the Scope of the Course

“data science”: a collection of best practices, taught to you by experts in the field eager to come teach you, filling a gap we see in current education. Data Science: research area; Columbia University Institute for Data Sciences. We’re at Columbia; we showed up for a Data Science class; we represent Columbia’s interdisciplinary research community. What [...]

Big Data Domain Surfing (Part 1)

Dear Students, As mentioned we have diverse backgrounds in this class. And lest there be any confusion, I am not talking about our ethnicities, home countries, or spoken languages. I’m talking about the academic spaces we each inhabit, which has me thinking along the lines of Data Science as having the potential to be the [...]

Big Data in My Blood

Dear Students, Check out this story in this week’s NYT, “Big Data in Your Blood.” I want to use it to explore a couple of things and ideas I was struggling with before the class started this semester, and that I wasn’t sure how to communicate with you about on our first day together: Semantics again [...]

Week 1: What is Data Science?

Cathy O’Neil will be blogging about the class. Cross-posted from mathbabe.org. I’m attending Rachel Schutt’s Columbia University Data Science course on Wednesdays this semester and I’m planning to blog the class. Here’s what happened yesterday at the first meeting. Syllabus: Rachel started by going through the syllabus. Here were her main points: The prerequisites for [...]
