Category: Defining Data Science

10 Important Data Science Ideas

Here’s a list of 10 important ideas we’ve explored this semester so far. 10. Interdisciplinary Data Science teams: My experience at Google, along with DJ Patil’s piece on Building Data Science teams, informs my understanding of the importance of interdisciplinary teams. The students who showed up to take this class are from across departments and disciplines. […]

The Case for Data Science

Dear Students, Data Science is an emerging field in industry, yet not well-defined as an academic subject. This is the first course at Columbia that has the term “Data Science” in the title. So recently, Allen Bernard, a freelance journalist working on an article for CIO.com about the emerging role of the data scientist, asked […]

Week 4: The Data Science Process, k-means, Classifiers, Logistic Regression and Evaluation

Each week Cathy O’Neil blogs about the class. Cross-posted from mathbabe.org. This week our guest lecturer for the Columbia Data Science class was Brian Dalessandro. Brian works at Media6Degrees as a VP of Data Science, and he’s super active in the research community. He’s also served as co-chair of the KDD competition. Before Brian started, […]

The Data Science Process

Dear Students, Now that we’ve had our first guest lecture, I’d like to revisit the general framework I proposed for thinking about the data science process on the first day of class (when I generalized the example from Google Plus), and show how Jake’s lecture fits within this framework. Throughout the semester we’ll see that […]

Curse of dimensionality

This is a guest post by Professor Matthew Jones, from Columbia’s History department, who has been attending the course. I invited him to give his perspective on the course thus far. Few things lurk as much a challenge and instigation in data mining (or machine learning or the data sciences) as the “curse of dimensionality.” […]

Visualizing Bill Cleveland’s original Data Science Proposal

I described the origins and short history of Data Science in week 1. The origins include a 2001 action plan by William Cleveland, a statistician, written when he was at Bell Labs, to propose Data Science as a new academic discipline. A student in our class, Eurry Kim (with permission), created the following: […]

Week 1 Report: Current View of the Scope of the Course

“data science”: a collection of best practices, taught to you by experts in the field eager to come teach you, filling a gap we see in current education. Data Science: a research area (Columbia University Institute for Data Sciences). We’re at Columbia; we showed up for a Data Science class; we represent Columbia’s interdisciplinary research community. What […]

Big Data Domain Surfing (Part 1)

Dear Students, As mentioned, we have diverse backgrounds in this class. And lest there be any confusion, I am not talking about our ethnicities, home countries, or spoken languages. I’m talking about the academic spaces we each inhabit, which has me thinking along the lines of Data Science as having the potential to be the […]

Big Data in My Blood

Dear Students, Check out this story in this week’s NYT, “Big Data in Your Blood.” I want to use it to explore a couple of things and ideas I was struggling with before the class started this semester, and that I wasn’t sure how to communicate with you about on our first day together: Semantics again […]

Data Scientist Profiles


An example of a data scientist profile of one of the students in our class

What were you thinking when you made us do those data scientist profiles?

I had four primary reasons for going through that exercise:
Reason 1: Cultivating self-awareness

Reason 2: Illustrating the importance of standardization in visualization
I wanted to demonstrate standardizing visualizations of individuals as a mix of characteristics. (You should think about how you might do it, and then also ask yourself whether you think a standardized visualization has any value.) In this particular case:

(a) standardizing the x-axis: I used the main buckets that I thought were approximately some of the skills one needs as a data scientist. I’m not tied to these
