Dear Students,
Lest you think (yes, I know I used that turn of phrase in posts before. I like it.) that I am bragging about my character traits (in which case you don’t know me well enough yet: I never brag, and who isn’t human?), wipe that thought from your mind, and read on.
On Inspiring Students:
Of course one hopes as a teacher that one will inspire students, and yes, I am seeing some evidence that this is, in fact, happening. I think some of you are getting inspired! But what I actually mean by “inspiring students” is that you are inspiring me; you are students who inspire: “inspiring students”. This is one of the happy unintended consequences of this course so far for me. I’ve been having conversations with students via email and quick snatches of conversation before and after class. [See caveats at end of post.] I’m hoping as these students develop their ideas further, they’ll have stuff to share with us in class or on the blog. Let me give some examples of what I see happening:
Phillip is a PhD student in the sociology department (and invited us to the lecture series on Computational Social Science). He’s in the process of developing his thesis topic around some of the themes we’ve been discussing in this class, such as the emerging data science community. I’m excited to see what he comes up with!
Arvi works at the College Board and is a part-time student in the Quantitative Methods in the Social Sciences (QMSS) program. He analyzes user-level data on students who have signed up for (and taken) the SATs. He has lots of interesting data on where those students hope to go to college, as well as longitudinal data sets that allow him and his colleagues to examine trends over time related to higher education and these influential standardized tests. It’s nice to have these examples in mind as we learn new methods in class.
Adam, Christina and Eurry (respectively, (1) sociology PhD student, (2) data scientist at Nielsen, and (3) “aspiring data visualizer” (better term?) from the QMSS program) have taken on the challenge of polling the students and then developing an algorithm to automatically find optimal data science teams, along with a corresponding visualization. Looking forward to what they come up with!
Matt is a history of science professor who wrote the Curse of Dimensionality post a week ago, and is starting to think about (or revisit) how exploratory text classification could be used in his research.
Jed works as a data analyst at Case Commons, a nonprofit that builds web apps and databases for state-wide foster care agencies. He’s a student in QMSS. After Jake’s class, he read this paper on using Naive Bayes to classify suicide notes, and now has some early ideas of ways he might apply this approach in his own work.
Maryanne is the Executive Director for the Center for Innovation Through Data Intelligence in Mayor Bloomberg’s office. Her office deals with data about the juvenile justice system, homelessness and poverty, and she too is thinking about how analyzing data sets could be used to prioritize social worker interventions.
Conversations with Jed and Maryanne reminded me of a New Yorker article I read more than a year ago about a doctor who was able to prioritize patient interventions by examining data, ultimately saving tons of money as well as saving lives. Jed and Maryanne are Hot Spotters! There’s a lot to think about around this topic.
Then let’s not forget the Biomedical Informatics (or variation of that) students/post-doc, Hojjat, Albert and Heather; or Kaushik, the student from operations research interested in journalism; or Yegor, the business school student who has an interest in urban planning and architecture, who sent along this link (all of whom I want to talk to more about their interests).
The comments on the blog from various students are also starting to become interesting. Also let me add Jared’s (our lab instructor) study of his own text messages, after he broke up with his girlfriend, which he just told me about tonight.
On Being Human:
We’re four weeks into this class, and Being Human is emerging as a theme for me in two key ways:
(1) Being a good data scientist is not just about applying Machine Learning; it’s about Being Human. Evidence of this permeates my posts (the profiles, Human Ingenuity), and is what allows me to interpret meaning in user-level data. But for another twist on this, let me quote Cathy paraphrasing Brian Dalessandro from Wednesday’s class:
One under-appreciated constraint of a data scientist is this: your own understanding of the algorithm. Ask yourself carefully, do you understand it for real? Really? Admit it if you don’t. You don’t have to be a master of every algorithm to be a good data scientist.
(2) Data Scientists are often exploring data sets that help us understand human behavior. Examples include Google+, Facebook, Media 6 Degrees (marketing), LinkedIn, … We’ll explore more throughout the semester. Data Scientists I know tend to have an interest in Being Human. Or Human Beings. Chris (in another example of inspiration) sent me an email with the subject “The Human Face of Big Data”:
Here’s an interesting project/publicity stunt about using smartphones (and surveys) to collect personal data.
Article: http://arstechnica.com/business/2012/09/crowdsourcing-app-will-measure-the-world-for-a-week-through-smartphones/
Website: http://thehumanfaceofbigdata.com
Yours,
Rachel
p.s. While I’m on the subject of Brian, he sent me this video that he and his data science teammate, Ori, (who will be coming to our class in a few weeks) made. Don’t let this distract you from my Being Human Thesis, or if anything, incorporate it into the Being Human Thesis:
Caveats
(On the topic of calling out specific Inspiring Students, which, if I were a student in this class, I might find annoying, especially if I hadn’t yet done anything to “inspire” the professor, please allow this caveat: you are all at different stages of advancement in your career and education. Some of you might currently be in a state of confusion (confusion is good; that’s the state you sometimes pass through to learn); some of you might be in a state of irritation, stress or anxiety (also normal); some of you might not yet have ideas you think are worth talking about, so you’re not yet ready to speak; some of you may be very busy at work, and just getting the homework assignments done is a lot. I understand all this. I’ve been there myself. Please come to the happy hour and we can talk more.)
(Also I am sensitive to the fact that the students who have shared some of their ideas with me are at the early stages of developing these ideas, and I have asked their permission to write about them here; and will take down anything from the blog if they end up not wanting it up.)
Thanks for the blog and the write-ups!
It’s really cool to know about what some of us in the class are doing, thinking, inspiring!
Kudos to the hot spotters!
[…] spam classification problem is “data science equivalent” (DSE, I made that up) to the suicide note classification problem which is DSE to the ad-click prediction problem at M6D. The structure of the data is the same, the […]
[…] Schutt (the author of the Taxonomy of Confusion) has a blog! for the course she’s teaching at Columbia, “Introduction to Data Science.” It […]
Hi Rachel Schutt,
I went to high school with you long ago and far away (PHS ’96), and now I’m doing a PhD in quant education research (at NYU), and I found your blog on Andrew Gelman’s (as I was trying to figure out multiple imputation, which is currently kicking my ass). Good stuff. I taught middle school math and did a lot of stats with the kids (more than the state test deemed appropriate), which was lots of fun. I like your Machine Learning v Being Human dichotomy, and it fits with my experience of how to genuinely engage learners as well as how to do good quantitative research. Keep it up! And please say hi to Becky for me.
Best,
Rachel
Hi Rachel Cole,
Great to hear from you! I’m glad it resonates with you. Quant education research needs someone like you, for sure. I’ll say hi to Becky; she’s in London.
(Update: I just said hi to Becky and she says “Rachel Cole! How exciting! Please send her my love from afar. High school was half our lifetimes ago.”)
Rachel Schutt