Each lecture we give the students a Thought Experiment. Students discuss it privately first with the people around them, and then we have a class discussion. I always wish these could go on longer because the students have such interesting ideas. For the first couple of weeks I posed the thought experiments; since then, the guest lecturers have. It’s a nice way to provide continuity between guest lecturers and to get the students thinking about the themes of that evening’s lecture:
Week 1: Can we use data science to define data science? [Posed by Rachel Schutt]
Week 2: How would you simulate chaos? [Posed by Rachel Schutt]
Week 3: First, students were given a bunch of text, and asked what it was. They quickly were able to identify it as spam. How did you figure this out? Can you write code to automate the spam filter that your brain is? [Posed by Jake Hofman]
Week 4: How would data science differ if we had a “grand unified theory of everything”? [Posed by Brian Dalessandro]
Week 5: What do you lose when you think of your training set as a big pile of data and ignore the timestamps? [Posed by Cathy O’Neil]
Week 6: What are the ethical implications of a robo-grader? [Posed by Will Cukierski]
Week 6: (1) How might we design technology to support managing one’s social graph? (2) Privacy is important to users and in designing technology. What is the best way to decrease concern, and increase understanding and control? [Posed by David Huffaker]
Week 7: [Filter bubbles] What are the implications of using error minimization to predict preferences? How does presentation of recommendations affect the feedback collected? For example, can we end up in local maxima with rich-get-richer effects? In other words, does showing certain items at the beginning “give them an unfair advantage” over other things? And so do certain things just get popular or not based on luck? How do we correct for this? [Posed by Matt Gattis]
Week 8: As data become more personal, as we collect more data about “individuals”, what new methods or tools do we need to express the fundamental relationship between ourselves and our communities, our communities and our country, our country and the world? Could we ever be satisfied with poll results or presidential approval ratings when we can see the complete trajectory of public opinions, individuated and interacting? [Posed by Mark Hansen]
Week 8: Suppose you know about every single transaction in the world as it occurs. How would you use that data? [Posed by Ian Wong]
Week 9: You’re part of an elite, well-funded think tank in DC. You can hire people and you have $10 million to spend. Your job is to empirically predict the future political evolution of Egypt. What kinds of political parties will there be? What is the country of Egypt gonna look like in 5, 10, or 20 years? You have access to exactly two of the following datasets for all Egyptians:
- The Facebook network,
- The Twitter network,
- A complete record of who went to school with who,
- The SMS/phone records,
- The network data on members of all political organizations and private companies, and
- Where everyone lives and who they talk to.
Which do you pick? [Posed by John Kelly]
Week 10: We now have detailed, longitudinal medical data on tens of millions of patients. What can we do with it? [Posed by David Madigan]
Week 11: How do you know if your data may be used to answer your question of interest? Sometimes people think that because they have data on a subject, they can answer any question about it. [Posed by Ori Stitelman]
Week 12: You got fMRI data from a serious and very reliable source (Siemens Medical). For every patient you got multiple examples (so-called regions), and for each region you have 117 numeric features that were somehow derived from the pixels of the fMRI image. Every patient has a numeric ID. You are doing a deep EDA and realize that the single most predictive feature to identify breast cancer is the patient number. Is this a problem? Why? What would you do? [Posed by Claudia Perlich]
Week 13: What is the appropriate amount of privacy in health? Who should have access to your medical records? [Posed by David Crawshaw]
Week 13: How would you build a human-powered airplane? What would you do? How would you form a team? [Posed by Josh Wills]
Week 14: If all data which had ever been collected were freely available to everyone, would we be better off? [Posed by The Class]
Week 14: Let’s start with a quote:
“Anything which uses science as part of its name isn’t: political science, creation science, computer science.”
- Hal Abelson, MIT CS prof
Keeping this in mind, if you could re-label data science, would you? What would you call it? [Posed by The Class]
Week 14: How would you design a data science class around habits of mind rather than technical skills? How would you quantify it? How would you evaluate? What would students be able to write on their resumes? [Posed by Rachel Schutt]
Week 14: Come up with a business that improves the world, makes money, and uses data. [Posed by Cathy O’Neil]
Week 14: Design an app to combat the Dunning-Kruger effect. [Posed by Rachel Schutt]