The following is a prologue to a discussion of what makes for a good data scientist.
Data is information and is extremely powerful. Models and algorithms that use data can literally change the world. Quantitatively-minded people have always been able to solve important problems, and there has always been data, so neither of these is new. What is new is the massive amount of data we have on all aspects of our lives, from the micro to the macro. The data we have from government, finance, education, the environment, social welfare, health, entertainment, and the internet will be used to make policy decisions and to build products back into the fabric of our culture. I want you, my students, to be the ones doing it. I look around the classroom and see a group of thoughtful, intelligent people who want to do good and are absolutely capable of doing it.
I don’t call myself a “data scientist”. I call myself a statistician. I refuse to be called a data scientist because, as it’s currently used, it’s a meaningless, arbitrary marketing term. However, the existence of the term and the apparent “sexiness” of the profession draw attention to data and open up opportunities. So we need Next-Gen Data Scientists. That’s you! Here’s what I mean when I say Next-Gen Data Scientist:
- Next-Gen Data Scientists have humility. They don’t lie about their credentials and they don’t spend most of their efforts on self-promotion.
- Next-Gen Data Scientists have integrity. Their work is not about trying to be “cool” or solving some “cool” problem. It’s about being a problem solver and finding simple, elegant solutions (or complicated ones, if necessary).
- Next-Gen Data Scientists don’t try to impress with complicated algorithms and models that don’t work.
- Next-Gen Data Scientists spend a lot more time trying to get data into shape than anyone cares to admit.
- Next-Gen Data Scientists have the experience or education to actually know what they’re talking about. They’ve put their time in.
- Next-Gen Data Scientists are skeptical: skeptical about models themselves, about how they can fail, and about the way they’re used or can be misused.
- Next-Gen Data Scientists make sure they know what they’re talking about before running around trying to show everyone else they exist.
- Next-Gen Data Scientists have a variety of skills including coding, statistics, machine learning, visualization, communication, and math.
- Next-Gen Data Scientists do enough Science to merit the word “Scientist”, someone who tests hypotheses and welcomes challenges and alternative theories.
- Next-Gen Data Scientists are solving a new breed of problem that surrounds the structure and exploration of data and the computational issues surrounding it.
- Next-Gen Data Scientists don’t find religion in tools, methods or academic departments. They are versatile and interdisciplinary.
- Next-Gen Data Scientists are highly skilled and ought to get paid well enough that they don’t have to worry too much about money.
- Next-Gen Data Scientists don’t let money blind them to the point that their models are used for unethical purposes.
- Next-Gen Data Scientists seek out opportunities to solve problems of social value.
- Next-Gen Data Scientists understand the implications and consequences of the models they’re building.
- Next-Gen Data Scientists collaborate and cooperate.
- Next-Gen Data Scientists bring their humanity with them to problem solving and to algorithm and model building.
[…] is crossposted from Rachel Schutt’s Columbiadatascience blog Data is information and is extremely powerful. Models and algorithms that use data can literally […]
Rachel, I’m inspired by your leadership in bringing together academics with an emerging field to promote responsibility and accountability. Speaking as a practitioner myself, it’s sorely needed.
thank you!
Reblogged this on Data Science 101 and commented:
This is a great list of traits for the next generation of data scientists.
Excellent code of conduct. Note, however, that Gartner’s analysis of hundreds of job descriptions shows that the “data scientist” is not just marketing fluff. It is different from the “statistician” and “BI analyst” roles: http://blogs.gartner.com/doug-laney/?p=230&preview=true. -Doug Laney, VP Research, Gartner, @doug_laney
[…] Next-Gen Data Scientists (Columbia University) - A list of desirable traits for the new field of Data Scientist…One of the items: “Next-Gen Data Scientists are highly skilled and ought to get paid well enough that they don’t have to worry too much about money.” Are you listening, boss? […]
I don’t think the leadership of most organizations hiring and managing data scientists is up to even understanding the ethical component of this list. Given that models can be very wrong in subtle ways, a business could be critically misdirected by its data scientists. That’s a big responsibility, especially for new entrants to this field.
[…] Next-Gen Data Scientists. That’s you! Go out and do awesome things, use data to solve problems, have integrity and humility. […]
[…] highly recommend reading the posts from Rachel Schutt on this subject (October 4, 2024 · by Rachel Schutt · in Ethics and Humanity, Models) from […]
[…] wrote a blog for the class and had a great post about being a next-gen data scientist. She has high hopes for the students in the class and wrote an aspirational list for them. It […]
Some are born data scientists. They can process data scientifically from day one. Data is information arriving at the brain via the nerves. The tools, the data storage, and the analytic engine are all the brain. Some will never be data scientists, no matter how many years they spend in university education.