Next-Gen Data Scientists

The following is a prologue to a discussion of what makes for a good data scientist.
Data is information and is extremely powerful. Models and algorithms that use data can literally change the world. Quantitatively minded people have always been able to solve important problems, and there has always been data, so neither of those is new. What is new is the massive amount of data we have on all aspects of our lives, from the micro to the macro. The data we have from government, finance, education, the environment, social welfare, health, entertainment, and the internet will be used to make policy decisions and to build products back into the fabric of our culture. I want you, my students, to be the ones doing it. I look around the classroom and see a group of thoughtful, intelligent people who want to do good and are absolutely capable of doing it.

I don’t call myself a “data scientist”. I call myself a statistician. I refuse to be called a data scientist because, as it’s currently used, it’s a meaningless, arbitrary marketing term. However, the existence of the term and the apparent “sexiness” of the profession draw attention to data and open up opportunities. So we need Next-Gen Data Scientists. That’s you! Here’s what I mean when I say Next-Gen Data Scientist:

  • Next-Gen Data Scientists have humility. They don’t lie about their credentials and they don’t spend most of their efforts on self-promotion.
  • Next-Gen Data Scientists have integrity. Their work is not about trying to be “cool” or solving some “cool” problem. It’s about being a problem solver and finding simple, elegant solutions (or complicated ones, if necessary).
  • Next-Gen Data Scientists don’t try to impress with complicated algorithms and models that don’t work.
  • Next-Gen Data Scientists spend a lot more time trying to get data into shape than anyone cares to admit.
  • Next-Gen Data Scientists have the experience or education to actually know what they’re talking about. They’ve put their time in.
  • Next-Gen Data Scientists are skeptical: skeptical about models themselves, about how they can fail, and about the ways they’re used or can be misused.
  • Next-Gen Data Scientists make sure they know what they’re talking about before running around trying to show everyone else they exist.
  • Next-Gen Data Scientists have a variety of skills, including coding, statistics, machine learning, visualization, communication, and math.
  • Next-Gen Data Scientists do enough Science to merit the word “Scientist”: they test hypotheses and welcome challenges and alternative theories.
  • Next-Gen Data Scientists are solving a new breed of problem that surrounds the structure and exploration of data and the computational issues surrounding it.
  • Next-Gen Data Scientists don’t find religion in tools, methods or academic departments. They are versatile and interdisciplinary.
  • Next-Gen Data Scientists are highly skilled and ought to get paid well enough that they don’t have to worry too much about money.
  • Next-Gen Data Scientists don’t let money blind them to the point that their models are used for unethical purposes.
  • Next-Gen Data Scientists seek out opportunities to solve problems of social value.
  • Next-Gen Data Scientists understand the implications and consequences of the models they’re building.
  • Next-Gen Data Scientists collaborate and cooperate.
  • Next-Gen Data Scientists bring their humanity with them to problem solving, and algorithm/model-building.

11 comments

  1. Next-Gen Data Scientists « mathbabe

    […] is crossposted from Rachel Schutt’s Columbiadatascience blog Data is information and is extremely powerful. Models and algorithms that use data can literally […]

  2. Marc Rossen

    Rachel, inspired by your leadership bringing together academics with an emerging field to promote responsibility and accountability. It’s sorely needed as a practitioner myself.

  3. Ryan Swanstrom

    Reblogged this on Data Science 101 and commented:
    This is a great list of traits for the next generation of data scientists.

  4. Doug Laney

    Excellent code of conduct. Note however that Gartner’s analysis of hundreds of job descriptions shows that the “data scientist” is not just marketing fluff. It is different than “statistician” and “BI analyst” roles: http://blogs.gartner.com/doug-laney/?p=230&preview=true. -Doug Laney, VP Research, Gartner, @doug_laney

  5. Link Roundup - October 15, 2024 | Enterprise Information Management in the 21st Century

    […] Next-Gen Data Scientists (Columbia University) - A list of desirable traits for the new field of Data Scientist…One of the items: “Next-Gen Data Scientists are highly skilled and ought to get paid well enough that they don’t have to worry too much about money.” Are you listening, boss? […]

  6. AnalyticExec

    I don’t think the leadership of most organizations hiring and managing data scientists is up to even understanding the ethical component of this group. Given that models can be very wrong in subtle ways, a business could be critically misdirected by its data scientists. That’s a big responsibility, especially for new entrants to this field.

  7. Columbia Data Science course, week 14: Presentations « mathbabe

    […] Next-Gen Data Scientists. That’s you! Go out and do awesome things, use data to solve problems, have integrity and humility. […]

  8. Data Science and Ethics « The Analytic Executive

    […] highly recommend reading the posts from Rachel Schutt on this subject (October 4, 2024 · by Rachel Schutt · in Ethics and Humanity, Models) from […]

  9. Rachel Schutt speaks at Strata tomorrow about Next-Gen data science | mathbabe

    […] wrote a blog for the class and had a great post about being a next-gen data scientist. She has high hopes for the students in the class and wrote an aspirational list for them. It […]

  10. Vincent Granville

    Some are born data scientists. They can process data scientifically, from day one. Data is information arriving to the brain via the nerves. The tools, data storage, and the analytic engine are the brain. Some will never be data scientists no matter how many years spent in university education.
