Next Semester: Applied Data Science (Statistics W4249)

Next semester, Ian Langmore and I will be offering a new course, Applied Data Science, in the Department of Statistics:

Short description

Data scientists wear many hats. This course presents two from opposite ends of the spectrum. Coding best practices will be taught in the Python programming language using test-driven development, version control, and collaboration. Students finish the class with a portfolio on GitHub and an understanding of several core statistical/machine-learning algorithms. Case studies give students the opportunity to use this software with real-world data sets, where they develop intuition for extracting meaning from data. Students also finish the class with a WordPress portfolio.

Full description

The explosion of available data, coinciding with the continued evolution of statistical and computational methods, has produced a new breed of specialist. These data scientists use rigorous statistical methods to find meaning in data. Minimizing a loss function is not enough: business and societal decisions hinge on how these insights are interpreted. The world of scientific computation is also evolving rapidly. Quick-and-dirty scripts are not enough: a maintainable code base and a collaborative development environment allow projects to move into production and scale. A data scientist must wear many hats; we present two of them here.

Maintainable coding techniques will be taught using test-driven development, version control, and collaboration. Code will be of the type found in the scikit-learn and statsmodels packages. Students finish the class with a portfolio on GitHub and an understanding of several core statistical/machine-learning algorithms.

Case studies give students the opportunity to use their own software on real-world data sets, where they develop intuition for extracting meaning from data. Students finish the class with a WordPress portfolio and experience with the full chain of translation:

Real world -> data -> scientist -> collaborators/coworkers -> policy-decision/data-product

Students enrolled in Statistics 4242 this semester are encouraged to enroll as well; this new course will be sufficiently different.

For more information, see Ian’s website here.
