Next semester, Ian Langmore and I will be offering a new course, Applied Data Science, in the Department of Statistics:
Short description
Data scientists wear many caps. This course presents two from opposite ends of the spectrum. Coding best practices will be taught using test-driven development, version control, and collaboration. The Python programming language will be used. Students finish the class with a portfolio on GitHub, and an understanding of several core statistical/machine-learning algorithms. Case studies give students the opportunity to use this software with real-world data sets. Here they develop intuition for extracting meaning from data. Students finish the class with a WordPress portfolio.
Full description
The explosion of available data coinciding with the continued evolution of statistical and computational methods has resulted in a new breed of specialist. These data scientists use rigorous statistical methods to find meaning in data. Minimizing a loss function is not enough: business and societal decisions hinge on the interpretation of these insights. The world of scientific computation is rapidly evolving. Quick-and-dirty scripts are not enough: a maintainable code base and a collaborative development environment allow projects to productionalize and scale. A data scientist must wear many caps; we present two of them here.
Maintainable coding techniques will be taught using test-driven development, version control, and collaboration. Code will be of the type found in the scikit-learn and statsmodels packages. Students finish the class with a portfolio on GitHub, and an understanding of several core statistical/machine-learning algorithms.
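To give a flavor of what test-driven development looks like in practice, here is a minimal sketch in plain Python. It is not taken from the course materials: the `MeanRegressor` class and both test functions are hypothetical names chosen for illustration, and the `fit`/`predict` interface only imitates the general style of scikit-learn estimators without using that library.

```python
# Hypothetical example: a tiny estimator with a scikit-learn-style
# fit/predict interface. All names here are illustrative assumptions.


class MeanRegressor:
    """Predicts the mean of the training targets for every input row."""

    def fit(self, X, y):
        if len(X) != len(y):
            raise ValueError("X and y must have the same number of rows")
        if len(y) == 0:
            raise ValueError("cannot fit on an empty data set")
        # A trailing underscore marks a fitted attribute, as in scikit-learn.
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        # One prediction per input row, regardless of the row's contents.
        return [self.mean_ for _ in X]


# In test-driven development, tests like these are written first and the
# class above is then written to make them pass.
def test_predicts_training_mean():
    model = MeanRegressor().fit([[0], [1], [2]], [1.0, 2.0, 3.0])
    assert model.predict([[10], [20]]) == [2.0, 2.0]


def test_rejects_mismatched_lengths():
    try:
        MeanRegressor().fit([[0], [1]], [1.0])
    except ValueError:
        return
    raise AssertionError("expected ValueError for mismatched lengths")


if __name__ == "__main__":
    test_predicts_training_mean()
    test_rejects_mismatched_lengths()
    print("all tests passed")
```

Functions named `test_*` like these can also be collected by a test runner such as pytest, and committing the tests alongside the code in version control is what lets collaborators change the code with confidence.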
Case studies give students the opportunity to use their own software on real-world data sets. Here they develop intuition for extracting meaning from data. Students finish the class with a WordPress portfolio, and experience with the translation:
Real world -> data -> scientist -> collaborators/coworkers -> policy-decision/data-product
Students enrolled in Statistics 4242 this semester are encouraged to enroll — this new course will be sufficiently different.
For more information, see Ian’s website here.