Junior data professionals often ask me how they can improve as data scientists. Today I will outline a generic framework for thinking about learning and provide a few concrete examples in support of it. These are tools that I still employ in my day-to-day learning and growth as a data professional.
A lot of Data Science literature focuses on the mathematical and algorithmic aspects of model building. This is, after all, what much of academia spends its time grappling with. However, the data practitioner understands that applying Data Science and Machine Learning models to solve real-world problems involves much more than coding a statistical formula. Today’s post is about keeping your models fresh and up to date, and your team informed, as your data world evolves. We will discuss some implications of a changing data distribution for your model, practical technical considerations when building a model that is integrated with the product application, and how presenting to your team can be a great checkpoint on your model-building progress.
Many Data Scientists come from a hard science background: statistics, math, physics. Hard sciences have a bias towards empirical and objective truths: a correct answer exists, and we can find it by employing the scientific method, usually through a formulaic approach to the problem at hand. While this is not a controversial view in itself, many years of studying and applying such a paradigm can collide with the practical realities of the business world. In that world, it is often difficult to apply the theory perfectly. As a result, the practitioner should understand how to adjust their model and their approach accordingly.