The first data hires at an early-stage startup face numerous challenges: an infrastructure built to run the business but not to analyze it, an organization hungry for information but without a process for requesting and prioritizing it, and little documentation of how anything is done. What should they do first?
I am often asked by junior data professionals how they can improve as data scientists. Today I will outline a general framework for thinking about learning and provide a few concrete examples to support it. These are tools I still use in my day-to-day learning and growth as a data professional.
I have spoken to many fellow analytics practitioners who are adamant that they want their team never to touch “production.” While there are good reasons to be careful whenever you make changes that could impact customers, I believe that as software becomes more data-driven, it is critical to find safe ways to empower Analytics teams to build and deploy data-driven applications.
Data and Analytics teams are often underutilized in their organizations because they cannot collaborate effectively with the broader Technology and Software Engineering teams. By designing software that follows the “code as configuration” pattern, software engineers can empower Analytics teams to work independently, which both takes advantage of the analysts’ technical skills and removes drudge work from the Software Engineering team: a win-win.
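As a rough illustration, here is a minimal Python sketch assuming one possible reading of “code as configuration”: engineers own a small, stable piece of logic, and analysts own a configuration (here an inline JSON string named `SEGMENT_CONFIG`) that drives its behavior. The segment names, thresholds, and the functions `load_segments` and `assign_segment` are hypothetical and not taken from the post.

```python
import json

# Hypothetical analyst-owned configuration. In practice it might live in its own
# file or repository with its own review process; here it is an inline string.
SEGMENT_CONFIG = """
{
  "segments": [
    {"name": "high_value", "min_lifetime_spend": 1000},
    {"name": "returning", "min_orders": 2},
    {"name": "new", "min_orders": 0}
  ]
}
"""


def load_segments(raw: str) -> list[dict]:
    """Parse the analyst-owned config and check that every segment is named."""
    segments = json.loads(raw)["segments"]
    for seg in segments:
        if "name" not in seg:
            raise ValueError(f"segment is missing a name: {seg}")
    return segments


def assign_segment(customer: dict, segments: list[dict]) -> str:
    """Engineer-owned logic: return the first segment whose thresholds the customer meets."""
    for seg in segments:
        if customer.get("lifetime_spend", 0) < seg.get("min_lifetime_spend", 0):
            continue
        if customer.get("orders", 0) < seg.get("min_orders", 0):
            continue
        return seg["name"]
    return "unsegmented"


segments = load_segments(SEGMENT_CONFIG)
print(assign_segment({"lifetime_spend": 200, "orders": 3}, segments))
```

With this split, an analyst could add or retune a segment by editing only the configuration, while the engineer-owned code stays unchanged and validates whatever it loads.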
A hypothetical tech company has just completed an A/B test of two experiences: A (the test) and B (the control). The test was set up properly and executed successfully. The following dialogue takes place between Diane the Data Scientist and Marty the Marketing Analyst at the conclusion of the test.
Everyone has their own reaction to discovering wrong data. It might start with a double take or a nagging feeling that the number should be a bit higher. However it starts, it usually leads to an investigation into what went wrong. While this is a perfectly normal reaction, I offer an alternative: before turning over every stone in your ETL, ask a few questions to determine whether your “wrong” data really is wrong. In this post I explore what wrong means when it comes to data (spoiler alert: it is not black and white). I also offer a few tricks for diagnosing which bucket of wrong your problem falls into. Yes, this approach may add an extra step or two to your process, but it can also save you a day of work trying to fix something that isn’t even broken.
Anyone who has worked in digital analytics will tell you that day-over-day performance can be volatile. Shifts in marketing mix can cause fluctuating e-commerce conversion rates, new feature launches can lead to sudden and temporary swings in engagement rates, and on-site bugs can result in anomalies in abandonment rates. Some of these scenarios can be diagnosed through extensive segmentation of the data. Others, like a dropped analytics snippet or a bug with your payment processor, cannot be uncovered so easily. The simplest thing to do when events like these take place is to make a mental note and count on your memory when you inevitably have to revisit that data in the future. Unfortunately, mental notes aren’t a scalable solution. While it’s not the most thrilling task for a data team, keeping a record of the online and offline events that affect your business is a practice well worth the (small) time investment.
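A record of events does not need heavy tooling. The following is a minimal Python sketch assuming a simple in-memory log; the `BusinessEvent` fields, the `events_on` helper, and the example entries (including their dates) are hypothetical illustrations rather than a prescribed schema. The same structure could just as well live in a shared spreadsheet or a database table that dashboards join against.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class BusinessEvent:
    """One entry in a hypothetical log of events that may affect metrics."""
    start: date
    end: date          # inclusive; equal to start for one-day events
    kind: str          # e.g. "online" (bug, launch) or "offline" (outage, holiday)
    description: str


def events_on(log: list[BusinessEvent], day: date) -> list[BusinessEvent]:
    """Return every logged event whose date range covers `day`."""
    return [e for e in log if e.start <= day <= e.end]


# Hypothetical example entries, for illustration only.
event_log = [
    BusinessEvent(date(2024, 3, 4), date(2024, 3, 6), "online",
                  "Analytics snippet dropped from checkout page"),
    BusinessEvent(date(2024, 3, 5), date(2024, 3, 5), "offline",
                  "Payment processor outage"),
    BusinessEvent(date(2024, 3, 18), date(2024, 3, 18), "online",
                  "New onboarding flow launched"),
]

for event in events_on(event_log, date(2024, 3, 5)):
    print(event.kind, event.description)
```

When a metric looks odd on a given day, a lookup like `events_on(event_log, that_day)` surfaces any known explanation before anyone starts digging through the ETL.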
Imagine you hit a roadblock while tackling a complex piece of analysis, using a Python function, or designing your first data organization. What do you do? Of course you start with an internet search, but what do you do when you’re really stuck? I like to phone a friend. In this post I explore my favorite learning style, learning from others, and the steps to building your own analytics brain trust. I have used this approach to solve many challenges (including building an Analytics team from the ground up), and I believe it can be applied almost universally.
There has been a lot of discussion in the data science community about the use of black-box models, and there is a great deal of fascinating ongoing research into methods, algorithms, and tools that help data scientists better introspect their models. While those discussions and that research are important, in this post I discuss the macro-framework I use to evaluate how black the box can be for a prediction product.
The sprint prioritization meeting is integral to the agile process. While many people are more familiar with meetings such as sprint planning, stand-up, backlog grooming, and retro, the sprint prioritization meeting often receives less attention. I suspect this is because sprint prioritization is a particularly difficult process to deploy successfully. A good prioritization process requires thoughtful ticket descriptions written in advance, a collaborative review of each ticket in the context of all the other tickets, and the buy-in and coordination of all the analytics stakeholders. To top it all off, you have to squeeze this process into the end of each sprint, ahead of sprint planning… There is a reason scrum masters are often called cat herders.