There has been a lot of discussion in the data science community about the use of black-box models, and there is a lot of fascinating ongoing research into methods, algorithms, and tools to help data scientists better introspect their models. While those discussions and that research are important, in this post I discuss the macro-framework I use for evaluating how black the box can be for a prediction product.
The sprint prioritization meeting is integral to the agile process. While many people may be more familiar with meetings such as sprint planning, stand-up, backlog grooming, and retro, the sprint prioritization meeting often receives less attention. I suspect this is because sprint prioritization is a particularly difficult process to deploy successfully. A good prioritization process requires thoughtful ticket descriptions written in advance, a collaborative review of each ticket in the context of all of the other tickets, and the buy-in and coordination of all of the analytics stakeholders. To top it all off, you have to squeeze this process into the end of each sprint, in advance of sprint planning… There is a reason why scrum masters are typically referred to as cat herders.
Poor communication within an Analytics team, and between that team and the rest of the company, leaves highly skilled Analysts answering the wrong questions, lacking support for big ideas, and ultimately departing the company unfulfilled by their work. In this post I will discuss ways a team can improve performance and employee satisfaction by focusing on constructive conversations.
A web analytics implementation project often starts with quite a lot of fanfare and resources. There will usually be an audit and needs assessment process to determine what tracking needs to be implemented or fixed, an implementation project plan identifying task owners and dates, and earmarked hours from the development team for tasks like implementing tracking code and building a data layer. All of this generally ensures that there is satisfactorily comprehensive and accurate tracking in place at the end of the project. So why do we still regularly see web analytics issues?
Agile software engineering practices have become the standard work management tool for modern software development teams. Are these techniques applicable to analytics, or is the nature of research prohibitively distinct from the nature of engineering? In this post I discuss some adjustments to the scrum methodology to make the process work better for Analytics and Data Science teams.
A data warehouse Service Level Agreement (SLA) is an important building block for a data-driven organization. To help get you started, in part one I introduced a data warehouse SLA template - a letter addressed to your stakeholders. In this post I walk through the meat of the SLA template: services provided, expected performance, problem reporting, response time, monitoring processes, issue communication and stakeholder commitment. If you have not already read part one, I highly recommend reading it first!
This is part 2 of my 3-part exploration of the following question: are Agile engineering practices applicable to analytics, or is the nature of research prohibitively distinct from the nature of engineering? For the agile fans, in part 1 I gave an intro to agile and talked through what I like about the scrum development process for analytics. For the agile naysayers, in this post I explore the elements of agile that do not work particularly well with Analytics (issues range from annoyance to downright incompatibility).
The tools and techniques of data science and advanced analytics can be used to solve many problems. In some cases – self-driving cars, face recognition, machine translation – those technologies make tasks possible that previously were impossible to automate. That is an amazing, transformative accomplishment. But I want to sing a paean to a mundane but important aspect of data science – the ability to intelligently put lists of things in a better order. For many organizations, once you have found some insights and are into the realm of putting data products into production, the most substantial value can be found by identifying inefficient processes and making them efficient. Twenty or thirty years ago, that efficiency gain might have been achieved by converting a paper-based process to a computer-based process. But now, prioritization – putting things in the right order – can be what it takes to make an impact.
A lot of Data Science literature focuses on the mathematical and algorithmic aspects of model building. This is, after all, what much of academia spends time grappling with. However, the data practitioner understands that applying Data Science and Machine Learning models to solve real-world problems involves much more than coding a statistical formula. Today’s post is about keeping your models fresh and up to date, and your team informed, as your data world evolves. We will discuss some implications of a changing data distribution for your model, practical technical considerations when building a model that is integrated with the product application, and how presenting to your team can be a great checkpoint on your model-building progress.
Yes, if you want to build a truly data-driven organization, your data warehouse needs a Service Level Agreement (SLA). At the core of any data-driven organization is trust - your stakeholders must trust that when they need data, it will be there and it will be accurate. Without trust in the data warehouse, your organization will be less likely to use data to drive decisions big and small. In my previous post, Reporting is a Gateway Drug, I explored reporting as a tool to build a trusting stakeholder relationship. In this post I explore trust through the concept of a data warehouse SLA. In part two I explore the people, process and tools you need to successfully implement the SLA.