As the advanced and connected technologies of Industry 4.0 make their way into our daily operations, companies should consider how the use of Data Science can help them explain the unknown, optimize their business and predict the future. In anticipation of Soothsayer’s Data Science panel discussion and presentation at Automation Alley’s Integr8 conference on Nov. 14, here are 12 questions to consider before embarking on a Data Science journey.
Q1: How is Data Science different from Business Intelligence & Statistics?
A: Business Intelligence provides a view of the past, and traditional statistics (in the context of Analytics) use simplified models (such as Logistic Regression) on samples of past data to give estimates of the future. The fundamental difference between that and Data Science, also known as Advanced Analytics, is the use of non-parametric, Machine Learning-based models that evolve automatically from data, and that have the ability to self-learn and adapt when presented with new inputs.
This often translates to deeper insights, higher accuracy predictions, and in many cases the ability to positively perturb the future. Data Scientists are also not limited to structured data (i.e. rows & columns), which enables new sources of knowledge to be tapped for business value (e.g. text, images, and audio).
Q2: What does the board want to know about Data Science, Analytics, and Big Data?
A: Doing your homework is a big must when internally selling a new path forward. Prior to speaking with leadership, meet with relevant colleagues and stakeholders about issues and opportunities that they would like to address. Once you have some ideas, work with your data team to identify relevant data (that is preferably clean and comprehensive).
When speaking to the C-suite or Board, start by clearly defining where you currently are in the Analytics journey – are you at descriptive, diagnostic, predictive, or prescriptive? Gartner has a nice infographic for this, which essentially amounts to: “What happened, why did it happen, what will happen, and what should we do about it?”
Present your list of opportunities, and if possible, draw on examples of how competitors or adjacent industries did something similar – do not forget to talk about the value it created for them. For example, if you are starting an analytics initiative at a telecom provider, talk about how a competitor tapped into their stream of customer data to reduce churn, resulting in a savings of $10 million per year.
Though it runs counter to the aforementioned example – try to start with problems that will increase revenue as opposed to decrease cost. Making money is often seen as sexier than saving money.
Q3: How do I narrow in on a first problem to solve with Data Science, and how long do these projects take?
A: Quick, meaningful wins can typically be achieved within a few months. In order to minimize the time it takes to garner a return on investment, begin by identifying low hanging fruit where you already have access to quality data – it may be on customers, processes, or even in the form of unstructured data such as text.
If possible, start with a larger problem that can be easily split into sub-projects. For instance, if you decide to focus on customer understanding, begin with scientific customer segmentation. Once you know who your customers are, you can start to solve other problems such as identifying opportunities for cross-sell and upsell, predicting and preventing churn, and forecasting customer lifetime value.
Q4: What are my internal/external resource needs?
A: Whether you are building internal capabilities or working with external partners, it is important that your data science team includes some combination of programming skills, math & stats knowledge, and domain expertise. Since you are unlikely to find someone who fits all three, and due to the comparatively high cost of hiring data scientists internally, it often makes sense for a company to provide the domain expertise and to rely on external partners to provide the other skill sets.
Q5: What will be the impact on our internal staff?
A: Assuming that you are working with an external partner, the impact on your internal staff should usually be minimal. Most engagements will start with a Business & Data Understanding phase. During this time, your data science partner will work with your domain experts to elucidate project objectives and requirements, and to formulate an initial project plan. Depending on the size and scope of the project, this may consist of a few hour-long conversations with key stakeholders, or it may consist of several discussions over a period of a few weeks. After this, a weekly touch base will usually suffice.
Q6: How much training will my team need to use the models that are developed?
A: Depending on the requirements and technical skills of your internal team, time may also need to be allocated for training. If you do not already have an internal data scientist, it is important to partner with vendors that can either integrate their deliverables into your existing BI systems and similar applications, or build stand-alone tools that shield non-technical users from the complexity.
Q7: This is a lot of time and money, what’s the payback?
A: The payback is dependent on a company’s current level of analytics maturity, the problem being solved, and how invested – both financially and strategically – stakeholders are.
Reputable studies say that companies average a 13-to-1 return on investing in analytics. That figure appears correct, though we have seen cases in excess of 40-to-1 ROI at Soothsayer.
Q8: How do I frame this in terms of ROI, so that leadership does not get sticker shock?
A: It is often difficult to calculate the potential ROI of a data science project before digging into the data. Many times, a project can result in tertiary insight that is valuable far beyond the initial problem statement. If a company is new to Analytics, it may be that the ROI generated from an initial engagement opens the door to new possibilities and visibility into previously unknown aspects of their business.
Broadly speaking, you can calculate ROI by quantifying how a problem is currently being solved versus what can be achieved if a data science approach is implemented instead. For example, assume you are a distributor of perishable products. If your company’s current method for forecasting regularly results in 10% more inventory than required, you can easily map each percent of accuracy improvement to a significant savings. Because you will be able to better anticipate customer demand, you will also likely be able to more effectively satisfy their needs and improve your loyalty metrics.
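The perishable-inventory example above can be turned into a back-of-the-envelope calculation. The sketch below is only illustrative: the inventory value, the improved overstock rate, the project cost, and the assumption that all excess stock spoils are hypothetical placeholders, and the function names are made up for this example.

```python
def overstock_cost(required_inventory_value, overstock_rate, loss_fraction=1.0):
    """Value lost on inventory held beyond what demand required.

    loss_fraction is the share of excess stock that is written off
    (1.0 assumes all excess perishable stock spoils).
    """
    return required_inventory_value * overstock_rate * loss_fraction


def simple_roi(annual_savings, project_cost):
    """Net gain per dollar invested, e.g. 1.5 means a 150% return."""
    return (annual_savings - project_cost) / project_cost


# All figures below are hypothetical placeholders, not real results.
required_inventory_value = 5_000_000  # yearly value of stock actually needed
current_rate = 0.10    # current forecasting yields 10% excess inventory
improved_rate = 0.06   # assumed excess after a better forecasting model
project_cost = 80_000  # assumed cost of the data science engagement

# Value of each percentage point of overstock that is eliminated.
savings_per_point = overstock_cost(required_inventory_value, 0.01)

# Difference between today's waste and the waste after the improvement.
annual_savings = (overstock_cost(required_inventory_value, current_rate)
                  - overstock_cost(required_inventory_value, improved_rate))

roi = simple_roi(annual_savings, project_cost)

print(f"Savings per percentage point: ${savings_per_point:,.0f}")
print(f"Annual savings: ${annual_savings:,.0f}")
print(f"First-year ROI: {roi:.0%}")
```

Swapping in your own inventory value and a conservative improvement estimate gives leadership a range rather than a single number, which tends to soften sticker shock.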
Q9: How do I set proper expectations?
A: It is important to set clear expectations. Much of what a Data Scientist does involves identifying, collecting, cleaning, and compiling data into usable formats – in many cases, this can take up as much as 60% of their time. It may take a month or two for the first snippets of insight to emerge, so stakeholders need to be patient. If the initial work is foundational, make sure they understand that future initiatives X & Y are dependent on the success of this first step.
Q10: How do I articulate the results of our first Data Science project?
A: Always start with the most interesting insights and all of the opportunities that they present. The focus here should be on what actions you can take and how those actions will change the business for the better.
It is important that whoever is enlisted internally for domain knowledge also has strong communication skills, otherwise it may be hard for non-technical leaders to understand the value of your deliverables. Make sure to also provide rich visualizations where possible. It is easier to understand a graphic than an equation, so remember the KISS principle.
Q11: What technologies should we use for Data Science? Who are the vendors that can satisfy these needs?
A: There are many off-the-shelf tools designed for self-service “analytics” by relatively non-technical users. The issue with such tools, however, is that they often fit your data to their solution, as opposed to fitting their solution to your data – they also often come attached with significant financial strings.
Because of this, we usually recommend starting by exploring powerful, open-source tools such as R and Python – neither requires licensing fees, and they tend to be several months, or years, ahead in terms of the sophistication of algorithms. These tools also have very active online communities with quality tutorials that help lower the barrier to entry. Both also provide native visualizations, though this is an area where open source is still a bit behind. Products such as Tableau, Qlik, and Power BI lower the barrier to entry for rich, aesthetically pleasing, and interactive visualizations.
Q12: How do we integrate what we build into our existing workflow, and how do we protect privacy?
A: Flexibility is key, but one frequent approach is to expose the model as a web service. This makes consuming the model a fairly standard procedure. This is not unique to data science, and similar methods of integration are likely already defined internally.
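As a rough illustration of the web service approach, the sketch below wraps a model behind a small HTTP endpoint using only Python's standard library. Everything specific here is an assumption for the example: the `/predict` path, the JSON shape, and especially `predict`, which is a stand-in with made-up weights rather than a trained model. A production service would normally sit behind your existing web framework, authentication, and logging.

```python
import json
import math
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical stand-in for a trained model: a fixed logistic scoring rule.
WEIGHTS = [0.8, -0.5]
BIAS = -0.2


def predict(features):
    """Return a score between 0 and 1 for a list of two numeric features."""
    if len(features) != len(WEIGHTS):
        raise ValueError("expected %d features" % len(WEIGHTS))
    z = BIAS + sum(w * x for w, x in zip(WEIGHTS, features))
    return 1.0 / (1.0 + math.exp(-z))


class PredictHandler(BaseHTTPRequestHandler):
    """Accepts POST /predict with a JSON body like {"features": [1.0, 2.0]}."""

    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404, "Unknown endpoint")
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length))
            score = predict(payload["features"])
        except (ValueError, KeyError, TypeError, OverflowError):
            self.send_error(400, "Bad request body")
            return
        body = json.dumps({"score": score}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Keep the example quiet; a real service would log requests.
        pass


def make_server(host="127.0.0.1", port=8000):
    """Build the server; call .serve_forever() on the result to start it."""
    return HTTPServer((host, port), PredictHandler)

# To try it locally: make_server().serve_forever()
```

Any application that can send an HTTP request, such as a BI dashboard or CRM, can then consume the model without knowing how it was built.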
Depending on the problem you are solving, concerns about customer privacy may come up. Again, this problem is not specific to Data Science. In general, you will follow the same traditional data security processes that you would with any other application.