Identify a specific business problem
Data science works extremely well when there is a well-defined problem to solve. When questions or problems are vague, they take much longer to solve and the answers are often vague too. An example of a vague question would be: “How do I increase revenue?” This problem is not unsolvable, but it would take considerable time to understand all the variables affecting revenue before an informed answer could be given.
A better, more specific question would be: “How do I leverage my existing client base to achieve increased sales?” This question better defines the scope of the problem, because it limits the number of variables at play. To answer the first question, one would also have to look at industry and possibly macro-economic data to make an informed decision. In the second question, the problem is defined around the buying patterns of existing clients and whether there is an opportunity to increase the frequency of the buying cycle.
An even better approach would be to pose the following problem: “Which clients represent the best opportunity to place orders twice a week instead of once a week? What will the increased sales volume be if we capitalise on the top 10%?” This question is very specific and much easier to solve than the two previous questions.
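To illustrate how directly a specific question maps onto a calculation, here is a minimal Python sketch. It is only an illustration under stated assumptions: the record layout, the ranking by average order value and the idea that a second weekly order would match the client's average order value are all hypothetical and do not come from a real data set.

```python
import math

def top_weekly_candidates(clients, share=0.10):
    """Return the top `share` of once-a-week clients, ranked by average order value.

    `clients` is a list of (name, orders_per_week, average_order_value) tuples.
    Ranking by average order value is an assumption made for illustration.
    """
    weekly = [c for c in clients if c[1] == 1]
    weekly.sort(key=lambda c: c[2], reverse=True)
    keep = max(1, math.ceil(len(weekly) * share)) if weekly else 0
    return weekly[:keep]

def extra_weekly_sales(candidates):
    """Estimate added weekly sales if each candidate places one more order per week.

    Assumption: the extra order is worth the client's average order value.
    """
    return sum(value for _, _, value in candidates)
```

The vague question “How do I increase revenue?” has no comparably short translation, which is the point of narrowing the problem first.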
In conclusion: Understand the root cause of the problem to ensure the correct problem is identified.
Preparing the data
Preparing the data consists of two steps, namely getting the data and cleaning the data. Getting the data may be as simple as receiving an email with a .csv attachment, but can be as complex as streaming live data from multiple servers at the same time. Normally more than one data source is needed to solve a specific problem. For example, to answer the question in the previous section, we will most likely need sales history, sales orders and a client master. In solving more complex problems, such as quantifying theoretical waste, we could use the bill of materials in conjunction with production throughput, quality assurance data and inventory movements to calculate the theoretical waste number. Here, data is drawn from different business units to quantify a specific business problem.
Autolytix Data Science can integrate with various databases and data warehouses, whether SAP, SQL Server, Oracle DB, MySQL, MongoDB, HBase, Cassandra or another data source.
The next step in the process is to clean the data. Data cleaning is an essential step prior to analysing the data. There are two very specific steps that we follow to clean the data. First, we need to eliminate randomness. Random data is very difficult to analyse, especially if we are trying to identify patterns in the data. To eliminate randomness, we have developed algorithms and machine learning concepts that read and understand data in its base mathematical form. We use these algorithms and machine learning processes to reclassify data, thereby eliminating randomness.
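The theoretical waste example can be sketched in a few lines of Python. This is a simplified illustration, not a definitive costing method: it assumes that theoretical usage equals the bill-of-materials quantity per unit multiplied by the number of good units (units produced less units rejected by quality assurance), and that waste is the material issued from inventory minus that theoretical usage. The function and field names are hypothetical.

```python
def theoretical_waste(bom_per_unit, units_produced, units_rejected, material_issued):
    """Return waste per material: issued quantity minus theoretical usage.

    bom_per_unit    -- material quantity needed per unit (bill of materials)
    units_produced  -- total units made (production throughput)
    units_rejected  -- units failed by quality assurance
    material_issued -- quantity actually drawn from stock (inventory movements)

    Assumption: only good units count towards theoretical usage.
    """
    good_units = units_produced - units_rejected
    return {
        material: material_issued[material] - quantity * good_units
        for material, quantity in bom_per_unit.items()
    }
```

Each argument would typically come from a different system or business unit, which is why more than one data source is usually needed.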
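As a much simpler stand-in for the reclassification described above, the following Python sketch maps inconsistent free-text spellings onto one canonical label. It is not the algorithm we use in practice; the lookup table and function names are hypothetical and serve only to show what reclassifying a field means.

```python
import re

# Hypothetical mapping from normalised spellings to one canonical label.
CANONICAL = {
    "toyota hilux": "Toyota Hilux",
    "hilux": "Toyota Hilux",
    "ford ranger": "Ford Ranger",
    "ranger": "Ford Ranger",
}

def normalise(raw):
    """Lower-case the text and collapse punctuation and whitespace into single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", raw.lower()).strip()

def reclassify(raw, default="UNCLASSIFIED"):
    """Return the canonical label for a raw value, or `default` if it is unknown."""
    return CANONICAL.get(normalise(raw), default)
```

Values that end up as "UNCLASSIFIED" are the ones that would need a new rule or a closer look before analysis.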
The next step to enhance comparability is to develop data derivatives. A data derivative is a data field derived from another field. For example, a date of 2015-01-01 can have a year derivative of “2015”, and a Toyota Hilux 3.0 D-4D can have a brand derivative of “Toyota” or a vehicle class derivative of “mid-size pickup”. The number of possible comparisons grows exponentially with every data derivative added to the data set, and can be quantified with the standard formula for combinations.
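For reference, the standard combinations formula gives the number of ways to choose k fields out of n as C(n, k) = n! / (k! (n - k)!). Summed over every k from 1 to n, this comes to 2^n - 1, so each added field roughly doubles the count. The Python sketch below derives the fields from the example above and counts the combinations. The field names and the vehicle class lookup are hypothetical, and taking the first word of the description as the brand is a simplifying assumption.

```python
from datetime import date
from math import comb

# Hypothetical lookup; a real one would come from a vehicle master table.
VEHICLE_CLASS = {"Toyota Hilux": "mid-size pickup"}

def derive(record):
    """Return a copy of `record` with year, brand and vehicle_class derivatives.

    `record` needs a 'date' (ISO string) and a 'vehicle' description.
    Assumption: the brand is the first word and the model is the second.
    """
    enriched = dict(record)
    words = record["vehicle"].split()
    enriched["year"] = date.fromisoformat(record["date"]).year
    enriched["brand"] = words[0]
    enriched["vehicle_class"] = VEHICLE_CLASS.get(" ".join(words[:2]), "unknown")
    return enriched

def field_combinations(n_fields):
    """Count every non-empty selection of fields: the sum of C(n, k) for k = 1..n."""
    return sum(comb(n_fields, k) for k in range(1, n_fields + 1))
```

With this counting, a data set of 10 fields has 1,023 possible field combinations and one of 20 fields has 1,048,575.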
Our intention with eliminating randomness is to achieve comparability. Comparability enables us to measure two data points against each other or, on a larger scale, to make billions of comparisons between data points. We often use this simple example to explain comparability. Let’s say your Toyota Hilux has a fuel consumption rate of 8.5 litres per 100 kilometres. Viewed in isolation, this observation gives you no insight. But when we compare it to your brother’s Ford Ranger, which has a fuel consumption rate of 9.0 litres per 100 kilometres, we suddenly have an appreciation for the Hilux’s better fuel consumption. The more comparisons we make, the more answers we get. If we continued with this exercise and compared a million vehicles of the same vehicle class (same size and type), we would achieve a high level of insight.
At this point most companies deploy dashboard tools to aid the analysis of the data in an effort to gain insight. At Autolytix we believe that this is premature, because there are simply too many combinations for the human mind to comprehend. Dashboards are a visual way of making data comparisons. The flaw is that people believe they will gain insight by clicking through the dashboard. Our solution was to develop a mass comparative algorithm that generates all the comparisons in one go. From this we can start analysing the data.
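The fuel consumption example can be expressed as a small Python sketch that measures each vehicle against the average of its own vehicle class. The tuple layout is an assumption made for illustration, and comparing against the class average is just one of many possible comparisons.

```python
from statistics import mean

def versus_class_average(observations):
    """Return each vehicle's consumption minus its class average.

    `observations` is a list of (vehicle, vehicle_class, litres_per_100km).
    A negative result means the vehicle uses less fuel than its class average.
    """
    by_class = {}
    for _, vehicle_class, rate in observations:
        by_class.setdefault(vehicle_class, []).append(rate)
    averages = {cls: mean(rates) for cls, rates in by_class.items()}
    return {
        vehicle: rate - averages[vehicle_class]
        for vehicle, vehicle_class, rate in observations
    }
```

The class average becomes a more meaningful benchmark as more vehicles of the same class are added, which is the effect described above.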
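To give a feel for what generating all the comparisons in one go means, here is a deliberately small Python sketch that compares every pair of measurements exactly once. It is not the Autolytix algorithm, only an illustration of exhaustive pairwise comparison using the standard library. The function name is hypothetical.

```python
from itertools import combinations

def pairwise_differences(values):
    """Compare every pair of labels once.

    `values` maps a label to a numeric measurement. The result maps each
    (first, second) pair to first minus second, so a negative number means
    the first label has the lower measurement.
    """
    return {
        (a, b): values[a] - values[b]
        for a, b in combinations(sorted(values), 2)
    }
```

For n data points this produces n(n - 1) / 2 comparisons, so 1,000 points already yield 499,500 pairs, far more than anyone could work through by clicking on a dashboard.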
Analysing the data
Some refer to this step as exploratory analysis. Here we take the clean data from the previous step and start analysing the specific business problem, going through several iterations to find the right answer. During the analysis we use various plots and diagrams to understand causality. If we focus on understanding the cause of the problem, it becomes easier to explain why it is occurring and how to rectify it.
The objective here is to understand the cause, explain it, look at various ways of changing it and extrapolate it into the future. So what have we done? We have answered the following question: “What do I need to do now, to change the future?”
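As one simple way to extrapolate into the future, the Python sketch below fits a straight line through evenly spaced historical observations using ordinary least squares and projects it forward. Choosing a linear trend is an assumption made for illustration. Real data may call for a different model, and a trend line on its own projects a pattern without explaining its cause.

```python
def fit_line(ys):
    """Least-squares line through the points (0, ys[0]), (1, ys[1]), ...

    Returns (slope, intercept). Assumes the observations are evenly spaced.
    """
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two observations")
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean

def extrapolate(ys, periods_ahead):
    """Project the fitted line `periods_ahead` periods past the last observation."""
    slope, intercept = fit_line(ys)
    return intercept + slope * (len(ys) - 1 + periods_ahead)
```

Running the projection with and without a proposed change gives a rough view of what acting now could mean for the future.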
Some refer to this step as reproducible research. We call it automated intelligence. There are a few reasons for this. Firstly, this is the step where we automate the data acquisition, cleaning and analysis, so that we can produce the same result on demand. Furthermore, the outcome of the analysis can be delivered in several forms: it might be displayed on a dashboard or other visual media, become a standard report in the company, form part of the CEO’s board pack or management accounts, or feed another application that generates additional data.
During this step of the process, we will assess the client’s needs in terms of disseminating the results and how the results may influence other parts of the business. The essence here is to automate the process, in whatever form the results are communicated.
Develop a strategic initiative
In this step we will develop a strategic initiative to implement the results of the analysis. The strategic initiative should have three outcomes, namely: realisation of the company’s strategic intent (or vision), benefits for strategic stakeholders and transformation of the organisation. In crafting the strategic initiative we will answer the following questions: Does it form part of our strategic intent or vision? Who will benefit from this initiative? And how will we transform the organisation to ensure longevity? This exercise will form the crux of how the plan will be executed. The outcome of the strategic initiative can be anything: an official company policy, the re-engineering of a business process, raising capex to purchase a specific asset, or launching a new marketing campaign.
The strategic initiative will set the tone for execution.
The objective here is to take the strategic initiative and break it down into specific tasks, with responsibilities and timelines. This is the execution plan. Here all the factors at play will be assessed and planned, for example: the budget for the project, the resources required, the timespan, the constraints standing in the way of executing the plan and any other relevant aspects that need to be considered for the project plan.
After the project has been successfully implemented, the actual results from the strategic initiative will be measured to assess whether it had the desired impact on the business. Impact quantification is essential to understand the success of the initiative. If it was a sales initiative, sales will be tracked over time and measured against the expected result.
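Measuring actual results against the expected result can be sketched in Python as a simple variance calculation per period. The function name and the choice to express the variance as a fraction of the expected figure are assumptions made for illustration.

```python
def impact(actual, expected):
    """Return (absolute variance, variance as a fraction of expected) per period.

    `actual` and `expected` are equal-length lists of sales figures, one per
    period. A positive variance means sales beat the expected result.
    Assumption: every expected figure is non-zero.
    """
    if len(actual) != len(expected):
        raise ValueError("actual and expected must cover the same periods")
    return [(a - e, (a - e) / e) for a, e in zip(actual, expected)]
```

Tracking these variances over several periods shows whether the initiative had a lasting effect or only a temporary one.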