Optimizing payment reminders with SAP Predictive Analytics (3) – Analyzing and using the predictive model
In our last blog post, we introduced the dunning process and explained how we intended to optimize the payment reminders. Our goal was to detect which customers would pay their due bills before receiving the dunning letter. We prepared the data and created (or trained) our first classification/regression model on historical data with SAP Predictive Analytics. This is what the magic formula looks like:
Analyzing the model
Let’s now analyze the model we created. A quick look at the model overview is a good place to start: the predictive model was created in 25 minutes thanks to our data in SAP HANA views. 2% of the records in the dataset are cross-payers (bills paid before the dunning letter was received). SAP Predictive Analytics kept 21 variables in our model out of the 200 that we had previously selected as independent variables.
We also obtain two very useful key indicators:
- The Predictive Power of our model (KI) tells us how good our model is on a scale from 0 to 1. Here we have a score of 0.4932, which is better than a purely random model but still not a very strong one.
- The Predictive Robustness of our model (KR) tells us how reliable our model is, that is, whether we will be able to use it on a new dataset (it is not recommended to use a model with KR < 0.95). Here we have a score of 0.9756, which means our model is robust.
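To give an intuition for KI, here is a minimal Python sketch. It assumes, based on our understanding of how the indicator is usually described, that KI is the area between the model curve and the random curve divided by the area between the perfect curve and the random curve. The function names are ours and this is not SAP's implementation.

```python
def gain_curve(scores, targets):
    """Cumulative share of positives captured, going from highest to lowest score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total_positives = sum(targets)  # assumes at least one positive target
    captured = 0
    curve = [0.0]
    for i in order:
        captured += targets[i]
        curve.append(captured / total_positives)
    return curve


def area_under(curve):
    """Trapezoidal area under a curve sampled on an evenly spaced 0..1 axis."""
    steps = len(curve) - 1
    return sum((curve[k] + curve[k + 1]) / 2 for k in range(steps)) / steps


def predictive_power(scores, targets):
    """Assumed KI definition: (model - random) / (perfect - random)."""
    model_area = area_under(gain_curve(scores, targets))
    # A perfect model ranks every positive first: use the targets as scores.
    perfect_area = area_under(gain_curve(targets, targets))
    random_area = 0.5  # the diagonal
    return (model_area - random_area) / (perfect_area - random_area)
```

With this definition a perfect ranking gives 1, a random ranking gives about 0, and our 0.4932 sits roughly halfway between the two.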
As you can see on the ROI curve, our model (the blue curve) is significantly better than a purely random model (the red curve), even with the little data we prepared. To get closer to a perfect model (the green curve, where all targets are predicted correctly, which is very unlikely to be achieved in practice), we would try to add relevant data (such as calendar holidays).
Another very interesting analysis is the variable contribution. The graph is sorted and shows the impact of each independent variable on the target variable. In our model, the amount of the due bill and the day of the week on which the bill is due have a high impact on the target. The age of the customer is also significant, but less so than the previous variables. SAP automatically removed some of the selected independent variables from the model, either because they have a very low impact on the target variable or because they are highly correlated with another variable in the model (a great feature offered by SAP).
SAP Predictive Analytics enables us to drill down into the variables and see how their values affect the target. For the age of the customer, we observe that the interval 21 to 24 has a higher probability of cross-payer behavior, whereas customers above 51 have a very low one. SAP Predictive Analytics created those categories automatically, which is a very nice feature (the tool can also cope with empty values, which are usually quite tough to handle in predictive modeling).
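The idea behind this drill-down can be sketched in a few lines of Python. The age bands below are hypothetical (the tool derives its own categories automatically) and missing ages simply get a band of their own. This only illustrates the principle, not the tool's actual binning algorithm.

```python
from collections import defaultdict

# Hypothetical age bands chosen for illustration only.
AGE_BANDS = [(21, 24), (25, 50), (51, 120)]


def band_label(age):
    """Map an age to its band; empty values get a category of their own."""
    if age is None:
        return "missing"
    for low, high in AGE_BANDS:
        if low <= age <= high:
            return f"{low}-{high}"
    return "other"


def cross_payer_rate_by_band(records):
    """records: iterable of (age, is_cross_payer) pairs, is_cross_payer being 0 or 1."""
    counts = defaultdict(lambda: [0, 0])  # label -> [cross_payers, total]
    for age, is_cross_payer in records:
        label = band_label(age)
        counts[label][0] += is_cross_payer
        counts[label][1] += 1
    return {label: hits / total for label, (hits, total) in counts.items()}
```

Comparing the rate of each band with the overall cross-payer rate shows which age groups push the prediction up or down.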
We can also have a look at the decision tree, where we see the probability of finding our target (cross-payers) for each value of a variable within the same level of the tree. We can drill down to the next level of the tree (the next most relevant variable).
Another nice feature is the confusion matrix (don’t worry, it’s not that confusing!). SAP Predictive Analytics splits the training dataset into two parts: 75% of the data was used to estimate the model and 25% to validate it. The confusion matrix shows how many of the cases in those 25% were correctly identified by the model (the true positives and true negatives) and how many were confused, or wrongly identified (the false negatives and false positives).
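To make the four cells of the confusion matrix concrete, here is a minimal Python sketch. It assumes labels coded as 1 (cross-payer) and 0 (everything else), and a hypothetical cut-off to turn scores into predictions. It is not how SAP Predictive Analytics builds its matrix internally.

```python
def predict_labels(scores, threshold):
    """Turn scores into 0/1 predictions using a hypothetical cut-off."""
    return [1 if score >= threshold else 0 for score in scores]


def confusion_matrix(actual, predicted):
    """Count true/false positives and negatives on the validation records."""
    cells = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if p == 1:
            cells["TP" if a == 1 else "FP"] += 1
        else:
            cells["FN" if a == 1 else "TN"] += 1
    return cells
```

In our use case, a false positive is an invoice we expected to be cross-paid but that was not, so holding back its reminder would delay the dunning process.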
OK, so SAP has been using ridge regressions in the background during the 25 minutes it took to create the model, but it is not a complete black box at all. We are able to analyze the model and understand a lot of what is going on. The variable contribution is usually very interesting to discuss with the business in order to understand customer behaviors, data preparation gaps, data quality errors and more.
Using the model
It took us slightly more time to analyze the model than to create it, but we could have used it directly. Relying only on the KI and KR indicators is probably not the best idea, though, as you want to make sure that the variables and the model make sense.
Once you’re happy with the model, you can save it in order to re-use it later. SAP Predictive Analytics also allows us to export the code of the model in various languages (PMML, adapted SQL, C, etc.), which can be very convenient for embedding predictive analytics in an application.
Once the model has been created, it can be used on a new set of data (the new data has the same structure as the historical dataset; the only difference is that the target is unknown – this is what we want to predict). The new formula looks like this:
Applying the model to a new dataset gives us a score and a probability per entity. In our case, we have a probability per invoice. The higher the score, the higher the probability that the invoice will be cross-paid.
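One way to act on these probabilities is to hold back the reminder for invoices that are very likely to be cross-paid. The sketch below assumes a list of (invoice id, probability) pairs. The function name and the 0.8 threshold are hypothetical, and the threshold would have to be agreed with the business.

```python
def invoices_to_hold_back(scored_invoices, min_probability=0.8):
    """Return invoices likely to be cross-paid, most likely first.

    scored_invoices: list of (invoice_id, probability) pairs.
    min_probability: hypothetical cut-off, to be tuned with the business.
    """
    selected = [
        (invoice_id, probability)
        for invoice_id, probability in scored_invoices
        if probability >= min_probability
    ]
    return sorted(selected, key=lambda pair: pair[1], reverse=True)
```

A higher threshold holds back fewer reminders but makes fewer mistakes; the confusion matrix helps to judge that trade-off.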
The predictive scores can be exported as a flat file or written back directly to a database. This feature is very interesting, as we can imagine writing the results back to a CRM system or a data warehouse. This would allow us to develop dashboards or other reports that combine classical Business Intelligence data with predictive values.
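Both export options can be illustrated with a small Python sketch. It is only an illustration under our own assumptions: SQLite stands in for the real target database, and the table and column names are hypothetical. In the tool itself, the export is configured in the user interface.

```python
import csv
import io
import sqlite3


def scores_to_csv(rows):
    """Serialize (invoice_id, score, probability) rows as flat-file content."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["invoice_id", "score", "probability"])
    writer.writerows(rows)
    return buffer.getvalue()


def write_scores(connection, rows):
    """Write the rows to a (hypothetical) invoice_scores table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS invoice_scores ("
        "invoice_id TEXT PRIMARY KEY, score REAL, probability REAL)"
    )
    # Re-scoring an invoice overwrites its previous row.
    connection.executemany(
        "INSERT OR REPLACE INTO invoice_scores VALUES (?, ?, ?)", rows
    )
    connection.commit()
```

Once the scores sit in a table like this, a report can join them with the usual invoice data on the invoice identifier.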
Here is what some scores look like for our dunning model:
Lessons learned
- The business question is key to obtaining good predictions. In our case, scoring per customer is significantly different from scoring per invoice, and the scores would have a different meaning.
- The data preparation part is time-intensive compared to the creation of the predictive model.
- SAP Predictive Analytics Automated allows us to create and apply models very quickly, without needing deep statistical skills.
- SAP Predictive Analytics also embeds other tools that help with data preparation and with the management of the predictive models, in order to automate most of the work and facilitate the administration of the models.
- Bringing more data into the models will usually improve them, but it requires more time and can eventually open the door to Big Data projects.
- A predictive process needs to be iterative: the models need to be measured, improved, and re-calibrated as behaviors change.
- It is always a good idea to back-test predictive models and use control groups (comparing a campaign with and without the predictions, for example) in order to measure the benefit of predictive analytics.
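As an illustration of the last point, a control-group comparison could be sketched as follows in Python. The record fields `reminder_sent` and `paid_on_time` are hypothetical names of ours: the control group is dunned as usual, while the pilot group uses the predictions to hold back reminders.

```python
def summarize(group):
    """group: list of dicts with boolean 'reminder_sent' and 'paid_on_time' keys."""
    size = len(group)
    return {
        "reminder_rate": sum(invoice["reminder_sent"] for invoice in group) / size,
        "paid_rate": sum(invoice["paid_on_time"] for invoice in group) / size,
    }


def compare(control, pilot):
    """Benefit of the predictions: reminders saved versus change in payment rate."""
    control_stats, pilot_stats = summarize(control), summarize(pilot)
    return {
        "reminders_saved": control_stats["reminder_rate"] - pilot_stats["reminder_rate"],
        "paid_rate_change": pilot_stats["paid_rate"] - control_stats["paid_rate"],
    }
```

If the pilot group saves reminders without lowering the payment rate, the model is paying off. With small groups, the difference should of course be checked for statistical significance before drawing conclusions.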