Optimizing payment reminders with SAP Predictive Analytics (2) – Preparing and building the predictive model

Describing the use case

As explained last week (Predictive Analytics with SAP (1)), we have been working on a predictive proof of concept for the dunning activities at Swisscom. Oliver Mihatovic, who is in charge of payment reminders for residential customers at Swisscom, kindly explained the dunning process and strategies to us and actively supported us throughout this use case. The dunning process is complex because it depends on several aspects, such as the nature of the customer, his/her loyalty, the type of products subscribed to and his/her dunning history. Oliver Mihatovic is therefore in charge of several dunning strategies.

We decided to start simple for this predictive case and concentrated on the following assumptions:

  • A customer regularly receives a bill for his/her Swisscom products (Phone, Internet, TV) and has to clear the bill before a certain due date.
  • If the payment is received before the due date, the bill is cleared and no dunning activity is required.
  • If the payment hasn’t been received x days after the due date, a first warning is sent by SMS to the customer as a gentle reminder. This is the first level of the dunning strategy.
  • If the payment still hasn’t been received y days after the SMS was sent, a dunning letter is sent to the customer (the next level of the dunning strategy). The dunning letter warns the customer that he/she still has an open bill to pay before a new due date. If the customer has already been dunned in the past, an additional amount will be charged on the due bill. Here’s an example of what a dunning letter looks like and what key information is highlighted:

    [Figures: Dunning_letter1, Dunning_letter2]

  • If the payment still hasn’t been received z days after the dunning letter was sent, the dunning process moves on to higher dunning strategy levels (recovery plan proposals, blocking of accounts, legal proceedings, etc.).
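
The escalation steps above can be sketched as a small function. This is only an illustration: the article names the delays x, y and z without giving values, so the thresholds below are hypothetical, as are the function name and the assumption that each step is triggered exactly when its delay has elapsed.

```python
from datetime import date
from typing import Optional

# Hypothetical thresholds: the article only calls them x, y and z.
X_DAYS = 10  # days after the due date before the SMS reminder
Y_DAYS = 7   # days after the SMS before the dunning letter
Z_DAYS = 14  # days after the letter before higher dunning levels


def dunning_level(due_date: date, today: date, paid_on: Optional[date] = None) -> str:
    """Return the dunning step that applies to one bill on a given day."""
    if paid_on is not None and paid_on <= today:
        return "cleared"
    days_overdue = (today - due_date).days
    if days_overdue < X_DAYS:
        return "no action"
    if days_overdue < X_DAYS + Y_DAYS:
        return "sms reminder"
    if days_overdue < X_DAYS + Y_DAYS + Z_DAYS:
        return "dunning letter"
    return "escalation"
```

For example, `dunning_level(date(2024, 1, 1), date(2024, 1, 18))` falls 17 days after the due date and therefore lands in the dunning letter step under these assumed thresholds.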


Identifying the business question and potential of the predictive case

We then focused on the business question we wanted to answer with a predictive outcome. We started by asking how a customer might react regarding the payment of his/her bill(s), depending on which warnings he/she has received, and analyzed which step of the process could be optimized. Given that up to 300’000 dunning letters can be sent per month, we arrived at the following business question, which would be the basis for the predictive case:

=> Which customers already paid their due bills before receiving the dunning letter?

This population will be called the “cross-payers” and is marked in green in the following figure. The non-cross-payers are the customers who still haven’t paid their bills once the dunning letter has been received. If we are able to identify this population with a predictive model, we can optimize the dunning process. This would lead to savings in the cost of dunning letters sent (only sending dunning letters to the non-cross-payers) and in the time spent analyzing customers’ interactions with the Swisscom call center (a cross-payer who is charged an additional amount with his/her dunning letter will very probably call back to prove that he/she paid the bill before receiving the dunning letter).


Defining the target and independent variables

Well, let me tell you that we have actually just described the target variable of our predictive model with the previous schema. The target variable is binary: our cross-payers will be marked as “1” (paid their due bill(s) before receiving the dunning letter) and the non-cross-payers as “0” (still haven’t paid their due bill(s) after receiving the dunning letter). Piece of cake, isn’t it?

Because the target variable is binary, this is a classification problem. In SAP Predictive Analytics it is handled by the Classification/Regression model type, which is all you need to know for the moment.
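
As a rough sketch of how the binary target could be derived for one bill, the function below compares the payment date with the date the dunning letter was received. The function and parameter names are hypothetical, and treating a payment made on the very day the letter arrives as “0” is an assumption, since the article does not cover that edge case.

```python
from datetime import date
from typing import Optional


def cross_payer_flag(paid_on: Optional[date], letter_received_on: date) -> int:
    """Return 1 for a cross-payer, 0 for a non-cross-payer.

    paid_on is None when the bill is still open.
    """
    if paid_on is not None and paid_on < letter_received_on:
        return 1  # paid before the dunning letter arrived
    return 0      # unpaid, or paid only once the letter had arrived
```

A bill paid on 3 March with a letter received on 5 March would be labeled 1; a bill that is still open would be labeled 0.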

If you remember what was said in my previous blog post, the predictive model will help us find the relationship between the data (the independent variables) and the target variable we have just defined. The independent variables have actually been mentioned before: any data which might seem relevant, such as customer master data (age group, type of products contracted, geographical location, etc.) and transactional data (dunning history, usage of Swisscom products, amount of bills, etc.), can potentially help our model find the relevant relationships.

Preparing the data

OK, so now it’s time to prepare our data to create our predictive model. There are several ways of doing so and I don’t want to overwhelm you here. We used an extract of the mentioned data from the key tables in our ERP system (if a data warehouse is available, you definitely want to consider using the prepared data rather than the raw data) and loaded the tables into an SAP HANA database. We then joined the tables in a main view in order to have one entry per customer. The columns of the view contain the variable information (age group, products, number of calls in the last 2 months, amount of open bills, number of dunning letters sent, etc.) per customer.
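
To make the “one entry per customer” idea concrete, here is a minimal sketch that uses Python’s built-in SQLite module as a stand-in for SAP HANA. The table names, column names and sample rows are invented for illustration and do not reflect the actual Swisscom data model. Each one-to-many table is aggregated per customer before it is joined, so that the view keeps exactly one row per customer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical source tables with a few invented rows.
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, age_group TEXT);
CREATE TABLE bills (customer_id INTEGER, amount REAL, is_open INTEGER);
CREATE TABLE dunning_letters (customer_id INTEGER, sent_on TEXT);

INSERT INTO customers VALUES (1, '30-39'), (2, '50-59');
INSERT INTO bills VALUES (1, 80.0, 1), (1, 45.5, 0), (2, 120.0, 1), (2, 60.0, 1);
INSERT INTO dunning_letters VALUES (2, '2015-03-01'), (2, '2015-06-01');

-- Main view: one row per customer, one column per variable.
CREATE VIEW customer_features AS
SELECT c.customer_id,
       c.age_group,
       COALESCE(b.open_amount, 0) AS open_amount,
       COALESCE(d.letters_sent, 0) AS letters_sent
FROM customers AS c
LEFT JOIN (SELECT customer_id, SUM(amount) AS open_amount
           FROM bills WHERE is_open = 1 GROUP BY customer_id) AS b
       ON b.customer_id = c.customer_id
LEFT JOIN (SELECT customer_id, COUNT(*) AS letters_sent
           FROM dunning_letters GROUP BY customer_id) AS d
       ON d.customer_id = c.customer_id;
""")

rows = conn.execute(
    "SELECT * FROM customer_features ORDER BY customer_id"
).fetchall()
for row in rows:
    print(row)
```

Joining the raw bills and dunning letters directly would multiply rows for customers who have several of each, which is why the sketch aggregates first and joins afterwards.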


Time is key in predictive analytics and we will actually need two views:

  1. One data set to create and train our predictive model: this is the “learning from historical data” part and actually concerns the data we have prepared previously. We just need to make sure we have sufficient positive cases (cross-payers / target variable = 1) to learn from in order to create a good predictive model. We obviously always have fewer positive cases than negative ones (if we sent more dunning letters when not necessary than when necessary, we would definitely have a problem…). This is the “past focus”.
  2. One data set to apply our model to. This is the “predict the future outcomes” part and the data set will look exactly like the training set, except that we won’t know the value of the target variable (will the customer pay his/her bill before receiving the dunning letter or not?). This is the “future focus”.
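
The two data sets can be sketched as follows. The records, field names and helper functions are hypothetical: rows whose outcome is already known go into the training set, rows without an outcome form the set to be scored, and a small helper reports the share of positive cases in the training set.

```python
from typing import Dict, List, Tuple

Record = Dict[str, object]


def split_by_outcome(records: List[Record]) -> Tuple[List[Record], List[Record]]:
    """Separate rows with a known target (past focus) from rows without (future focus)."""
    training = [r for r in records if r["cross_payer"] is not None]
    to_score = [r for r in records if r["cross_payer"] is None]
    return training, to_score


def positive_rate(training: List[Record]) -> float:
    """Share of cross-payers (target = 1) in the training set."""
    if not training:
        raise ValueError("training set is empty")
    positives = sum(1 for r in training if r["cross_payer"] == 1)
    return positives / len(training)


# Invented sample rows, for illustration only.
records = [
    {"customer_id": 1, "letters_sent": 0, "cross_payer": 1},
    {"customer_id": 2, "letters_sent": 2, "cross_payer": 0},
    {"customer_id": 3, "letters_sent": 1, "cross_payer": 0},
    {"customer_id": 4, "letters_sent": 0, "cross_payer": 0},
    {"customer_id": 5, "letters_sent": 1, "cross_payer": None},  # outcome not known yet
]
training, to_score = split_by_outcome(records)
rate = positive_rate(training)
print(len(training), len(to_score), rate)
```

If the positive rate turns out to be very low, that is a hint to extend the historical period so that the model has enough cross-payers to learn from.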

Creating our predictive model

We now have the data and can now start SAP Predictive Analytics – Automated and create our first predictive model. You’ll be amazed at how easy this all is:

  • We choose to create a new Classification/Regression model:


  • We choose our data set to learn from (the past focus). SAP Predictive Analytics allows us to choose a flat file (Excel, CSV, etc.) or a database to connect to. In this case we loaded our data into SAP HANA in order to benefit from the performance of the in-memory database, but you can connect to plenty of other databases.


  • We read the data and make sure the relevant data is correctly formatted (texts, strings, etc.):


  • We can now choose our target variable, optionally exclude some variables we do not want to have in the model and keep all the rest as independent variables:


  • We now launch the process and let SAP Predictive Analytics create the model. In less than a minute, we have already created our first classification/regression model:

    [Screenshots: sap_pa5, sap_pa6]
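
For readers who would like to see what “training a classification model” means in code, here is a deliberately tiny stand-in. It is not the algorithm used by SAP Predictive Analytics, which builds its models automatically with its own techniques. It is a plain logistic regression fitted by gradient descent on invented data, with a single hypothetical feature (number of past dunning letters) and the cross-payer flag as target.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                   lr: float = 0.1, epochs: int = 2000) -> Tuple[List[float], float]:
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, n_features = len(X), len(X[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(bias + sum(w * v for w, v in zip(weights, xi))) - yi
            for j, v in enumerate(xi):
                grad_w[j] += err * v
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict_proba(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    """Estimated probability that a customer is a cross-payer (target = 1)."""
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))


# Invented training data: [number of past dunning letters] -> cross-payer flag.
X_train = [[0.0], [1.0], [3.0], [4.0]]
y_train = [1, 1, 0, 0]
weights, bias = train_logistic(X_train, y_train)
print(predict_proba(weights, bias, [0.0]), predict_proba(weights, bias, [4.0]))
```

On this toy data the fitted model assigns a probability above 0.5 to customers with few past dunning letters and below 0.5 to those with many. A real model would use many more variables and far more rows.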

That is about it for now. You should now have a basic understanding of what predictive analytics is about and how to prepare the context (use case and data) in order to create a first predictive model with SAP Predictive Analytics. The next article will be the interesting part: we will analyze the model we just created (how good is our model, which variables have a high impact on the target variable, how could we enhance it?) and think about how to really use this model and test its accuracy (as I’m sure a few of you are still skeptical regarding the outcomes)… See you soon!