An Empirical Machine Learning Approach to Forecasting Customer Lifetime Value

Abstract

With the rapid advancement of technical infrastructure and the exponential growth of data, machine learning has been widely adopted by many businesses, especially in predictive analytics. However, most businesses still have an incomplete understanding of what machine learning can actually do. This paper presents a case study of an empirical, machine-learning-based approach to predicting customer lifetime value for an online retailer. Its main purpose is to help businesses understand machine learning techniques, such as supervised algorithms, and the associated terminology in relation to a concrete business problem, as well as the importance of domain knowledge. It also helps businesses understand which algorithms suit a given problem and what outcomes to expect from them. Finally, a set of recommendations is provided for designing customer-centric marketing with machine learning.

Keywords: Machine Learning, Predictive analytics, Supervised Algorithms.

Introduction

For the past two decades, the internet has revolutionized the way businesses traditionally functioned, and numerous businesses now use it to sell products online. Customers are central to any business. The Pareto principle, or 80/20 rule, states that roughly 80 percent of revenue comes from 20 percent of customers (Richard Koch, 2008). Customer lifetime value (CLTV) is an important metric for measuring the success of any customer-centric business. It helps business people understand customer behavior, such as how interested customers are in the company's products and services, and it indicates how well the business is performing.

With the evolution of the internet, online retail stores have grown enormously over the past decade compared with well-established traditional stores. Online retail has not only changed how customers interact with businesses but has also raised the bar and increased sales sharply within a short period. According to an article published by Ecommerce News, almost 87% of UK retail purchases in 2018 were made online (Ecommerce News, 2018).

Purpose

Online retail has some distinctive features compared with traditional shopping in physical stores. A business can track customers' behavior and activities instantly. The transactional data records which products each customer was interested in and purchased, as well as each consumer's billing address and payment address. This type of data enables a business to design its promotions and campaigns accordingly.

The common questions that online retail businesses need to answer are:

Which products are customers interested in, and who are the most valuable customers for the business? A business has many customers, but those who purchase more often or spend more on its website are generally treated as the most valuable.

Which customers are most likely to churn, and to what extent can the business retain them?

Who are the most loyal and trustworthy customers?

How much does each customer contribute to the company's revenue?

How likely is a customer to respond to promotions, and how can sales and demand be forecast?

Machine learning has gained wide acceptance in industry as a way to extract valuable insights about customers from business data. Yet many new retailers entering the online market still have an incomplete understanding of machine learning and of its possibilities for customer-centric marketing.

The main focus of this paper is to help businesses understand the predictive power of machine learning algorithms, especially for forecasting customer lifetime value. It also recommends how they can design and conduct customer-centric marketing with profitability in mind.

Literature Review

Customer lifetime value measures the long-term relationship between a customer and a company, and indicates how important a customer is over a certain period of time. Several methods have been proposed and implemented, ranging from simple to complex models and drawn from disciplines such as economics, statistics and management science. With the advancement of machine learning, many researchers have attempted to forecast the value while placing less emphasis on understanding the business.

A study by Daqing Chen, Sai Laing Sain and Kun Guo (2012) segmented customers with an unsupervised approach in order to understand customer value. Their modeling was based on the RFM (Recency, Frequency and Monetary) model. Before modeling, it is important to have a thorough understanding of the statistical behavior of the data (Jean-leah Njoroge, 2017), and their work offered limited statistical support. Moreover, their research was carried out without a target variable. Data without a target variable leads to uncertain decision-making in the business world (Dean Abbott, 2012), so defining a target variable is highly recommended to support well-grounded decisions.

Benjamin Paul Chamberlain et al. (2017) focused on predicting lifetime value with a supervised technique. Their paper used an ensemble approach and a neural network, with regressors, to forecast which customers are likely to churn. It is unclear from their work why they chose only two machine learning models. Since machine learning offers a wide range of algorithms, it is important to select models according to the problem at hand (Sanjukta Bhowmick et al., 2006).

Customer value for an organization can be understood in three ways: customer profitability, customer equity and lifetime value. Customer lifetime value can help a business segment its customers using customer information and lifetime value components (Su-Yeon Kim et al., 2006). Their paper focused mainly on a theoretical management approach, with little support from data. In practice, good decision-making requires data (Nan Maxwell et al., 2015). This paper examines how to choose the right model for the problem and how to use machine learning to segment customers by their profitability to the business.

Dataset Description

Data Source: The data is publicly available from the UCI Machine Learning Repository. It contains all transactions made between December 1, 2010 and December 9, 2011 for a UK-based online retail store. The company mainly sells unique all-occasion gifts, and many of its customers are wholesalers (UCI Machine Learning Repository).

Only one dataset is available in our case. To build an effective machine learning model, we need to train the model and then test it on unseen data.

Because there is only one dataset, a holdout technique can be applied to divide it in a 70:30 ratio. The 70% portion is used to train the model, and the remaining 30% is used to test and evaluate it.
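The 70:30 holdout split can be sketched in plain Python as follows, so that it runs without extra libraries. The function name `holdout_split`, the fixed seed and the sample IDs are illustrative assumptions; in practice scikit-learn's `train_test_split` does the same job. Because the target is defined per customer, the sketch splits customer IDs rather than individual transaction lines, which keeps any one customer out of both sets at once.

```python
import random

def holdout_split(items, train_fraction=0.7, seed=42):
    """Shuffle a copy of `items` and split it into train and test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split reproducible
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]

# Hypothetical customer IDs standing in for the real Customer ID column.
customer_ids = list(range(12346, 12356))
train_ids, test_ids = holdout_split(customer_ids)
print(len(train_ids), len(test_ids))  # 7 3
```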

The dataset contains 541,909 records of customer transactions with 8 columns.

The transaction dataset contains the following columns.

| Feature Name | Data Type | Description |
|---|---|---|
| Invoice | Nominal | A 6-digit integral number uniquely assigned to each transaction. If the code starts with the letter 'C', it indicates a cancellation. |
| Stock Code | Nominal | A 5-digit integral number uniquely assigned to each distinct product. |
| Description | Nominal | Product (item) name. |
| Quantity | Numeric | Quantity of each product (item) per transaction. |
| Unit Price | Numeric | Product price per unit, in pounds sterling. |
| Invoice Date | Numeric | The date and time at which each transaction was generated. |
| Customer ID | Nominal | A 5-digit integral number uniquely assigned to each customer. |
| Country | Nominal | Name of the country where each customer resides. |

The dataset has 135,080 missing values, which means that nearly 25% of the records are incomplete. This is a major drawback of the data. The dataset also contains duplicate records, which need special attention.

The main goal is to predict customers' lifetime value using machine learning. Data analytics helps identify the features relevant to the problem and build a model that predicts lifetime value with low error on unseen data. The results can then be interpreted in business terms, showing how profitable each customer's relationship with the business is. Carrying out these steps requires a thorough understanding of the business problem.

Research Design and Methods:

This study first focuses on forecasting customers' lifetime value. Because the dataset has no target feature, the target has to be designed.

According to Dwyer (1997), customer lifetime value can be calculated with the following formula:

CLTV = ((A.O x P.F)/C.R) x P.M

Where,

A.O = Average Value of the Order

P.F = Purchase Frequency

C.R = Churn Rate

P.M = Profit Margin
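The sketch below evaluates the formula for one customer to make it concrete. The function name `cltv` and the input values are hypothetical and serve only to show the arithmetic. The function encodes CLTV = ((A.O x P.F) / C.R) x P.M directly. It also guards against a zero churn rate, which would otherwise cause a division by zero if every customer purchased again.

```python
def cltv(avg_order_value, purchase_frequency, churn_rate, profit_margin):
    """CLTV = ((A.O x P.F) / C.R) x P.M."""
    if churn_rate <= 0:
        raise ValueError("churn rate must be positive")
    return (avg_order_value * purchase_frequency / churn_rate) * profit_margin

# Hypothetical inputs: average order of 50, 2 purchases per customer,
# 50% churn, and a profit margin of 10 (same currency units as the orders).
print(cltv(50.0, 2.0, 0.5, 10.0))  # 2000.0
```

With these inputs, (50 x 2) / 0.5 = 200, and 200 x 10 = 2000.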

The dataset has no target label, so the target feature is derived with the formula above.

Besides the profit margin, which is assumed below, three basic values are needed to design the target feature:

Avg. Value of the order

Purchase Frequency

Churn rate

The average order value is the money spent by each customer divided by the total number of transactions made by that customer:

Avg. Order Value = Total expenditure of the customer / Total number of transactions

Money spent by each customer = sum of (Quantity X Unit Price) over all of the customer's transaction lines.

The total number of transactions made by each customer up to the cut-off date can be counted from the distinct invoice numbers in the transaction records.
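These two steps can be sketched in plain Python under stated assumptions. The rows are hypothetical, already-cleaned transaction lines reduced to (invoice, customer, quantity, unit price), and the function name `average_order_values` is illustrative. A transaction is taken to be one distinct invoice, and spend is the sum of Quantity x Unit Price over the customer's lines. With pandas, the same result would normally come from a group-by on the customer ID.

```python
from collections import defaultdict

# Hypothetical cleaned transaction lines: (invoice, customer_id, quantity, unit_price)
lines = [
    ("536365", 17850, 6, 2.55),
    ("536365", 17850, 8, 3.39),
    ("536366", 17850, 6, 1.85),
    ("536367", 13047, 32, 1.69),
]

def average_order_values(lines):
    spend = defaultdict(float)   # total money spent per customer
    invoices = defaultdict(set)  # distinct invoices (transactions) per customer
    for invoice, customer, quantity, unit_price in lines:
        spend[customer] += quantity * unit_price  # line amount = Quantity x Unit Price
        invoices[customer].add(invoice)
    # Average order value = total spend / number of transactions
    return {c: spend[c] / len(invoices[c]) for c in spend}

aov = average_order_values(lines)
print({c: round(v, 2) for c, v in aov.items()})
```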

According to Courtney Regan (2019), web-only retailers earn a profit margin of about 30%, while retailers offering online ordering with in-store pickup earn about 20%.

In our case, we assume a profit margin of ten percent on each customer's transactions.

Profit Margin (P.M) = Money spent by each customer X 0.10

The churn rate can be derived from the number of transactions each customer made during the observation period:

C.R = 1 - R.R

where R.R is the repeat rate, the fraction of customers who made more than one purchase.

Purchase frequency is the average number of transactions per customer, that is, the total number of transactions divided by the number of customers. The repeat rate counts only customers who purchased more than once. Part of these calculations follows the work of Avinash Navlani (2018).
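The dataset-level rates can be computed from per-customer transaction counts, as sketched below. As far as we understand the Navlani (2018) tutorial, it computes them the same way. The counts and spend figures are hypothetical stand-ins for the output of a group-by on the cleaned data. `PROFIT_RATE` encodes the ten percent margin assumed above.

```python
# Hypothetical distinct-invoice counts and total spend per customer.
transactions = {17850: 2, 13047: 1, 12583: 3, 13748: 1}
spend = {17850: 53.52, 13047: 54.08, 12583: 120.0, 13748: 30.0}

PROFIT_RATE = 0.10  # the 10% margin assumed in this study

n_customers = len(transactions)
# Purchase frequency: total transactions / number of customers
purchase_frequency = sum(transactions.values()) / n_customers
# Repeat rate: share of customers with more than one transaction
repeat_rate = sum(1 for n in transactions.values() if n > 1) / n_customers
churn_rate = 1 - repeat_rate
# Profit margin per customer: 10% of what the customer spent
profit_margin = {c: s * PROFIT_RATE for c, s in spend.items()}

print(purchase_frequency, repeat_rate, churn_rate)  # 1.75 0.5 0.5
```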

Once the target feature is obtained with the calculations above, the next step is to train candidate machine learning algorithms and evaluate them by mean squared error (MSE), the most common evaluation metric for regression algorithms.

Process and stages in the current project

Figure 1

Figure 1 shows the typical stages of the current data analytics project, which are described below.

In our case, we want to predict customer lifetime value for a UK-based online retail company. Given the customers' transactional data, the objective is to predict how much value each customer will bring through continued use of the service.

Data Gathering: The transactional data is readily available from the UCI Machine Learning Repository. It contains all consumer transactions between 2010 and 2011 (UCI Machine Learning Repository).

Data Cleaning: Once the data is gathered, it must be cleaned. Around 80% of the work in a data analytics project goes into data cleaning (Karen Grace-Martin, 2015). Although cleaning is time-consuming, the outcome of the project depends heavily on data quality, so this step ensures that the data is of good quality before the model is built. The paper focuses on UK-based customers only. The data also covers customers from other countries, such as Germany, Iran, Spain and Belgium. Around 98% of the records belong to UK customers, and the remaining 2% from other countries must be filtered out before analysis.

As noted above, the dataset has 135,080 missing values (nearly 25% of the data), and these records need to be filtered out.

Next, a brief statistical summary should be produced. Any records with a negative quantity, which typically correspond to cancelled invoices, also need to be removed from the dataset.
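The cleaning rules can be sketched in plain Python on a few hypothetical rows. The key names are assumptions modelled on the UCI file's column names (InvoiceNo, CustomerID and so on), and the function name `clean` is illustrative. The filters follow the text: keep UK customers, drop rows with a missing customer ID, drop cancellations and negative quantities, and drop exact duplicates.

```python
# Hypothetical raw rows; key names are assumed, not taken from the paper.
raw = [
    {"InvoiceNo": "536365", "Quantity": 6, "UnitPrice": 2.55, "CustomerID": 17850, "Country": "United Kingdom"},
    {"InvoiceNo": "C536379", "Quantity": -1, "UnitPrice": 27.50, "CustomerID": 14527, "Country": "United Kingdom"},
    {"InvoiceNo": "536414", "Quantity": 56, "UnitPrice": 0.00, "CustomerID": None, "Country": "United Kingdom"},
    {"InvoiceNo": "536370", "Quantity": 24, "UnitPrice": 3.75, "CustomerID": 12583, "Country": "France"},
    {"InvoiceNo": "536365", "Quantity": 6, "UnitPrice": 2.55, "CustomerID": 17850, "Country": "United Kingdom"},
]

def clean(rows, country="United Kingdom"):
    seen = set()
    kept = []
    for row in rows:
        if row["Country"] != country:
            continue  # keep UK customers only
        if row["CustomerID"] is None:
            continue  # drop rows with a missing customer ID
        if row["InvoiceNo"].startswith("C") or row["Quantity"] < 0:
            continue  # drop cancellations and negative quantities
        key = tuple(sorted(row.items()))
        if key in seen:
            continue  # drop exact duplicate rows
        seen.add(key)
        kept.append(row)
    return kept

cleaned = clean(raw)
print(len(cleaned))  # 1
```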

Feature Engineering: This is a key step in obtaining the desired results for any machine learning problem. Many beginners, and even intermediate practitioners, assume that a good algorithm alone gives accurate results. That is only partly true, because feature engineering is often the decisive step. Typical feature engineering steps include standardizing and normalizing variables, and applying encoding and binning techniques to convert categorical variables to numerical form and numerical variables to categorical form. In our case, new features must be created. The amount of each transaction line is obtained by multiplying existing features (Quantity X Unit Price), and the average order value is derived from these amounts. Because there is no target feature, the target must also be computed with the CLTV formula given above.
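Standardizing and normalizing, two of the steps listed above, can be sketched in plain Python. The function names and the sample amounts are hypothetical. `standardize` produces z-scores using the population standard deviation, and `min_max` rescales values linearly to [0, 1].

```python
import statistics

def standardize(values):
    """Z-score: subtract the mean, divide by the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

def min_max(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

amounts = [10.0, 20.0, 30.0, 40.0]  # hypothetical line amounts (Quantity x Unit Price)
print(min_max(amounts))  # [0.0, 0.3333333333333333, 0.6666666666666666, 1.0]
print(standardize(amounts))
```

Both functions assume the values are not all equal, since a constant column would make the denominator zero.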

Model Building: This stage defines how the future is predicted, and it is where machine learning comes in. A machine learning model is a program or system trained on input data, and the trained model then makes predictions on new data drawn from the same distribution. A set of regression models, namely linear regression, decision trees, random forests and gradient boosting, is applied to forecast the value. Ensemble models are expected to perform better than single models, because they can reduce variance (bagging) or bias (boosting).

Model Evaluation:

In a predictive analytics project, the goal is to make predictions on unseen data, so once the model is built, the next important phase is to evaluate it. Model evaluation shows how well the model performs. In our case, we are predicting a continuous variable, customer lifetime value, which makes this a regression problem. The typical evaluation metric for regression is the mean squared error (MSE).

In statistics, the mean squared error measures the average squared difference between the estimated values and the actual values. It corresponds to the expected value of the squared-error loss (Robert Tibshirani and Trevor Hastie, 2013). In general, models with a lower MSE on unseen data are considered better.
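The definition translates directly into code, sketched below with hypothetical test-set values; the function name is illustrative. scikit-learn provides the same metric as `sklearn.metrics.mean_squared_error`.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical CLTV values for three test customers and a model's predictions.
print(mean_squared_error([100.0, 250.0, 40.0], [110.0, 240.0, 40.0]))  # 66.66666666666667
```

The errors are -10, 10 and 0, so the MSE is (100 + 100 + 0) / 3. Squaring makes large errors dominate, which is why a few badly mispredicted high-value customers can drive the score.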

Feedback loop:

In typical machine learning problems, the feedback loop plays a crucial role. If the results do not meet expectations, we should revisit feature engineering before switching models. Initial predictions are made with the given dataset, and if the error is large (say, with a random forest), parameters such as the number of trees and the sample size should be tuned to obtain the desired output.

The supervised algorithms described below can give accurate results when their parameters are tuned properly.

Machine Learning Terminologies and Techniques:

There are two main types of machine learning algorithms: supervised and unsupervised.

Supervised algorithms: The data is labeled, i.e., a target variable exists. The model is trained on the input data with respect to the target variable and then makes predictions on unseen data. This project uses supervised algorithms.

Unsupervised algorithms: There is no target variable, and the algorithm learns structure from the data itself.

Supervised algorithms are further divided into two types: regression and classification.

Regression quantifies the strength of the relationship between one variable and the variables thought to explain it.

In classification, the objective is to train the model to predict discrete, i.e., categorical, target values. Ours is a regression problem, because we are trying to predict a continuous value, so we study a set of machine learning algorithms suited to this case.

Linear Regression: Linear regression is one of the most fundamental predictive algorithms. There are many types of regression techniques, and simple linear regression, which models the relationship between two variables, gives the most intuitive picture. More specifically, it shows how much of the variation in the dependent variable can be explained by changes in the independent variable. In a business context, the dependent variable (also called the response or target) might be the sales of a product, performance, pricing or risk. The independent variables, also called explanatory variables or predictors, explain the variation in the dependent variable. A linear regression model is linear because every term is either a constant or a parameter multiplied by an independent variable. The core idea is to find the model that, when plotted, is the line of best fit for the data (Robert Tibshirani and Trevor Hastie, 2013).
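Simple linear regression has a closed-form least-squares solution, sketched below. The slope is the covariance of x and y divided by the variance of x, and the intercept makes the line pass through the point of means. The function name and the orders-versus-spend data are hypothetical.

```python
def fit_simple_linear_regression(x, y):
    """Ordinary least squares for y = b0 + b1 * x with a single predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx              # slope
    b0 = mean_y - b1 * mean_x   # intercept
    return b0, b1

# Hypothetical data: number of orders (x) against total spend (y).
orders = [1, 2, 3, 4, 5]
spend = [12.0, 19.0, 31.0, 39.0, 49.0]
b0, b1 = fit_simple_linear_regression(orders, spend)
print(f"{b0:.2f} {b1:.2f}")  # 1.80 9.40
```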

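Decision Trees: Decision trees are non-parametric techniques that form a tree-like structure. For classification, the split at each node is commonly chosen with an impurity measure such as the Gini index or entropy, and entropy-based splits are compared through information gain. Regression trees instead typically choose the split that minimizes the squared error within the child nodes.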

Figure 2

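Entropy measures the impurity of a node with respect to the target variable. A pure node, whose samples all share one class, has zero entropy. The root node is formed from the feature whose split produces child nodes with the lowest weighted entropy.

The entropy of a single node, however, does not by itself show how much a split improves the model. Information gain captures this by comparing the entropy before and after a split. The Gini index is an alternative impurity measure that avoids logarithms and often yields similar splits.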

Information gain measures the reduction in entropy achieved by splitting a node on a feature.
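For reference, the standard textbook definitions are given below. These are the usual formulas rather than a formula recovered from the original figure. S is the set of samples at a node, p_k is the proportion of samples in class k (out of K classes), A is a candidate feature, and S_v is the subset of S for which A takes the value v.

```latex
H(S) = -\sum_{k=1}^{K} p_k \log_2 p_k
\qquad
\mathrm{Gini}(S) = 1 - \sum_{k=1}^{K} p_k^{2}

IG(S, A) = H(S) - \sum_{v \in \mathrm{values}(A)} \frac{|S_v|}{|S|}\, H(S_v)
```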

Artificial Neural Networks:

Neural networks are supervised algorithms that can solve both regression and classification problems, and their design was inspired by the functioning of the human brain. Figure 3 shows a simple artificial neural network with one input layer, one hidden layer and one output layer. The number of neurons in the input layer equals the number of input features in the data. The number of hidden layers and the number of neurons in each are tuning parameters of the model. Each neuron computes a weighted sum of its inputs plus a bias and passes the result through an activation function, and the output layer produces the prediction for the instance (Tom Mitchell, 1997).
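The forward pass just described can be sketched in plain Python, assuming sigmoid activations in the hidden layer and a linear output, as is usual for regression. Training by backpropagation is omitted, and the weights below are hypothetical values chosen so that the arithmetic is easy to follow.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, hidden_weights, hidden_biases, output_weights, output_bias):
    """One hidden layer with sigmoid activations and a linear output (regression)."""
    hidden = [
        sigmoid(sum(w * xi for w, xi in zip(weights, x)) + b)  # weighted sum + bias -> activation
        for weights, b in zip(hidden_weights, hidden_biases)
    ]
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias

# Hypothetical network: 2 input features, 2 hidden neurons, 1 output.
x = [1.0, 2.0]
hidden_weights = [[0.5, -0.25], [0.0, 0.0]]
hidden_biases = [0.0, 0.0]
output_weights = [2.0, 4.0]
output_bias = 1.0
print(forward(x, hidden_weights, hidden_biases, output_weights, output_bias))  # 4.0
```

Both hidden neurons receive a net input of 0, so each outputs sigmoid(0) = 0.5, and the output is 2(0.5) + 4(0.5) + 1 = 4.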

Figure 3

Random Forest:

Random forest is a popular ensemble technique. A major challenge in any machine learning problem is to keep both bias and variance low. Random forest is a supervised algorithm for classification and regression that builds many decision trees. Each tree is trained on a bootstrap sample of the data and considers a random subset of features at each split. The forest then combines the trees' predictions, by majority vote for classification and by averaging for regression.

The main tuning parameters of a random forest (as named in scikit-learn) are n_estimators, the number of trees; max_depth, the maximum depth of each tree; and max_features, the number of features considered at each split. Because the forest is built from randomly selected features, it is important to set how many features each split considers and how deep each tree may grow. Deep decision trees are prone to overfitting.
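A deliberately small, pure-Python sketch of the random forest idea for regression follows. Each tree is trained on a bootstrap sample and sees a random subset of features, and the forest averages the trees. To keep the example short, every tree is a depth-1 stump, so the feature subset is drawn once per tree rather than at every split as in a full implementation. The function names and the toy data are hypothetical. In practice, scikit-learn's `RandomForestRegressor` would be used.

```python
import random
import statistics

def fit_stump(X, y, features):
    """Best single split (lowest squared error) over the given feature indices."""
    best = None
    for f in features:
        for threshold in sorted({row[f] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[f] <= threshold]
            right = [yi for row, yi in zip(X, y) if row[f] > threshold]
            if not left or not right:
                continue
            lm, rm = statistics.fmean(left), statistics.fmean(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, f, threshold, lm, rm)
    if best is None:  # no valid split: predict the mean
        m = statistics.fmean(y)
        return lambda row: m
    _, f, t, lm, rm = best
    return lambda row: lm if row[f] <= t else rm

def fit_forest(X, y, n_estimators=25, max_features=1, seed=0):
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    trees = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]       # bootstrap sample of rows
        features = rng.sample(range(p), max_features)     # random subset of features
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], features))
    return lambda row: statistics.fmean(t(row) for t in trees)  # average the trees

# Hypothetical data: feature 0 drives the target, feature 1 is noise.
X = [[i, (i * 7) % 3] for i in range(10)]
y = [0.0 if i < 5 else 10.0 for i in range(10)]
forest = fit_forest(X, y)
print(forest([9, 0]), forest([0, 0]))
```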

Gradient Boosting:

Boosting is an ensemble technique that builds its predictors sequentially rather than independently, and it can solve both regression and classification problems. Boosting turns a sequence of weak learners into a strong learner. In gradient boosting, each new model is fitted to the residuals of the ensemble built so far, which for squared-error loss are proportional to the negative gradient. In the end, all the predictors are combined, with each one's contribution scaled by a weight such as the learning rate.
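The residual-fitting loop for squared-error loss can be sketched on a single feature in plain Python. The model starts from the mean of the target. Each round fits a depth-1 stump to the current residuals and adds it, scaled by `learning_rate`. The function names and the toy data are hypothetical. In practice, scikit-learn's `GradientBoostingRegressor` implements this with the parameters listed below.

```python
def fit_stump_1d(x, residuals):
    """Depth-1 regression tree on one feature, fitted to the current residuals."""
    best = None
    for t in sorted(set(x))[:-1]:  # the largest value would leave the right side empty
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda v: lm if v <= t else rm

def gradient_boost(x, y, n_estimators=50, learning_rate=0.1):
    base = sum(y) / len(y)          # start from the mean prediction
    stumps = []
    pred = [base] * len(y)
    for _ in range(n_estimators):
        residuals = [yi - pi for yi, pi in zip(y, pred)]  # what the ensemble still misses
        stump = fit_stump_1d(x, residuals)
        stumps.append(stump)
        pred = [pi + learning_rate * stump(xi) for xi, pi in zip(x, pred)]
    return lambda v: base + learning_rate * sum(s(v) for s in stumps)

# Hypothetical data with two clear groups.
x = [1, 2, 3, 4, 5, 6]
y = [1.0, 1.2, 1.1, 5.0, 5.3, 5.1]
model = gradient_boost(x, y)
print([round(model(v), 2) for v in x])
```

The sketch needs at least two distinct feature values. A smaller learning rate needs more trees but usually generalizes better.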

The parameters of a gradient boosting model, named here as in scikit-learn's GradientBoostingRegressor, fall into three groups:

● Tree-based parameters:

o min_samples_split – minimum number of samples a node must contain to be split

o min_samples_leaf – minimum number of samples required in a leaf node

o max_depth – maximum depth of each tree

o max_features – number of features considered when searching for the best split

● Boosting parameters:

o learning_rate – scales the contribution of each tree to the output

o n_estimators – number of sequential trees to be modeled

o subsample – fraction of observations sampled for each tree

● Miscellaneous parameters:

o random_state – random number seed

o warm_start – reuse the previous fit and add more trees to it

Anticipated Results:

This study demonstrates how machine learning models work in practice on transactional data. It shows how important domain knowledge is, especially when designing the target feature from the independent variables. As the study proceeds, the right model is expected to be chosen on the basis of interpretability and accuracy. Ensemble models are expected to give more accurate predictions, while traditional models offer better interpretability. These techniques help businesses develop campaign programs based on how profitable each customer is to the firm.

Once the forecasts are made, customers will be sorted by predicted value in descending order, so that the business can see value from highest to lowest. Segmentation can then be applied by setting thresholds, which can be derived from classification models such as logistic regression. An in-depth study can then be carried out on the segmented data.
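The ranking and threshold-based segmentation can be sketched in plain Python. The predicted values and the two thresholds are hypothetical and set by hand for illustration. The paper suggests deriving thresholds from a classification model such as logistic regression, which is not shown here.

```python
# Hypothetical predicted CLTV per customer (customer ID -> value).
predicted = {17850: 2100.0, 13047: 340.0, 12583: 980.0, 13748: 55.0}

def segment(value, high=1000.0, low=100.0):
    """Assign a segment from two illustrative thresholds."""
    if value >= high:
        return "high"
    if value >= low:
        return "medium"
    return "low"

# Sort customers from highest to lowest predicted value.
ranked = sorted(predicted.items(), key=lambda item: item[1], reverse=True)
for customer, value in ranked:
    print(customer, value, segment(value))
```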

The following figure is a sample illustration of the upcoming work on the data.

Impact and Improvements:

Previous studies applied machine learning to forecasting customer lifetime value in isolation. Some were based on theoretical assumptions, and a few chose statistical or economic models without justifying them against the data. This study helps businesses pick the right model for the right data. Through in-depth analysis, it also shows how they can design marketing programs for specific customers in order to prolong those customers' value.

Use of Visualization Tools:

For initial data exploration, this study needs visualizations such as bar charts, scatter plots and heat maps for univariate, bivariate and multivariate analysis. Time series plots are also expected.

Discussions:

This study aims to give businesses a good understanding of the possibilities of machine learning. Before applying any machine learning algorithm to a problem, it is important to know its strengths and weaknesses. No technique should be applied to a problem right away, because a good understanding of the domain and of the data is essential.

In this study, we are trying to predict customer lifetime value, but the dataset does not contain that target value. Knowing how to compute the value itself shows the role of domain knowledge in tackling the problem, and the importance of domain knowledge is the first lesson.

Second, we listed an array of machine learning algorithms for our regression problem. However, algorithms such as neural networks may not work effectively here, because they are data-hungry and need large amounts of data. This shows that data matters more than technique.

Third, since we aim to help businesses design marketing campaigns, this research first looks at profitability based on customer lifetime value. Other elements are also needed, because profit is only one part of the picture.

The main insights from this study concern what machine learning can achieve with limited data and how important domain knowledge is to the problem.

Common tools include Microsoft Excel, Python, Pandas, NumPy, Scikit-learn and Google Colab. For visualization, Tableau, Matplotlib and ggplot2 are advisable.

Key Challenges

Nearly one quarter of the data is missing. Central tendency techniques such as the mean, median and mode can impute missing numeric values. The missing values here, however, are customer IDs, which cannot be imputed this way.

Very few features are available. Since this project requires deeper analysis, most of the features have to be derived from the existing ones.

Recommendations & Future Study

In coming semesters, this project will study why ensemble models perform well at predicting continuous values and how to segment customers by profitability using machine learning. Because the usable data is limited by the large share of missing values, techniques such as Monte Carlo simulation and other probabilistic simulations can be applied to overcome this problem. Since the project is based on batch learning, online machine learning techniques could also be applied to reduce the need for timely human intervention.

References

Su-Yeon Kim, Tae-Soo Jung, Eui-Ho Suh and Hyun-Seok Hwang (2006). Customer segmentation and strategy development based on customer lifetime value: A case study.

Lars Kotthoff, Ian P. Gent and Ian Miguel (2010). An Evaluation of Machine Learning in Algorithm Selection for Search Problems.

Daqing Chen, Sai Laing Sain and Kun Guo (2012). Data mining for the online retail industry: A case study of RFM model-based customer segmentation using data mining.

Sanjukta Bhowmick, Victor Eijkhout, Yoav Freund, Erika Fuentes and David Keyes (2006). Application of machine learning in selecting sparse linear solvers. Technical report, Columbia University.

Robert Tibshirani and Trevor Hastie (2013). An Introduction to Statistical Learning (textbook).

Tom M. Mitchell (1997). Machine Learning (textbook).

Dwyer, F. R. (1997). Customer lifetime valuation to support marketing decision making. Journal of Interactive Marketing, 11(4), 6–13.

Benjamin Paul Chamberlain et al. (2017). Customer Lifetime Value Prediction Using Embeddings. Retrieved from: https://arxiv.org/pdf/1703.02596.pdf

Jean-leah Njoroge (2017). Significance of Exploratory Data Analysis. Retrieved from: http://www.jeannjoroge.com/significance-of-exploratory-data-anaysis/

Dean Abbott (2012). Data Mining and Predictive Analytics. Retrieved from: http://abbottanalytics.blogspot.com/2012/04/why-defining-target-variable-in.html

Nan L. Maxwell et al. (2015). Data and Decision Making: Same Organizations, Different Perceptions. Retrieved from: https://redf.org/app/uploads/2015/02/Data_Decision_Making_WP.pdf

Richard Koch (2008). The 80/20 Principle. Retrieved from: https://richardkoch.net/2012/11/the-8020-principle-2/

Ecommerce News (2018). 87% of UK retail purchases made online. Retrieved from: https://ecommercenews.eu/87-of-uk-retail-purchases-made-online/

UCI Machine Learning Repository. Online Retail Data Set. Retrieved from: https://archive.ics.uci.edu/ml/datasets/online+retail

Avinash Navlani (2018). Customer Lifetime Value. Retrieved from: https://www.datacamp.com/community/tutorials/customer-life-time-value

Karen Grace-Martin (2015). Preparing Data for Analysis is (more than) Half the Battle. Retrieved from: https://www.theanalysisfactor.com/preparing-data-analysis/

Courtney Regan (2019). Running retail stores more expensive than online. Retrieved from: https://www.cnbc.com/2017/04/19/think-running-retail-stores-is-more-expensive-than-selling-online-think-again.html
