In the first post on this topic, we defined customer lifetime value (CLV) and learned about customer lifetime value models.
In this post, we'll learn how to use machine learning to predict CLV. CLV is a customer's past value plus their expected future value. Predictive CLV is an estimate of how much value a customer will bring to the business in the future.
Machine learning (ML), a subset of AI, combines algorithms and statistics to perform a specific task without being explicitly programmed for it. It does so by finding patterns in large data sets. ML is a valuable tool for predicting CLV.
Predictive CLV (Customer Lifetime Value) aims to model buyers' purchasing behavior to infer what their future actions will be.
For this, ML models are a suitable alternative to probabilistic models because they can use more features. ML models such as deep neural networks are a class of models whose parameters are fitted to the data by training, typically with stochastic gradient descent.
So, when you're predicting future CLV, you are tackling two distinct problems:
- You need to forecast the future value of existing customers with previous transaction records
- You also need to predict the future value of new customers
So, how do you solve both of these problems? Several modeling techniques can be deployed. One way is to build predictive CLV models in Python or R. You can also use deep neural network models.
Steps in building an ML model for predicting customer lifetime value:
Building an ML model to predict your customers' lifetime value involves specific key steps:
- Collect & clean customer data: First, ensure you have clean data for each customer. You must have a customer ID that's used to differentiate individual customers and a purchase amount for each purchase that each customer has made. No matter which model you use, you must perform a set of standard preparation and cleaning steps.
- Build a model: Next, you will need to implement ML algorithms to search your dataset for patterns. Once you have a list of patterns, you can design steps to analyze and understand those patterns. To prepare the models for training, you must choose a threshold date.
- Check whether the model is successful: After all the previous steps, it is essential to verify that the model is working correctly.
Our data analyst, Mehar Singh Gambhir, shows you how to build an ML model for predicting CLV.
Steps in building an ML model for predicting CLV
The following steps are involved in the building of a machine learning model to predict customer lifetime value (CLV):
1. Clean And Prepare Data:
The first step is to prepare the data set and select the variables to use as features for training the model. Before performing any processing or analysis on the data, some basic data cleaning operations need to be performed on the data set. These include:
- Not using columns with low variance
- Filling or eliminating rows containing null values
- Getting rid of duplicate records to avoid redundancy
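The cleaning operations listed above could be sketched in pandas roughly as follows. The column names and the sample rows are hypothetical, "low variance" is simplified here to "a single distinct value", and rows with nulls are dropped rather than filled; a real data set may call for different choices.

```python
import pandas as pd

# Hypothetical raw transactions; names and values are assumptions for illustration.
raw = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 3, None],
    "date": ["2019-01-05", "2019-01-05", "2019-02-10",
             "2019-03-01", "2019-03-02", "2019-03-03"],
    "quantity": [2, 2, 1, 4, 3, 1],
    "price": [20.0, 20.0, 5.0, None, 30.0, 9.0],
    "currency": ["USD"] * 6,  # constant column: carries no information
})

def clean_transactions(df: pd.DataFrame) -> pd.DataFrame:
    # Drop columns that hold a single distinct value (zero variance).
    constant_cols = [c for c in df.columns if df[c].nunique(dropna=False) <= 1]
    df = df.drop(columns=constant_cols)
    # Eliminate rows containing null values.
    df = df.dropna()
    # Remove exact duplicate records.
    df = df.drop_duplicates()
    return df.reset_index(drop=True)

clean = clean_transactions(raw)
print(clean)
```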
For this model, we use a sample data set consisting of standard transactional data for a company, where each row corresponds to a purchase made by a customer. The data set consists of fields such as the unique customer ID, transaction date, product quantity, and total purchase price.
Here is a sample of the initial raw data set we will be starting our CLTV model building with:
These are raw features for a basic model, but one can incorporate additional features, such as browsing data, including browsing time, additions to carts, and product searches. Also, email response data from marketing campaigns or promotions can be added to the feature set. Some customer metadata can also help predict lifetime value.
We split the data into training and test sets using a threshold date (here taken as 2019-08-09). The training period covers the 1681 days between 2015-01-01 and 2019-08-09. The test period runs for one year from the threshold date, up to 2020-08-09, and is the period for which we will predict each customer's lifetime value.
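A minimal sketch of a threshold-date split, assuming a transactions table with a hypothetical date column and a one-year test window after the threshold. Whether a purchase made on the threshold day itself belongs to the training set is an assumption made here.

```python
import pandas as pd

TRAIN_START = pd.Timestamp("2015-01-01")
THRESHOLD = pd.Timestamp("2019-08-09")
TEST_END = THRESHOLD + pd.DateOffset(years=1)  # assumed one-year test window

# Hypothetical transactions for illustration.
tx = pd.DataFrame({
    "customer_id": [1, 1, 2, 2],
    "date": pd.to_datetime(["2016-03-01", "2019-12-24", "2019-08-09", "2020-09-01"]),
    "price": [50.0, 20.0, 10.0, 99.0],
})

# Purchases up to and including the threshold date go to training.
train = tx[(tx["date"] >= TRAIN_START) & (tx["date"] <= THRESHOLD)]
# Purchases in the following year form the test period.
test = tx[(tx["date"] > THRESHOLD) & (tx["date"] <= TEST_END)]

training_days = (THRESHOLD - TRAIN_START).days
print(training_days, len(train), len(test))
```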
At this point we have only two numeric variables per transaction (product quantity and total purchase price), which won't be enough to predict customer lifetime value. We apply feature engineering to the raw data to extract more informative features.
First, we group the rows by customer ID and transaction day, and aggregate with sum() to evaluate each customer's daily transactions.
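The daily aggregation could look something like this in pandas. The column names (customer_id, date, quantity, price) are assumptions, and timestamps are truncated to the calendar day before grouping.

```python
import pandas as pd

# Hypothetical transactions; customer 1 buys twice on the same day.
tx = pd.DataFrame({
    "customer_id": [1, 1, 1, 2],
    "date": pd.to_datetime([
        "2019-01-05 09:30", "2019-01-05 17:45",
        "2019-02-01 12:00", "2019-01-05 08:00",
    ]),
    "quantity": [2, 1, 3, 5],
    "price": [20.0, 10.0, 30.0, 50.0],
})

# One row per customer per day, with quantities and amounts summed.
daily = (
    tx.assign(day=tx["date"].dt.normalize())
      .groupby(["customer_id", "day"], as_index=False)[["quantity", "price"]]
      .sum()
)
print(daily)
```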
Now, for each customer ID, we calculate the following values:
- Recency: The number of days between a customer's first and last purchase in the given time period, that is, the duration over which the customer was active
- Monetary_value: Total amount of all purchases made by the customer in the given time period
- Frequency: Number of purchases made by a customer
- T: The number of days between the first transaction date of a customer and the last day of the training period (in this case, 2019-08-09).
- time_between: The average number of days between successive transactions of a customer
- avg_basket_value: The average cost of purchases made by the customer
- avg_basket_size: The average quantity of products purchased
- Target_monetary: This is the target variable, calculated as the customer's monetary value over the 1-year test period.
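One way to compute the features listed above from the daily totals is sketched below. The input table and its column names are hypothetical, and a few details are assumptions: recency is taken as the days between first and last purchase, time_between is set to 0 for customers with a single purchase, and customers with no test-period purchases get a target of 0.

```python
import pandas as pd

THRESHOLD = pd.Timestamp("2019-08-09")

# Hypothetical daily totals per customer (one row per customer per day).
daily = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 1],
    "day": pd.to_datetime(["2019-01-01", "2019-01-11", "2019-01-31",
                           "2019-07-30", "2019-10-01"]),
    "quantity": [2, 4, 6, 1, 3],
    "price": [20.0, 40.0, 60.0, 15.0, 30.0],
})

train = daily[daily["day"] <= THRESHOLD]
test = daily[daily["day"] > THRESHOLD]

g = train.groupby("customer_id")
features = pd.DataFrame({
    "recency": (g["day"].max() - g["day"].min()).dt.days,
    "frequency": g["day"].count(),
    "monetary_value": g["price"].sum(),
    "T": (THRESHOLD - g["day"].min()).dt.days,
    "avg_basket_value": g["price"].mean(),
    "avg_basket_size": g["quantity"].mean(),
})

# Average gap between successive purchases; 0 when there is only one purchase.
features["time_between"] = (
    features["recency"] / (features["frequency"] - 1)
).fillna(0.0)

# Target: total spend in the test period (0 if the customer bought nothing).
features["target_monetary"] = (
    test.groupby("customer_id")["price"].sum()
        .reindex(features.index, fill_value=0.0)
)
print(features)
```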
2. Pre-processing:
As we can see from the description of the input variables above, the difference between the min and max values is considerable.
The values for some variables, such as monetary_value and avg_basket_value, are highly skewed, which could affect model training. We therefore apply some pre-processing to normalize the data.
We first apply a log transformation to columns with a wide range of values, such as monetary_value, time_between, avg_basket_value and avg_basket_size, to compress them.
Then we subtract each column's mean from its values and divide the differences by the column's standard deviation, which standardizes the data across all columns.
Data normalization reduces the range of values the model trains on, which typically speeds up training and can improve performance.
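A rough sketch of these two pre-processing steps is shown below. The feature values are made up, and log1p (the log of 1 + x) is an assumption chosen so that zero values do not break the transformation; the exact log variant is not specified here. In practice the mean and standard deviation would be computed on the training data and reused for the test data.

```python
import numpy as np
import pandas as pd

# Hypothetical feature table for illustration.
features = pd.DataFrame({
    "monetary_value": [10.0, 100.0, 1000.0, 10000.0],
    "time_between": [0.0, 5.0, 30.0, 200.0],
    "avg_basket_value": [5.0, 50.0, 250.0, 2000.0],
    "avg_basket_size": [1.0, 2.0, 10.0, 40.0],
    "frequency": [1, 3, 8, 20],
})

LOG_COLS = ["monetary_value", "time_between", "avg_basket_value", "avg_basket_size"]

scaled = features.astype(float)
# Step 1: compress the wide-ranging columns with a log transformation.
scaled[LOG_COLS] = np.log1p(scaled[LOG_COLS])

# Step 2: standardize every column (subtract the mean, divide by the std).
mean = scaled.mean()
std = scaled.std(ddof=0)
scaled = (scaled - mean) / std
print(scaled.round(3))
```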
3. The Two-Stage Pipeline Approach:
Most CLTV models use the conventional approach of predicting a customer's CLTV from RFM variables.
However, a customer might have churned (stopped doing business with the company) during the training period. Yet, a finite CLTV could still be forecast for that customer in the test period.
A drawback of the conventional approach is that it does not account for customers who churned during the training period. Hence, in this model, we follow a two-stage pipeline approach, which includes:
1. Classification model for predicting customer churn:
Customers with monetary value <= 0 in the test period are treated as customers who churned by the end of the training period.
So we label customers who churned as False and the others as True. Using deep learning, we train a classification model on 80% of the data, with the features described above, to predict whether the customer churned during that period.
We trained a classification model with ~83% accuracy and an AUC of 0.737.
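The labeling rule and the 80% split can be sketched as follows. This covers only the data preparation for the classifier, not the deep learning model itself; the target values, the random seed, and the use of a random shuffle for the split are assumptions.

```python
import numpy as np

# Hypothetical test-period spend per customer.
target_monetary = np.array([0.0, 120.0, 0.0, 35.5, 80.0, 0.0, 10.0, 250.0, 0.0, 60.0])

# True = customer still active (spent something), False = churned.
is_active = target_monetary > 0

# Shuffle customers and keep 80% for training the classifier.
rng = np.random.default_rng(seed=42)
order = rng.permutation(len(target_monetary))
n_train = int(0.8 * len(order))
train_idx, valid_idx = order[:n_train], order[n_train:]

print(is_active.sum(), "active customers;", len(train_idx), "train /", len(valid_idx), "held out")
```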
2. Regression model for forecasting lifetime value:
We train a deep learning regression model on customers who did not churn in the training set to predict lifetime values for next year.
Using this model, we predict the target monetary value for customers in the test set. The following parameters were used for training the regression model.
We then combine the two model outputs using the classification model's predicted probability and the regression model's predicted monetary value.
The final_amt is calculated as the product of the predicted probability that the customer is still active and the predicted target_monetary value, and is taken as the customer's lifetime value for the next year.
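As a small illustration of how the two outputs combine, assuming the classifier returns the probability that a customer is still active (the probabilities and amounts below are made up):

```python
import numpy as np

# Hypothetical model outputs for three customers.
p_active = np.array([0.90, 0.20, 0.55])               # classifier: P(not churned)
predicted_monetary = np.array([200.0, 150.0, 80.0])   # regressor: spend if active

# Expected spend next year = probability of staying active x predicted spend.
final_amt = p_active * predicted_monetary
print(final_amt)
```

Weighting by the probability means a customer who is likely to have churned contributes little to the forecast even if the regression model predicts a high spend.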
The mean absolute error of this model is approximately $103.
Key Considerations for ML Model Development
Some things to bear in mind before building an ML model:
- You will need to define the time frame for calculating the CLV. It can vary by industry.
- Identify what type of machine learning problem you are solving, for example classification or regression.
- Create training and test datasets. You will use the training set to build the model.
Advanced CLV Modeling Techniques
RFM Analysis Integration
RFM (Recency, Frequency, Monetary) analysis can be integrated with machine learning models to improve CLV predictions:
- Recency: How recently a customer made a purchase
- Frequency: How often a customer makes purchases
- Monetary: How much money a customer spends
Cohort Analysis
Cohort analysis helps identify customer segments with similar behaviors and predict their lifetime value patterns.
Survival Analysis
Survival analysis techniques can be used to predict customer churn and retention, which directly impacts CLV calculations.
Model Evaluation and Validation
Performance Metrics
When evaluating CLV models, consider these metrics:
- Mean Absolute Error (MAE): Average absolute difference between predicted and actual CLV
- Root Mean Square Error (RMSE): Square root of the average squared difference between predicted and actual CLV
- R-squared: Proportion of variance explained by the model
- Lift: Improvement in prediction accuracy compared to baseline
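The first three metrics above can be computed directly with NumPy, as in this sketch. The function name and the sample values are illustrative; lift is left out because it depends on the baseline you compare against.

```python
import numpy as np

def regression_metrics(actual, predicted):
    """Return MAE, RMSE and R-squared for predicted vs. actual CLV."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    errors = predicted - actual
    mae = np.mean(np.abs(errors))
    rmse = np.sqrt(np.mean(errors ** 2))
    ss_res = np.sum(errors ** 2)                     # unexplained variation
    ss_tot = np.sum((actual - actual.mean()) ** 2)   # total variation
    r2 = 1.0 - ss_res / ss_tot
    return {"mae": mae, "rmse": rmse, "r2": r2}

metrics = regression_metrics([100, 200, 300, 400], [110, 190, 320, 380])
print(metrics)
```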
Cross-Validation
Use k-fold cross-validation to ensure your model generalizes well to unseen data.
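A minimal sketch of how k-fold splits are formed, written with NumPy only. The helper name k_fold_indices is hypothetical; libraries such as scikit-learn provide equivalent utilities. Each customer lands in the validation fold exactly once, and the model would be retrained on the remaining folds each time.

```python
import numpy as np

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, valid_idx) pairs for k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    folds = np.array_split(order, k)
    for i in range(k):
        valid_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, valid_idx

splits = list(k_fold_indices(n_samples=10, k=5))
for train_idx, valid_idx in splits:
    print("train:", sorted(train_idx.tolist()), "valid:", sorted(valid_idx.tolist()))
```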
Business Validation
Beyond statistical metrics, validate your model's business impact:
- Revenue prediction accuracy
- Customer segmentation effectiveness
- Marketing campaign ROI improvement
Implementation Best Practices
Data Quality
- Ensure data completeness and accuracy
- Handle missing values appropriately
- Validate data consistency across sources
Feature Engineering
- Create meaningful features from raw data
- Consider temporal aspects of customer behavior
- Include external factors that might influence CLV
Model Selection
- Start with simple models and gradually increase complexity
- Consider ensemble methods for better performance
- Regularly retrain models with new data
Conclusion
Once your ML model is successful, you can use the results to identify the customer categories most likely to spend money and target them with offers and discounts they are more likely to respond to. These customers, with higher loyalty, are your primary marketing target. This means retailers can effectively run campaigns based on the predictive lifetime value of any given customer.
The key to successful CLV prediction lies in combining robust data preparation, thoughtful feature engineering, and appropriate model selection. By following the two-stage pipeline approach and continuously monitoring model performance, businesses can gain valuable insights into customer behavior and optimize their marketing strategies accordingly.