What Is Propensity Scoring and Why Should You Be Using It?


When it comes to marketing, reaching the right people with the right message at the right time is the ultimate goal for any data-driven business. In e-commerce, hyper-personalisation has become an expected part of the customer experience and making it easy for customers to find what they are looking for quickly is crucial to conversion.

One of the most effective ways to personalise customer experience is to forecast customer behaviour using propensity scoring. This statistical approach to data analysis predicts future actions, taking your data beyond what has happened and pushing it to what will probably happen in the future. It accounts for all variables that affect that behaviour and, when it comes to customers, this enables you to serve relevant content, offers and recommendations, based on your predictions.

How do you build a propensity model and why does it need to be dynamic?

When building a propensity model, it’s important to consider a few factors to ensure it is truly effective and works with your dataset. In marketing applications, AB testing and experimentation allow you to validate the accuracy of propensity scores and understand how to improve the model for your specific requirements.

A great propensity model should be dynamic, retraining and continuously evolving based on the feedback loop created by the data pipeline. As new data becomes available, the model needs to change to become smarter and more accurate based on the underlying trends in the data. 

A dynamic model requires a robust data pipeline to regularly ingest data, retrain, validate and deploy. For that reason, your model needs to be productionised and deliver understandable and actionable predictions into your business processes, often in real time.
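As a rough sketch, such a pipeline boils down to a repeatable ingest, train, validate, deploy cycle. The functions below are hypothetical stand-ins for your own pipeline steps, not a real orchestration API:

```python
from datetime import datetime, timezone

# Minimal sketch of a retrain-validate-deploy loop. ingest/train/validate/deploy
# are hypothetical placeholders for your own data warehouse, training job,
# validation checks, and serving layer.

def ingest():
    # Pull the latest customer data from your warehouse (illustrative rows)
    return [("customer_1", 0.7), ("customer_2", 0.2)]

def train(data):
    # Retrain the model on the freshly ingested data
    return {"trained_at": datetime.now(timezone.utc), "n_rows": len(data)}

def validate(model):
    # e.g. check hold-out accuracy/AUC against an agreed threshold
    return model["n_rows"] > 0

def deploy(model):
    # Push the model (or its scores) into downstream business processes
    print(f"deployed model trained at {model['trained_at']}")

def run_once():
    # One iteration of the feedback loop; schedule this to run regularly
    model = train(ingest())
    if validate(model):
        deploy(model)
    return model
```

Running `run_once` on a schedule (daily, hourly, or on new-data triggers) gives the model the continuous feedback loop described above.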

Your model also needs to be scalable. Rather than building a new model for each campaign or use case, an effective model should be capable of producing large volumes of predictions and also be adaptable for similar scenarios across the business. 

How does a propensity model work? 

You don’t need to be a data scientist or mathematician to use propensity scoring but it helps to have a basic understanding of regression analysis, the core analytical process behind it.

Regression analysis is simply a predictive modelling technique that analyses the relationship between a dependent variable (e.g. average order value per customer) and independent variables, or features (e.g. product attributes). The two types of regression analysis used for propensity modelling in machine learning are linear regression and logistic regression.

Linear regression

Linear regression is used where the outcome is continuous, meaning there can be an infinite number of potential values. Technically, when your data involves more than one independent variable (feature), the model is a multiple linear regression.

The linear regression model is denoted by the equation y = mx + c + e, where m is the slope of the line, c is the intercept, and e represents the error in the model.

The best-fit line is found by varying the values of m and c. The error is the difference between the observed values and the predicted values, and m and c are chosen to minimise that error. As the chart shows, a simple linear regression model is sensitive to outliers, which makes it a poor choice for large data volumes.

[Figure: Linear Regression]
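To make the notation concrete, here is a minimal sketch that fits y = mx + c by least squares on a small, made-up dataset (the numbers are purely illustrative):

```python
import numpy as np

# Hypothetical data: marketing spend per customer (x) vs. average order value (y)
x = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
y = np.array([25.0, 44.0, 66.0, 83.0, 105.0])

# Least-squares fit of y = m*x + c; np.polyfit returns the coefficients [m, c]
m, c = np.polyfit(x, y, deg=1)

# Residuals e = observed - predicted; m and c minimise the sum of squared residuals
e = y - (m * x + c)
print(f"slope m={m:.2f}, intercept c={c:.2f}, sum squared error={np.sum(e**2):.2f}")
```

With these toy numbers the fit recovers a slope of roughly 2, i.e. each extra unit of spend adds about two units of order value.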

Logistic regression

Logistic regression is a predictive algorithm based on the concept of probability, used where the outcome has a limited number of potential values. It applies when the dependent variable is discrete (i.e. individually distinct: 0 or 1, true or false, etc.) and the independent variables in the dataset are not strongly correlated with one another. In the binary case the target variable can only take two values, and a sigmoid curve (a mathematical function with a characteristic 'S' shape) describes the relationship between the target variable and the independent variable (feature).

The logit function, also referred to as log-odds, is the link function used in logistic regression to relate the target variable to the independent variables; the fitted model estimates probabilities between 0 and 1.

[Figure: Logistic Regression]
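A minimal sketch of the sigmoid and logit functions, with illustrative (not fitted) coefficients m and c:

```python
import numpy as np

def sigmoid(z):
    # Maps any real value to a probability between 0 and 1 (the 'S'-shaped curve)
    return 1.0 / (1.0 + np.exp(-z))

def logit(p):
    # Log-odds: the inverse of the sigmoid, mapping a probability back to a real value
    return np.log(p / (1.0 - p))

# Hypothetical linear score z = m*x + c; m and c are illustrative, not fitted values
m, c = 0.8, -2.0
x = np.array([0.0, 1.0, 2.5, 5.0])   # e.g. number of recent site visits
scores = sigmoid(m * x + c)
print(scores)                         # each value is a propensity between 0 and 1
```

Note that the sigmoid squashes the same linear expression used in linear regression into the 0–1 range, which is what makes its output interpretable as a propensity.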

How to build your model: step-by-step

  1. Define your features (the independent variables): Select which variables to use as features (e.g. what products a customer bought, where a customer lives, how often they return items, etc.). The less relevant a feature is, the closer to 0 the coefficient will be. You need to carefully consider whether you want to interpret the coefficients or not.
  2. Choose your model type: Linear or logistic regression.
  3. Construct your model: Build a probabilistic model based on your defined variables. The model’s probabilistic estimate that a customer will perform a certain action is called a propensity score.
  4. Group the output: By forming buckets to group customers by score (e.g. 0.0-0.1 propensity, 0.1-0.2 propensity and so on) you can then compare the customers within each bucket.
  5. Experiment and validate: Use AB testing and other experimentation to validate the accuracy of your propensity scores and achieve maximum impact for your use case(s).
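The steps above can be sketched end to end in Python. This uses a hand-rolled gradient-descent fit on simulated data rather than a production library, so treat it as an illustration of the flow, not a reference implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1: features (hypothetical), e.g. orders last year, days since last visit
X = rng.normal(size=(200, 2))
true_w = np.array([1.5, -1.0])                 # hidden "true" effect of each feature
y = (rng.random(200) < 1 / (1 + np.exp(-(X @ true_w)))).astype(float)  # simulated purchases

# Steps 2-3: fit a logistic regression by gradient descent on the log-loss
w, b = np.zeros(2), 0.0
for _ in range(2000):
    p = 1 / (1 + np.exp(-(X @ w + b)))          # predicted propensities
    grad_w = X.T @ (p - y) / len(y)             # gradient of the log-loss w.r.t. w
    grad_b = np.mean(p - y)
    w -= 0.5 * grad_w
    b -= 0.5 * grad_b

scores = 1 / (1 + np.exp(-(X @ w + b)))         # propensity score per customer

# Step 4: bucket customers by score (0.0-0.1, 0.1-0.2, and so on)
buckets = np.digitize(scores, bins=np.arange(0.1, 1.0, 0.1))
print({int(k): int(n) for k, n in zip(*np.unique(buckets, return_counts=True))})
```

Step 5 then happens outside the model: hold out a control group and AB test whether acting on the high-propensity buckets actually lifts conversion.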

Do you need a data science team to use propensity scoring?

In recent years, machine learning has unlocked the potential of propensity modelling for most businesses with a data science team. But, creating an effective, scalable propensity model that includes the kind of feedback loop necessary for continuous improvement is complicated. Most CRM or marketing automation platforms will have some propensity models built-in for users but these often have shortcomings that mean the predictions they produce won’t be accurate enough to deliver real marketing ROI and uplift.

The reason for this is that most basic models will rely on a small number of features, typically limited to customer data and campaign-specific transaction history. They tend to overlook broader transaction history and activity data.

Models created by an in-house data science team might be more applicable to that specific business but they won’t necessarily be scalable or robust enough. Similar to those in a CRM, these tend to be static which means they don’t adapt to changes in the underlying data and as a result, they don’t become more accurate over time.

For businesses that don’t have a data science team, innovative tools, like our own platform product, bridge the gap and allow for the implementation of propensity models (and other ML capabilities) in a user-friendly way. 

At Datatonic, we have a wealth of experience helping companies like Pets at Home, Mulberry, MandM Direct, and TV4 deliver personalised experiences to their clients. Follow us on LinkedIn to keep up to date.
