How The New York Times Uses Machine Learning To Make Its Paywall Smarter

By Rohit Supekar | NYT Open

When The New York Times paywall launched, its meter limit was the same for all users. Since then, The Times has transformed into a data-driven digital company, and its paywall now successfully uses a causal machine learning model called the Dynamic Meter to set personalized meter limits, making for a smarter paywall.

Illustration by Mathieu Labrecque

The New York Times launched its paywall in March 2011, beginning its journey as a subscription-first news and lifestyle service. Since its inception, this “metered” access service has been designed so that nonsubscribers can read a fixed number of articles every month before encountering a paywall; this article limit is widely referred to as the “meter limit.” This strategy has proven successful in generating subscriptions while still allowing new readers initial exploratory access.

In February 2022, with the acquisition of The Athletic Media Company, The Times reached its goal of 10 million subscriptions and set a new target of 15 million subscribers by the end of 2027. This success has been possible in part due to continuous improvements in the paywall strategy over the years. When the paywall launched, the meter limit was the same for all users. As The Times has transformed into a data-driven digital company, however, we now successfully use a causal machine learning model called the Dynamic Meter to set personalized meter limits and make the paywall smarter.

Figure 1: The subscription funnel.

Our paywall strategy

The company’s paywall strategy revolves around the concept of the subscription funnel (Figure 1). At the top of the funnel are unregistered users who do not yet have an account with The Times. Once they hit the meter limit for their unregistered status, they are shown a registration wall that blocks access and asks them to create an account, or to log in if they already have one. Registering gives them access to more free content and, since their activity is now linked to their registration ID, allows us to better understand their current appetite for Times content. This user information is valuable for any machine learning application, and it powers the Dynamic Meter as well. Once registered users hit their meter limit, they are served a paywall with a subscription offer. It is this moment that the Dynamic Meter model controls: the model learns from the first-party engagement data of registered users and determines the appropriate meter limit in order to optimize for one or more business K.P.I.s (Key Performance Indicators).
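To make the funnel concrete, here is a minimal sketch of the gating logic in Python. The limit values, field names and wall labels are hypothetical, and the real personalized limits come from the Dynamic Meter model; this is an illustration of the flow, not The Times's implementation.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical limit for unregistered users; illustrative only.
UNREGISTERED_METER_LIMIT = 2

@dataclass
class Reader:
    registered: bool
    articles_read_this_month: int
    personalized_meter_limit: Optional[int] = None  # set by the Dynamic Meter

def wall_to_show(reader: Reader) -> Optional[str]:
    """Decide which wall, if any, the reader sees on their next article."""
    if not reader.registered:
        # Unregistered users hit the registration wall at a fixed limit.
        if reader.articles_read_this_month >= UNREGISTERED_METER_LIMIT:
            return "registration_wall"
        return None
    # Registered users hit the paywall at their personalized meter limit.
    limit = reader.personalized_meter_limit
    if limit is not None and reader.articles_read_this_month >= limit:
        return "paywall"
    return None

print(wall_to_show(Reader(registered=False, articles_read_this_month=3)))
# -> registration_wall
```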

What does the Dynamic Meter optimize for?

The Dynamic Meter model must play a dual role. It should support our mission to help people understand the world and our business goal of acquiring subscriptions. This is done by optimizing for two metrics simultaneously: the engagement that registered users have with Times content and the number of subscriptions the paywall generates in a given time frame. These two metrics have an inherent trade-off, since serving more paywalls naturally leads to more subscriptions but at the cost of article readership. This trade-off is clearly visible in the data collected by a Randomized Control Trial (R.C.T.), as shown in Figure 2. As the meter limit for registered users increases, engagement, measured by the average number of page views, increases. This is accompanied by a reduction in the conversion rate for subscriptions, largely because fewer registered users encounter the paywall. Conversely, the greater friction from tighter meter limits also disrupts readers’ habituation and can make them less interested in our content, which in turn reduces the potential to convert them into subscribers over the longer term. In essence, the Dynamic Meter must optimize for conversion and engagement while balancing the trade-off between them.
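As an illustration, the per-arm summary behind a plot like Figure 2 can be computed from R.C.T. logs in a few lines. This is a minimal sketch with made-up data and hypothetical column names, not The Times's actual pipeline.

```python
import pandas as pd

# One row per registered user in the R.C.T.; columns are hypothetical.
rct = pd.DataFrame({
    "meter_limit": [1, 1, 3, 3, 5, 5],   # randomly assigned arm
    "subscribed":  [1, 0, 0, 1, 0, 0],   # did the user convert?
    "page_views":  [2, 1, 4, 3, 8, 6],   # engagement proxy
})

# Per-arm conversion rate and average page views: as the meter limit
# grows, page views tend to rise while the conversion rate tends to fall.
tradeoff = rct.groupby("meter_limit").agg(
    conversion_rate=("subscribed", "mean"),
    avg_page_views=("page_views", "mean"),
)
print(tradeoff)
```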

Figure 2: The trade-off between conversion and engagement as seen through a Randomized Control Trial (R.C.T.).

The Dynamic Meter is a prescriptive machine learning model.

The goal of the model is to prescribe meter limits from a limited set of available options. Thus, the model must take an action that affects a user’s behavior and influences the outcomes, such as their subscription propensity and their engagement with Times content. In contrast to a predictive machine learning problem, the ground truth for prescriptive problems is rarely known. That is, if a user was prescribed meter limit a, we do not know what would have happened had they been prescribed a different meter limit b during the same time frame. This is sometimes called the “fundamental problem of causal inference,” or more simply the “missing data problem.” The best we can do is estimate what would have happened by using data from other users who were prescribed meter limit b. This highlights the importance of the data collected from the R.C.T., which is essential for training the model.
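In standard potential-outcomes notation (our notation, not from the original post), the missing data problem and the role of randomization can be stated compactly:

```latex
% Y_i(t) is user i's outcome (e.g., subscription) under meter limit t.
% Only the outcome for the assigned limit T_i is ever observed:
\[
  Y_i^{\mathrm{obs}} = Y_i(T_i), \qquad
  Y_i(t)\ \text{is unobserved for every}\ t \neq T_i .
\]
% Because the R.C.T. randomizes T independently of the potential
% outcomes, arm averages identify the counterfactual means:
\[
  \mathbb{E}[\,Y(t)\,] = \mathbb{E}[\,Y \mid T = t\,].
\]
```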

How does the model work?

Given that we are optimizing for two objectives, namely subscription propensity and engagement, we train two machine learning models, which we refer to as the “base-learners” (Equation 1). The structure of these base-learners is similar to a popular meta-learner model called the “S-learner.” Such a model predicts the target variable using features X and a treatment variable T. Here, the treatment variable T is a categorical variable that specifies the meter limit given to each registered user. The features are determined exclusively from first-party data about users’ engagement with Times content. We do not use any demographic or psychographic features in the model, to avoid unfair biases against protected classes (we are committed to using machine learning at The Times in fair and responsible ways; you can find a discussion of our approach to machine learning for comment moderation here).
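A minimal S-learner sketch in scikit-learn, on synthetic data: the treatment T is simply appended to the feature matrix before fitting. The shapes, feature meanings and gradient-boosted model class are assumptions for illustration; the post does not specify The Times's actual model class.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor

rng = np.random.default_rng(0)
n = 1_000
X = rng.normal(size=(n, 5))              # first-party engagement features
T = rng.choice([1, 3, 5], size=n)        # randomized meter limit (treatment)
subscribed = rng.integers(0, 2, size=n)  # binary conversion outcome
engagement = rng.normal(size=n)          # normalized engagement outcome

# S-learner structure: the treatment is just another input feature.
XT = np.column_stack([X, T])

f = GradientBoostingClassifier().fit(XT, subscribed)  # p = f(X, T)
g = GradientBoostingRegressor().fit(XT, engagement)   # e = g(X, T)
```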

Using the R.C.T. data for users with features X and corresponding treatment T, we can fit two machine learning models, f and g, that predict the subscription propensity (p) and the normalized engagement (e), respectively. In order to maximize both objectives simultaneously, we convert them into a single objective s using a convex linear combination with a weight factor δ that takes a value between 0 and 1 (Equation 2). It serves as a friction parameter and lets us explicitly set the importance we wish to give subscriptions relative to engagement. Once a certain δ is set, the prescription policy assigns each user the treatment that maximizes the combined objective function s (Equation 3). Applying the policy repeatedly for different values of δ gives a set of optimal solutions that form the Pareto front. The Pareto front is usually convex and contains solutions that are better than all others in at least one of the objective functions; as we move along this front, one objective decreases while the other increases.
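The original equation images are not preserved in this archive. From the surrounding prose, Equations 1 through 3 plausibly read as follows; this is our reconstruction, not the post's verbatim notation.

```latex
\begin{align}
  p &= f(X, T), & e &= g(X, T) \tag{1} \\
  s &= \delta\, p + (1 - \delta)\, e, & \delta &\in [0, 1] \tag{2} \\
  T^{*}(X) &= \operatorname*{arg\,max}_{T \in \mathcal{T}} \; s(X, T) \tag{3}
\end{align}
```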

To illustrate this pictorially, suppose we set δ = 1 so that we are optimizing only for subscriptions. Using the fitted model f, we can predict, for each registered user, the subscription propensity in counterfactual scenarios where they are assigned different meter limits. The policy then assigns the meter limit that yields the highest subscription propensity (Figure 3). In essence, the model determines the right number of free articles to allow each user so that they become interested enough in The Times to want to subscribe and keep reading.

Figure 3: Pictorial representation of the prescription policy that maximizes the subscription propensity alone (when δ = 1). The vertical bars indicate the subscription propensity predicted for different users in counterfactual scenarios where they are prescribed different meter limits.
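Continuing the sketch above, the δ = 1 policy is a counterfactual argmax: score every candidate limit for every user and pick the best. This reuses the fitted f and features X from the S-learner snippet; the candidate limits are hypothetical.

```python
import numpy as np

CANDIDATE_LIMITS = [1, 3, 5]  # hypothetical treatment options

def prescribe(f, X, limits):
    """Return, for each user, the limit with the highest predicted
    subscription propensity across counterfactual meter limits."""
    propensities = np.column_stack([
        f.predict_proba(np.column_stack([X, np.full(len(X), t)]))[:, 1]
        for t in limits
    ])
    return np.asarray(limits)[propensities.argmax(axis=1)]

meter_limits = prescribe(f, X, CANDIDATE_LIMITS)
```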

How is the model (back-)tested on past data?

Before launching a model, we test it on historical data in order to estimate its performance after deployment. This is popularly known as backtesting, and it involves answering the question of how the model would have performed had it been deployed at some point in the past. In the context of the Dynamic Meter, we would like to know how the model would have performed if it had prescribed meter limits for a particular past month. Since we cannot redo the prescription in the past, we make use of the past R.C.T. data and consider only those users for whom the model prescription matched the R.C.T. prescription (Figure 4A). By looking at whether or not these users subscribed, we can then estimate the overall conversion rate (C.V.R.) using Inverse Probability Weighting with Hájek estimation (Equation 4). This estimate gives us the C.V.R. we might have expected if we could indeed go back in time and set the meter limits for all users using the model. A similar estimation can be done to obtain the average page views as well.

Figure 4: Backtesting a model. (A) An example of users with their meter limits prescribed by the model and the R.C.T. in the past. The black dots indicate those users for whom the model prescription matches the R.C.T. prescription. Data from these users can be used to obtain an estimate for the overall metrics, such as conversion rate. (B) Visualization of the model-optimized solutions that form the Pareto front. The model points (orange) correspond to different values of the friction parameter δ ∈ [0, 1]. The Pareto front is convex and consists of points that are better than the R.C.T. in one of the objectives when fixing the other.
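A minimal sketch of the Hájek (self-normalized) IPW estimate described above, with made-up data. We assume uniform randomization over three meter limits, so the assignment propensity is 1/3; the post does not publish Equation 4's exact form.

```python
import numpy as np

def hajek_cvr(model_limit, rct_limit, subscribed, propensity):
    """Self-normalized IPW estimate of conversion rate under the
    model's policy: sum(w * y) / sum(w), with w = 1/propensity for
    users whose R.C.T. limit matches the model's prescription."""
    match = (model_limit == rct_limit)      # keep only matching users
    w = match / propensity                  # inverse probability weights
    return (w * subscribed).sum() / w.sum()

# Toy example with three equally likely meter limits.
model_limit = np.array([1, 3, 5, 3, 1])
rct_limit   = np.array([1, 5, 5, 3, 3])
subscribed  = np.array([1, 0, 0, 1, 0])
print(hajek_cvr(model_limit, rct_limit, subscribed, propensity=1/3))
```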

The estimation procedure can be repeated while varying the friction parameter δ, leading to a set of points that form the Pareto front (orange points in Figure 4B). As δ changes from 0 to 1, we move along the front, increasing the conversion rate and decreasing the average page views. One of these points is selected depending on the conversion rate we would like to target for the month. As a result, we obtain a lift in engagement compared to a random policy (blue points) with the same conversion rate. In conclusion, this strategy gives us the flexibility to tune the level of friction based on our business goals while smartly targeting users, yielding a lift in both engagement and conversion rate relative to a purely random policy.
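Putting the pieces together, sweeping δ and backtesting each resulting policy traces out the front. A sketch under the same assumptions, reusing f, g, X, CANDIDATE_LIMITS and hajek_cvr from the snippets above:

```python
import numpy as np

def prescribe_with_delta(f, g, X, limits, delta):
    """Assign each user the limit maximizing s = delta*p + (1-delta)*e."""
    scores = np.column_stack([
        delta * f.predict_proba(np.column_stack([X, np.full(len(X), t)]))[:, 1]
        + (1 - delta) * g.predict(np.column_stack([X, np.full(len(X), t)]))
        for t in limits
    ])
    return np.asarray(limits)[scores.argmax(axis=1)]

# Each delta yields one policy; backtesting it (e.g., with hajek_cvr)
# yields one (conversion rate, page views) point on the Pareto front.
for delta in np.linspace(0.0, 1.0, 5):
    policy = prescribe_with_delta(f, g, X, CANDIDATE_LIMITS, delta)
```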

Rohit Supekar is a data scientist in the Algorithmic Targeting team at The New York Times, and he works on developing and deploying causal machine learning models to power The Times’s paywall. He is broadly passionate about understanding the world around us using data, building mathematically rigorous models, and deploying them using modern engineering tools. Prior to joining The Times, he obtained a Ph.D. from M.I.T. where he pursued research in applied mathematics and scientific machine learning. Outside of work, Rohit enjoys reading, long-distance running, and alpine skiing.

The author acknowledges the contributions of Heidi Jiang, Dan Ansari, Anne Bauer and Chris Wiggins to this project.

If such projects excite you, come work with us!