Learnings from Building a Prediction Model for Ad Clicks
We model ad click prediction as a binary classification problem with two possible outcomes: the ad is clicked or it is not. Given the context in which the ad will be served, the model returns the estimated probability of a click as a value between 0 and 1.
Several analytical models can be used, including Decision Trees, Random Forests, Linear Regression and Neural Networks. In our case, the choice of model is guided by the following requirements:
1. High cardinality categorical variables: The model should be able to handle high cardinality (1M+) categorical variables like app ID.
2. Latency of model query: Each model query should complete in less than 50 microseconds, as the model is queried thousands of times for every bid request. Since our bidder servers are written in C/C++, an implementation of the model library in the same language must be available.
3. Incremental learning with new data: We should be able to train the model incrementally with new data as fast as possible and with limited memory resources. Models must be rebuilt multiple times a day to ensure they capture new trends and patterns.
We experimented with several options and found that Logistic Regression gave the best accuracy among them while being easy to train with limited resources and very fast to query.
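To illustrate why logistic regression fits the three requirements above, here is a minimal sketch of a sparse model trained one example at a time. It is not our production implementation: the class name, feature strings, learning rate and batches are all hypothetical, and plain SGD is assumed as the training method.

```python
import math
from collections import defaultdict


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class OnlineLogisticRegression:
    """Sparse logistic regression over one-hot categorical features (illustrative)."""

    def __init__(self, learning_rate=0.1):
        self.learning_rate = learning_rate
        self.bias = 0.0
        self.weights = defaultdict(float)  # feature name -> weight

    def predict(self, features):
        # A query is a few dictionary lookups plus one sigmoid, so its cost
        # depends on the number of active features, not on the cardinality.
        z = self.bias + sum(self.weights.get(f, 0.0) for f in features)
        return sigmoid(z)

    def update(self, features, clicked):
        # One SGD step on log loss: the gradient for each active weight is (p - y).
        error = self.predict(features) - clicked
        self.bias -= self.learning_rate * error
        for f in features:
            self.weights[f] -= self.learning_rate * error


model = OnlineLogisticRegression()

# Hypothetical initial training window.
initial_batch = [(["app=news", "size=320x50"], 1),
                 (["app=game", "size=320x50"], 0)] * 200
for features, clicked in initial_batch:
    model.update(features, clicked)

# A later delta batch updates the same model in place; no full rebuild is needed.
delta_batch = [(["app=video", "size=300x250"], 1)] * 50
for features, clicked in delta_batch:
    model.update(features, clicked)

p_news = model.predict(["app=news", "size=320x50"])
p_game = model.predict(["app=game", "size=320x50"])
```

Because only the weights of active features are touched, both training and querying stay cheap even when a variable such as app ID has millions of distinct values.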
We present some of our key learnings while training logistic regression models and testing them offline and online.
Building a logistic regression model with only base variables results in an overly simplistic model. Adding polynomial interactions between base variables performs better, and quadratic interactions (Poly2) are usually sufficient to produce satisfactory results. To keep the model size within acceptable limits, we advise modelling interactions only between a select set of predictor variables.
Example: (Advertiser_A, Advertiser_B) [Poly2 interaction] (Publisher_X, Publisher_Y)
Interaction features: Advertiser_A_Publisher_X, Advertiser_A_Publisher_Y, Advertiser_B_Publisher_X, Advertiser_B_Publisher_Y
Prediction for Advertiser = A, Publisher = X:
Sigmoid(W_Advertiser_A + W_Publisher_X + W_Advertiser_A_Publisher_X)
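The Poly2 example above can be sketched in a few lines of Python. The function names and the weight values are hypothetical and chosen only for illustration; as in the formula above, no bias term is included.

```python
import math
from itertools import product


def poly2_features(advertisers, publishers):
    """Cross every advertiser with every publisher to form Poly2 features."""
    return [f"Advertiser_{a}_Publisher_{p}"
            for a, p in product(advertisers, publishers)]


def predict_click(weights, advertiser, publisher):
    """Sigmoid of the sum of the two base weights and the interaction weight."""
    active = [
        f"Advertiser_{advertiser}",
        f"Publisher_{publisher}",
        f"Advertiser_{advertiser}_Publisher_{publisher}",
    ]
    # Features with no learned weight contribute nothing to the score.
    z = sum(weights.get(name, 0.0) for name in active)
    return 1.0 / (1.0 + math.exp(-z))


crossed = poly2_features(["A", "B"], ["X", "Y"])

# Hypothetical learned weights; real values come from training.
weights = {
    "Advertiser_A": -1.2,
    "Publisher_X": -0.8,
    "Advertiser_A_Publisher_X": 0.5,
}
p_ax = predict_click(weights, "A", "X")
```

Note that a pair with no learned interaction weight, such as Advertiser_B with Publisher_Y here, falls back to whatever base weights exist, which is the limitation that motivates factorization machines.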
Poly2 interactions can only model interactions that were seen in the training data. To handle unseen interactions, we need factorization machines (FM), which learn a latent vector for each feature and estimate the interaction between two features from those vectors, even when the pair never occurred together during training. Field-aware factorization machines (FFM) improve on this by learning, for each feature, a separate latent vector for every field (predictor variable) it interacts with.
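As a rough sketch of how an FM scores a feature pair it has never seen together, the snippet below computes the standard FM score for one-hot features: bias, plus linear weights, plus the dot product of latent vectors for every pair of active features. The latent vectors, bias and function names are hypothetical values made up for the example, not trained parameters.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def fm_predict(active, bias, linear, latent, k):
    """Factorization machine prediction for a list of active one-hot features."""
    zero = [0.0] * k  # features without a latent vector contribute no interaction
    z = bias + sum(linear.get(f, 0.0) for f in active)
    for i in range(len(active)):
        for j in range(i + 1, len(active)):
            z += dot(latent.get(active[i], zero), latent.get(active[j], zero))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical 2-dimensional latent vectors. Assume training saw the pairs
# (A, X), (A, Y) and (B, X), but never (B, Y).
latent = {
    "Advertiser_A": [0.9, 0.1],
    "Advertiser_B": [0.8, 0.2],
    "Publisher_X": [1.0, 0.0],
    "Publisher_Y": [0.9, 0.1],
}

# The (B, Y) interaction is still estimated from the two latent vectors.
p_by = fm_predict(["Advertiser_B", "Publisher_Y"],
                  bias=-2.0, linear={}, latent=latent, k=2)
```

An FFM would differ only in the lookup: each feature would hold one latent vector per opposing field, and the pair score would use the vector of each feature that corresponds to the other feature's field.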
Regularization techniques (Lasso and Ridge) are typically used to avoid overfitting on training data. We have had moderate success with Lasso (L1) in online training, but new features that do not have enough sample points in the incremental (delta) training set may be discarded even though they may be important for prediction. On the other hand, not using L1 makes the model grow continuously, because features are never discarded. It then becomes necessary to rebuild the model from scratch periodically so that it retains only the features that occur in the training window.
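The trade-off described above can be reproduced with a small sketch. It assumes batch proximal gradient descent with soft-thresholding as the L1 method, which is one common choice and not necessarily the one used in our pipeline; the feature names, batch and hyperparameters are hypothetical.

```python
import math
from collections import defaultdict


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def soft_threshold(w, t):
    """Shrink w towards zero by t and clip it to exactly zero inside [-t, t]."""
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0


def proximal_step(weights, batch, learning_rate, l1):
    """One gradient step on average log loss followed by the L1 shrinkage step."""
    grad = defaultdict(float)
    for features, clicked in batch:
        z = sum(weights.get(f, 0.0) for f in features)
        error = sigmoid(z) - clicked
        for f in features:
            grad[f] += error / len(batch)
    updated = {}
    for f in set(weights) | set(grad):
        w = weights.get(f, 0.0) - learning_rate * grad.get(f, 0.0)
        w = soft_threshold(w, learning_rate * l1)
        if w != 0.0:  # zero weights are dropped, which keeps the model small
            updated[f] = w
    return updated


# Hypothetical delta batch: one frequent feature and one new, rare feature
# that was clicked every time it appeared.
delta_batch = ([(["app=popular"], 1)] * 60
               + [(["app=popular"], 0)] * 38
               + [(["app=new"], 1)] * 2)

with_l1 = {}
for _ in range(20):
    with_l1 = proximal_step(with_l1, delta_batch, learning_rate=1.0, l1=0.05)

without_l1 = proximal_step({}, delta_batch, learning_rate=1.0, l1=0.0)
```

With the L1 penalty, the rare feature contributes too small a share of the batch gradient to survive shrinkage and is dropped, even though its click rate is high. Without the penalty, every feature that appears gets a weight and stays in the model.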
Models need to be evaluated on multiple metrics to quantify their accuracy better. We use AUC and Log Loss to measure the overall health of the model, and we also check whether the model is over- or under-predicting by replaying bid logs and comparing model predictions with actual outcomes. We recommend looking at these metrics across various data slices: even when the overall prediction looks good, the model might be over-predicting on some pockets of test data (say, native ad creatives on ad exchange X) and under-predicting on others (say, banner creatives of size 320x50).
Another important dimension for evaluating model performance in an RTB setup is the volume and the average cost of the clicks we can drive.
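The metrics mentioned above are simple to compute from replayed logs. The sketch below uses textbook definitions of Log Loss and AUC plus a predicted-to-actual click ratio per slice; the record layout, slice names and numbers are hypothetical, and the quadratic-time AUC is written for clarity rather than speed.

```python
import math
from collections import defaultdict


def log_loss(labels, preds, eps=1e-15):
    """Average negative log-likelihood of the observed outcomes."""
    total = 0.0
    for y, p in zip(labels, preds):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


def auc(labels, preds):
    """Probability that a random click is scored above a random non-click."""
    pos = [p for y, p in zip(labels, preds) if y == 1]
    neg = [p for y, p in zip(labels, preds) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def calibration_by_slice(records):
    """Ratio of predicted clicks to actual clicks per slice (1.0 is calibrated)."""
    predicted = defaultdict(float)
    actual = defaultdict(float)
    for slice_name, clicked, pred in records:
        predicted[slice_name] += pred
        actual[slice_name] += clicked
    return {name: predicted[name] / actual[name]
            for name in predicted if actual[name] > 0}


# Hypothetical replayed bid log: (slice, clicked, predicted probability).
records = ([("native", 1, 0.5)] + [("native", 0, 0.5)] * 3
           + [("banner_320x50", 1, 0.25)] * 2 + [("banner_320x50", 0, 0.25)] * 2)

overall_ratio = sum(p for _, _, p in records) / sum(y for _, y, _ in records)
per_slice = calibration_by_slice(records)
```

In this made-up log the overall ratio is exactly 1.0, yet the native slice is over-predicted by a factor of two and the banner slice is under-predicted by half, which is why slice-level checks matter.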
Offline vs. Online
One important test for any prediction model is how well it performs on unseen data. Offline tests are insufficient to measure this as the data is biased towards the prediction model that was used to bid in the first place. The real test is to A/B test your model in a production environment so that it can bid on inventory that was previously not being won.
Predictive models for online ad auction bidders need to work with large scale sparse data, tight requirements around query latencies and frequent model refresh cycles. These models also need to learn new patterns quickly while still giving reasonable predictions for unseen patterns. At RevX, we invest a significant portion of our R&D effort in improving existing models as well as exploring new modelling techniques.
We have built a suite of tools to train models offline and test them across multiple metrics before they are deployed to production. An internal analytics dashboard continuously monitors the health of production models over billions of bids every day.
RevX is an app retargeting platform that powers growth for mobile businesses through dynamic retargeting. The platform is built on integrated and transparent technology combining four key pillars - audience intelligence, programmatic media, personalized ads, and ROI optimization. Mobile marketers across verticals like e-Commerce, travel, lifestyle, hyperlocal and gaming use RevX to enhance user engagement by activating new users, converting existing users and re-activating lapsed users. If you have any doubts, queries or ideas, do reach out to us at email@example.com