Learning from Ad Tech Engineering

Age of Ad Networks

The platform started out with the vision of being the technology stack that would power ad network businesses, each with its own set of publishers and advertisers. The goal was to maximize media yield for publishers while providing precise targeting and performance optimization (CPC or CPA) for advertisers.

RevX engineers made sure we had a modular service-oriented architecture (SOA) in place, with each service doing its own specialized work, and ensured that fault tolerance was baked right in (i.e., the platform continues to serve ads within budget limits even if the reporting pipeline or the database goes down).
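One way to picture "serving within budget limits while the database is down" is a serving process that enforces budgets from memory and only persists spend opportunistically. The sketch below is illustrative and not RevX's actual design; `BudgetGuard`, `DownStore` and their methods are hypothetical names, and amounts are kept in integer cents to avoid rounding issues.

```python
class BudgetGuard:
    """Keeps campaign budgets in memory so serving never waits on the database."""

    def __init__(self, budgets_cents):
        self.remaining = dict(budgets_cents)  # campaign_id -> remaining budget, in cents
        self.unsynced = []                    # spend not yet persisted

    def try_spend(self, campaign_id, cost_cents):
        """Reserve budget locally; return True if the ad may be served."""
        if self.remaining.get(campaign_id, 0) < cost_cents:
            return False
        self.remaining[campaign_id] -= cost_cents
        self.unsynced.append((campaign_id, cost_cents))
        return True

    def flush(self, store):
        """Try to persist pending spend; keep it queued if the store is down."""
        try:
            store.save(list(self.unsynced))
        except ConnectionError:
            return False
        self.unsynced.clear()
        return True


class DownStore:
    """Stand-in for a database that is currently unreachable."""

    def save(self, events):
        raise ConnectionError("database unavailable")


guard = BudgetGuard({"campaign-1": 100})
served = [guard.try_spend("campaign-1", 40) for _ in range(3)]
flushed = guard.flush(DownStore())
print(served, flushed, guard.remaining)
```

The third request is refused because the in-memory budget is exhausted, even though the failed flush means the database has seen none of the spend yet.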

Our core ad serving stack was written in C and was blazingly fast, so we could serve ads on publisher websites in a matter of milliseconds with all campaign data loaded in memory.

We used Hadoop for crunching raw logs and stored the processed data in MySQL for ad hoc queries, which worked reasonably well at that scale.

[Figure: Hadoop data processing flowchart at RevX]


Dawn of RTB

Around 2009, RTB (real-time bidding) emerged as a way to buy and sell online ad impressions in real time. RTB allows media buyers to participate in real-time online auctions by placing bids, and the highest bidder gets to show the ad. This opened up a whole new realm of challenges and opportunities, and we grabbed them to become the first RTB-enabled ad platform in APAC.

We built a bidder that could listen to the firehose of bid requests coming in from integrated RTB sources and generate a reasonable bid for each ad opportunity. The bidder was architected to handle billions of requests per day, scale linearly with more CPU cores and return a response in less than 5 milliseconds. Whoa!

With the abundance of new data, we needed to upgrade our reporting infrastructure to Vertica (we later moved to Redshift) for ad hoc queries.

[Figure: Vertica database reporting flowchart at RevX]

Remarketing + Ad Personalization

Remarketing lets advertisers show ads to users who have previously visited their website or used their mobile app. Some reports say that only 5% of a website’s users actually complete their purchase. Remarketing helps reconnect with the rest of those users by showing them ads while they browse other websites or use other mobile apps. Personalizing the ads based on the user’s behavior on the website or app is essential.

The great promise of RTB is the ability to cherry-pick ad opportunities from a nearly infinite pool, which is what makes remarketing at scale feasible in the first place.

In order to build remarketing, we needed the ability to collect and remember which users visited the advertiser’s website or app and what they did there. Next, we had to make this data available to our bidder fast enough for it to respond in less than 5 milliseconds, and do the same for hundreds of millions of users. Wait, what? Yep!

The way to do that is to use a crazy fast user data store that lets you read and write data in less than 1 millisecond nearly every single time and that can support billions of queries a day. You also need to ship this data in real time to a replica in every data center across the globe. In the RTB world, your data center needs to be located close to your SSP partner’s data center to meet the stringent response time limit, which is generally under 100 milliseconds.

No two users are alike, we know that. So it is essential to separate the wheat from the chaff or, as the French would say, séparer le bon grain de l’ivraie. Remember, RTB is an auction, so you need deep predictive analysis to compute the appropriate bid price for each user and maximize your chances of winning the auctions you care about most. The bid computation takes place several hundred times for each bid request, so this component needs to be the fastest thing in your tech stack.
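To make the bid computation concrete, here is a minimal sketch of one common approach: bid the expected value of the impression, i.e. the predicted click probability times what a click is worth, and cap it. This is an assumption for illustration, not RevX's actual pricing logic; `Candidate`, `expected_value_cpm` and `best_bid` are hypothetical names, and the probabilities stand in for the output of a prediction model.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    campaign_id: str
    p_click: float          # predicted click probability (assumed model output)
    value_per_click: float  # what one click is worth to the advertiser


def expected_value_cpm(c):
    """Expected value of 1,000 such impressions."""
    return c.p_click * c.value_per_click * 1000


def best_bid(candidates, max_cpm):
    """Pick the most valuable candidate and cap its bid at max_cpm."""
    if not candidates:
        return None  # nothing eligible: do not bid
    best = max(candidates, key=expected_value_cpm)
    return best.campaign_id, min(expected_value_cpm(best), max_cpm)


candidates = [
    Candidate("campaign-a", p_click=0.01, value_per_click=0.5),
    Candidate("campaign-b", p_click=0.002, value_per_click=1.0),
]
print(best_bid(candidates, max_cpm=10.0))
```

In this toy example the campaign with the higher click probability wins despite its lower value per click, because its expected value per impression is higher.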

A typical user browses several products over the course of her interaction with the advertiser website/app. A good remarketing stack has the ability to pick the right products to show to each user from the entire product catalog, which easily runs to several million products for most online commerce advertisers. A heuristic approach could be a decent start, but you will need to plug in a more scientific approach to selecting and recommending products. This continues to be an ongoing area of research at RevX.
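As an example of the kind of heuristic that could serve as a starting point, the sketch below ranks the products a user viewed by a recency-weighted view count, where each view loses half its weight every 24 hours. The scoring rule, the half-life and the function name `pick_products` are assumptions made for illustration, not the heuristic RevX uses.

```python
from collections import defaultdict


def pick_products(views, now, k=3, half_life_hours=24.0):
    """Rank viewed products by recency-weighted view count.

    views: iterable of (product_id, unix_timestamp) pairs.
    """
    scores = defaultdict(float)
    for product_id, ts in views:
        age_hours = max(0.0, (now - ts) / 3600.0)
        # Each view counts for 1.0 when fresh and halves every half-life.
        scores[product_id] += 0.5 ** (age_hours / half_life_hours)
    # Highest score first; ties broken by product id for stable output.
    ranked = sorted(scores, key=lambda p: (-scores[p], p))
    return ranked[:k]


now = 1_000_000
views = [
    ("shoe", now - 3600),        # viewed an hour ago
    ("shoe", now - 7200),        # and two hours ago
    ("bag", now - 60),           # viewed a minute ago
    ("hat", now - 10 * 86400),   # viewed ten days ago
]
print(pick_products(views, now, k=2))
```

The twice-viewed product outranks the single very recent view, and the ten-day-old view has decayed to almost nothing.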

Advertisers also want the ability to target a precise set of users, e.g., “users who have browsed at least 10 products in the last seven days and have previously made a purchase”. We have a user segmentation pipeline that can build and update audiences in real time based on event data collected from the advertiser’s website or mobile app.
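The example rule above can be written as a small predicate over a user's event history. This is a batch-style sketch under assumed names (`in_segment`, and events as dicts with `type`, `product_id` and `time` keys); a real-time pipeline would more likely maintain the counts incrementally as events arrive. It also assumes "10 products" means 10 distinct products.

```python
from datetime import datetime, timedelta


def in_segment(events, now, min_products=10, window_days=7):
    """True if the user viewed enough distinct products recently and has ever purchased."""
    cutoff = now - timedelta(days=window_days)
    viewed = {
        e["product_id"]
        for e in events
        if e["type"] == "view" and e["time"] >= cutoff
    }
    purchased = any(e["type"] == "purchase" for e in events)
    return len(viewed) >= min_products and purchased


now = datetime(2024, 1, 15)
recent_views = [
    {"type": "view", "product_id": f"p{i}", "time": now - timedelta(days=1)}
    for i in range(10)
]
old_purchase = {"type": "purchase", "product_id": "p0", "time": now - timedelta(days=30)}
print(in_segment(recent_views + [old_purchase], now))
```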

[Figure: Data processing flowchart of remarketing and ad personalization at RevX]

Mobile Revolution

With increasing smartphone penetration, we see more and more interactions and transactions happening on mobile devices, especially in mobile applications. Mobile remarketing is now a key marketing channel for app developers, both for re-activating users who have uninstalled the app or gone dormant and for driving re-engagement and transactions in the app.

The mobile app ecosystem adds new partner stacks, specifically MMPs (mobile measurement partners) like Apsalar, Adjust, AppsFlyer, etc. These partners help advertisers collect data, forward it to marketing partners like us and do conversion attribution. Integration with these partners is a key requirement for app remarketing, and each of them has its own idiosyncrasies and nuances, created by the custom integrations done by each advertiser.

The mobile app environment also introduces new creative formats like native ads, with each supply partner having its own requirements, e.g., image sizes. We have built an image resizer component that automatically generates the required sizes for all active products in the advertiser catalog. The resizer can also apply additional transformations, like putting an overlay with the discount or price on top of the image. Today, RevX resizes and transforms 10 million+ images per day.
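Generating a required size usually starts with deciding which part of the source image to keep. The sketch below computes the largest centered crop that matches a target aspect ratio; it is plain arithmetic under an assumed center-crop policy and a hypothetical function name, not a description of how the RevX resizer works. An imaging library would then crop to this box and scale to the target size.

```python
def center_crop_box(src_w, src_h, dst_w, dst_h):
    """Largest centered region of the source with the target aspect ratio.

    Returns (left, top, right, bottom) in source pixels.
    """
    # Compare aspect ratios with integer math to avoid float rounding.
    if src_w * dst_h > src_h * dst_w:
        # Source is wider than the target shape: trim the sides.
        crop_w = src_h * dst_w // dst_h
        crop_h = src_h
    else:
        # Source is taller than (or the same shape as) the target: trim top and bottom.
        crop_w = src_w
        crop_h = src_w * dst_h // dst_w
    left = (src_w - crop_w) // 2
    top = (src_h - crop_h) // 2
    return left, top, left + crop_w, top + crop_h


# A square product shot cropped for a wide 1200x627 slot, and a wide one for a square slot.
print(center_crop_box(1000, 1000, 1200, 627))
print(center_crop_box(1600, 900, 300, 300))
```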

Advertisers can also upload a list of advertising IDs that they would like to target, which we store as a targetable audience in our system and match against the bid requests we receive. Advertisers can also host these lists on their own servers and configure a regular crawl of the lists by our system for a truly hands-off, always-on campaign setup.

Most mobile app inventory sources also lack the ability to send bid requests only for a pre-defined list of users (a norm in the website world). This means you either listen to the entire firehose of bid requests or miss out on ad opportunities by limiting QPS. To tackle this, we make sure the unwanted bid requests are filtered out at the very first layer of our bidder. We also auto-scale this first layer so that only the optimal number of machines is running at any point in time while we can still listen to the entire firehose.
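A first-layer filter of this kind can be as simple as a membership check on the device's advertising ID before any expensive work is done. The sketch below assumes OpenRTB-style bid requests that carry the ID in `device.ifa` and keeps the target IDs in an in-memory set; `make_prefilter` is a hypothetical name, and a production system would likely need a more memory-efficient structure for hundreds of millions of IDs.

```python
def make_prefilter(target_ids):
    """Build a cheap predicate that keeps only bid requests for targeted devices."""
    # Normalize case once so lookups are a single set membership test.
    targets = frozenset(i.lower() for i in target_ids)

    def wanted(bid_request):
        device_id = bid_request.get("device", {}).get("ifa")
        return device_id is not None and device_id.lower() in targets

    return wanted


wanted = make_prefilter(["AAAA-1111", "BBBB-2222"])
requests = [
    {"id": "r1", "device": {"ifa": "aaaa-1111"}},  # targeted user
    {"id": "r2", "device": {"ifa": "cccc-3333"}},  # unknown user
    {"id": "r3"},                                  # no device info at all
]
kept = [r["id"] for r in requests if wanted(r)]
print(kept)
```

Everything that fails the check is dropped before campaign matching or bid computation runs, which is what keeps the hardware cost of listening to the full firehose manageable.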

[Figure: Data processing flowchart of mobile remarketing at RevX]

Key Takeaways

The RevX platform has come a long way - from a plain, vanilla ad server to a sophisticated performance marketing platform that processes billions of ad requests, millions of products and hundreds of millions of user data requests per day in real time. The technology stack works hard to filter out unwanted bid requests to optimize hardware cost, finds the matching campaigns based on comprehensive targeting options (audience, geo, placement, frequency capping, budget, channel) and sends thousands of campaign, ad template and product combinations to the model to get the probability of a click or conversion and compute the right bid price - not too high so we don’t overpay and not too low so we don’t lose. All this magic happens in less than 5 milliseconds, billions of times per day, at a hardware cost that lets us run a viable business. The platform has evolved to this level over the last 5 years, and we have learned a great deal along the way.

If you are looking to build your own ad tech, it’s a long journey. It’s easy to build a basic ad server (the first-generation RevX), and nowadays open ad server technology like OpenX exists. This can certainly help you do the basic things if you are selling your media at CPM: all you need to do is select the highest-CPM campaign and serve a simple, static ad. But if your goal is to build technology that can deliver true performance to marketers and optimize your own yield, go through a build vs. partner decision before putting a team of a few engineers on the problem. Evaluate whether you can work with an ad technology partner before you commit to building, maintaining and growing the stack in-house. As the business evolves, the architecture keeps increasing in complexity, and the cost of operations and maintenance gets out of hand pretty quickly.

If you absolutely must create your own, keep these key things in mind:

1. Start with a good service-oriented architecture that will stand the test of time and changing product requirements.

2. Identify the key components that will play a critical role in scaling your business without costs piling up proportionally, and optimize these to the last byte.

3. Don’t compulsively build everything in-house; use third-party services whenever their economies of scale make them the cheaper option.

4. Learn and adopt new technologies that solve your problem in the most efficient manner.

5. Invest in data science and analytics; these will keep you ahead of your competition when everything else becomes a commodity.

Keep hacking!

Sandip Acharyya