How OpenX Trains and serves for a million queries per second in under 15 milliseconds

Click for: original source

Adtech is an industry built on latency at scale. At OpenX this means that during peak traffic periods our exchange processes more than one million requests for ads every second, most of which require a response in under 300 milliseconds. By Larry Price, OpenX.

To accomplish this, we’ve leveraged several products in the TensorFlow ecosystem & Google Cloud including TensorFlow Extended (TFX), TF Serving, and Kubeflow Pipelines - to build a service that prioritizes traffic to our buyers (demand side platforms, or DSPs in adtech lingo) and more specifically to brands, and agencies within those DSPs.

The OpenX marketplace is not completely unlike an equities market or stock exchange. And much like high volume financial markets, to ensure the buyers fulfill their campaign goals and simultaneously help publishers monetize appropriately on their inventory, there’s a need to prioritize traffic. Fundamentally, this means we need a model that can accurately value and hence rank every single request that hits the exchange.

Further in the article:

  • About OpenX
  • Cloud Transformation: A rare opportunity
  • Why TensorFlow
  • Training Terabytes of Data Every Day
  • Serving Over a Million Queries Per Second (QPS)
  • Building on Success

We liked: All of these out of the box features in TensorFlow Serving were a massive win for us and helped us achieve our goals, but scaling it to millions of requests a second was not without challenges. By using large virtual machines with many CPUs we were able to hit our target goal of 15 millisecond predictions, but it did not scale very cost effectively and we knew we could do better. Good read!

[Read More]

Tags distributed software-architecture performance devops