On-premise Data Center to AWS Migration – Experience & Learnings
Efficient hosting infrastructure is particularly important for a business like RevX, which intelligently targets users based on user data and insights. We started out with a privately hosted cloud back in 2011-12, when elastic computing was not yet prevalent. This blog describes how we optimized our hosting infrastructure for both cost and performance by migrating to a public cloud.
Initial State: On-Premise Datacenter
RevX had its humble beginnings in 2011-12 with a privately hosted cloud in an on-premise physical data center. With fluctuating incoming traffic, we did not have the flexibility to optimize hardware costs: we had to plan our hardware well in advance and could not scale up or down as workloads changed. As we added more supply partners and expanded our business into new markets, incoming traffic grew sharply, to around 15-20B requests per day. We needed a cloud infrastructure that could scale up or down based on the load. We chose AWS because of its global presence and expertise in this domain.
Phase I: Proof of Concept
RevX operates 4 data centers across the globe. We took an incremental approach, first migrating one data center to AWS as a proof of concept. The initial results were promising. Once we had stabilized multiple services and were happy with the performance, we moved ahead full steam to migrate all infrastructure to AWS in a phased manner. In a little over 3 months, we had completely moved out of our old data center and migrated over 20 different services while the business continued to run without interruption.
Phase II: Auto Scaling
We saw significant savings just by moving to a cloud-based setup. Our next focus was to tune every single service and get the most out of each one. AWS allows very fine-grained control over the hardware specs of virtual machines, so we rigorously matched instance types to the hardware needs of each service and fine-tuned the cost of hosting them.
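To make the instance-matching idea concrete, here is a minimal sketch of picking the cheapest instance type that covers a service's observed peak usage plus some headroom. The catalog names, sizes and prices are made-up placeholders (not real AWS instance types or pricing), and `cheapest_fit` is a hypothetical helper, not the tooling we actually used.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class InstanceType:
    name: str
    vcpus: int
    memory_gib: float
    hourly_usd: float


# Hypothetical catalog: placeholder names and prices, not real AWS data.
CATALOG = [
    InstanceType("small", 2, 4.0, 0.05),
    InstanceType("medium", 4, 8.0, 0.10),
    InstanceType("large", 8, 16.0, 0.20),
    InstanceType("mem-large", 4, 32.0, 0.17),
]


def cheapest_fit(
    peak_vcpus: float,
    peak_memory_gib: float,
    headroom: float = 0.2,
    catalog: List[InstanceType] = CATALOG,
) -> Optional[InstanceType]:
    """Return the cheapest type covering peak usage plus headroom, or None."""
    need_cpu = peak_vcpus * (1 + headroom)
    need_mem = peak_memory_gib * (1 + headroom)
    candidates = [
        t for t in catalog
        if t.vcpus >= need_cpu and t.memory_gib >= need_mem
    ]
    # min() with default=None handles the "nothing fits" case.
    return min(candidates, key=lambda t: t.hourly_usd, default=None)


# A memory-heavy service lands on the memory-oriented type,
# even though a general-purpose type has more vCPUs.
choice = cheapest_fit(peak_vcpus=3, peak_memory_gib=20)
print(choice.name if choice else "no fit")
```

In practice the peak figures would come from monitoring data collected over a representative period, and the headroom would be chosen per service.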
As an RTB marketing platform, RevX listens to a firehose of bid requests that varies significantly in volume over the course of a day. Hence, it made sense to dynamically scale our bid listener cluster to have just enough nodes to handle the incoming traffic. We leveraged the auto-scaling implementation provided by AWS to automatically scale the cluster up or down based on triggered alarms. The auto-scaling policies had to be fine-tuned iteratively to avoid instance flapping and over-provisioning.
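As an illustration of why the policies need tuning, the sketch below shows a simple scaling decision with two guards against flapping: a gap between the scale-out and scale-in thresholds, and a cooldown after each change. All thresholds and names here are assumptions chosen for the example. It does not call any AWS API and is not our production policy.

```python
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    # All values are illustrative assumptions, not tuned production numbers.
    scale_out_above: float = 0.75  # utilization that triggers adding nodes
    scale_in_below: float = 0.40   # utilization that triggers removing nodes
    step: int = 2                  # nodes added or removed per decision
    min_nodes: int = 2
    max_nodes: int = 50
    cooldown_s: int = 300          # wait after a change before acting again


def desired_nodes(
    current: int,
    utilization: float,
    seconds_since_last_change: float,
    policy: ScalingPolicy,
) -> int:
    """Return the target cluster size for the current utilization."""
    # Cooldown: ignore alarms until the last change has had time to take effect.
    if seconds_since_last_change < policy.cooldown_s:
        return current
    if utilization > policy.scale_out_above:
        return min(current + policy.step, policy.max_nodes)
    if utilization < policy.scale_in_below:
        return max(current - policy.step, policy.min_nodes)
    # Between the two thresholds: hold steady, which prevents flapping.
    return current


policy = ScalingPolicy()
print(desired_nodes(10, 0.90, 600, policy))  # high load, cooldown elapsed
print(desired_nodes(10, 0.90, 60, policy))   # high load, still cooling down
```

Narrowing the gap between the two thresholds or shortening the cooldown makes the cluster react faster but increases the risk of flapping. Widening them does the opposite and tends toward over-provisioning.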
Phase III: Spot Instances
A logical next step was for us to use spot instances instead of on-demand instances. Spot instances are spare instances that AWS auctions off at significantly lower prices. However, the price of spot instances varies continuously and can rise above the price we had bid, resulting in the termination of our spot instances. To handle such scenarios, we built intelligent fallback logic to fulfill scaling needs via on-demand instances when spot instances weren’t available. Spot instances also require the components running on them to be stateless, as you can lose an instance at any time. So we had to re-architect the bid listener to be stateless.
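The fallback idea can be sketched roughly as follows: ask for spot capacity first, then cover any shortfall with on-demand instances. The `request_spot` and `request_on_demand` callables are hypothetical stand-ins for whatever provisioning calls a platform uses, and `fake_provider` exists only to make the example runnable. This is a simplified illustration, not our actual implementation.

```python
from typing import Callable, Dict, List

# A provisioning function takes a count and returns the instance IDs it
# managed to obtain, which may be fewer than requested.
Provisioner = Callable[[int], List[str]]


def acquire_capacity(
    needed: int,
    request_spot: Provisioner,
    request_on_demand: Provisioner,
) -> Dict[str, List[str]]:
    """Prefer spot instances; cover any shortfall with on-demand."""
    spot = request_spot(needed) if needed > 0 else []
    shortfall = needed - len(spot)
    on_demand = request_on_demand(shortfall) if shortfall > 0 else []
    return {"spot": spot, "on_demand": on_demand}


def fake_provider(prefix: str, available: int) -> Provisioner:
    """Stub provider for the example: grants at most `available` instances."""
    def request(count: int) -> List[str]:
        granted = min(count, available)
        return [f"{prefix}-{i}" for i in range(granted)]
    return request


# Spot market can only supply 3 of the 5 nodes we need,
# so the remaining 2 come from on-demand.
result = acquire_capacity(
    needed=5,
    request_spot=fake_provider("spot", available=3),
    request_on_demand=fake_provider("od", available=100),
)
print(result)
```

The same function can be reused when spot instances are terminated: the lost capacity becomes the new `needed` value, which works cleanly only because the nodes are stateless.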
All these infrastructure optimizations have enabled us to integrate more and more inventory exchanges into the platform without increasing tech hosting costs, resulting in an almost 37X reduction in cost per bid.
We continue to look into more opportunities, including rigorous data archival policies, intelligent spot instance bidding and dockerisation.
If you run an established platform hosted on a private cloud and are looking to migrate to a public cloud such as AWS, these pointers could help make the transition seamless.
1. When selecting a service region, closely analyze parameters like cost, latency and availability of services. For example, AWS’s Singapore region costs about 20% more than its US East region. Region-agnostic components can be moved to a region with lower cost.
2. Migrate in a phased manner; focus on completing the migration successfully and stabilizing individual components and services before optimization.
3. Closely monitor the resource utilization of your components so that you can optimize instance type and sizing.
4. Auto scaling your most expensive service clusters and using reserved and spot instances will drive the most value towards cost reduction.
5. Explore the full suite of products/services offered by your cloud infrastructure provider and keep innovating!
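As a small illustration of the first pointer, the sketch below picks the cheapest region that still meets a component's latency budget. The relative cost of 1.20 for Singapore reflects the roughly 20% difference mentioned above. The region labels, latency figures and the `pick_region` helper are invented for the example and should be replaced with your own measurements.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Region:
    name: str
    relative_cost: float  # cost relative to the cheapest region (1.00)
    latency_ms: float     # latency to the component's users or partners


# Latency values are made-up placeholders for an Asia-facing workload.
REGIONS = [
    Region("us-east", 1.00, 220.0),
    Region("singapore", 1.20, 15.0),
]


def pick_region(
    regions: List[Region],
    max_latency_ms: Optional[float] = None,
) -> Region:
    """Cheapest region within the latency budget (no budget = region-agnostic)."""
    eligible = [
        r for r in regions
        if max_latency_ms is None or r.latency_ms <= max_latency_ms
    ]
    if not eligible:
        raise ValueError("no region satisfies the latency budget")
    return min(eligible, key=lambda r: r.relative_cost)


# A region-agnostic batch job goes wherever it is cheapest.
print(pick_region(REGIONS).name)
# A latency-sensitive component must stay close to its traffic.
print(pick_region(REGIONS, max_latency_ms=50).name)
```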