In an era where microservices are all the rage, Amazon Prime Video took a step back to assess their Video Quality Analysis (VQA) system’s architecture. They found a surprising solution: a monolith application. In doing so, Prime Video managed to scale their service to thousands of streams and, notably, reduced their infrastructure costs by over 90%.

Rethinking the architecture: the initial challenge

Amazon Prime Video offers thousands of live streams to customers globally. To deliver seamless content, they set up a tool to monitor every stream viewed by customers. This allowed the team to automatically identify and fix perceptual quality issues, such as block corruption or audio/video sync problems.

The existing VQA tool was not originally designed to run at a large scale. As the service expanded, the team noticed that running the infrastructure at that scale was expensive, and they hit scaling bottlenecks that prevented them from monitoring thousands of streams. This prompted a reassessment of the architecture, focused on those two problems: cost and scaling bottlenecks.

Serverless architecture of Amazon Prime Video

The initial version consisted of distributed components orchestrated by AWS Step Functions. The two most expensive operations were the orchestration workflow and data transfer between distributed components.

From distributed microservices to a monolith

In the face of scaling and cost issues, the Prime Video team took a bold step: they packed all components into a single process. This significantly simplified the orchestration logic and eliminated the need for data transfer between distributed components, reducing costs and improving scalability.

The migration from microservices to a monolith changed the overall infrastructure design. In the new architecture, data moves between components within process memory, which removes the need for costly operations such as Tier-1 calls to the Amazon Simple Storage Service (Amazon S3) bucket.
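To make the idea of in-memory data transfer concrete, here is a minimal sketch in Python. It is not Prime Video's code: the Frame type, the convert_media function, and the two toy detectors are all hypothetical stand-ins invented for illustration. The only point it tries to show is that the media is converted once and every detector reads the same in-memory frames, with no intermediate storage in between.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Frame:
    index: int
    pixels: bytes  # stand-in for decoded image data (assumption)


# A detector is any callable that inspects frames and returns the indexes it flags.
Detector = Callable[[List[Frame]], List[int]]


def convert_media(stream_chunk: bytes, frame_size: int = 4) -> List[Frame]:
    """Hypothetical media conversion: split raw bytes into fixed-size 'frames'."""
    return [
        Frame(index=i // frame_size, pixels=stream_chunk[i:i + frame_size])
        for i in range(0, len(stream_chunk), frame_size)
    ]


def detect_black_frames(frames: List[Frame]) -> List[int]:
    """Toy detector: flag frames whose bytes are all zero."""
    return [f.index for f in frames if f.pixels and not any(f.pixels)]


def detect_frozen_frames(frames: List[Frame]) -> List[int]:
    """Toy detector: flag frames identical to the frame before them."""
    return [
        cur.index
        for prev, cur in zip(frames, frames[1:])
        if cur.pixels == prev.pixels
    ]


def analyze(stream_chunk: bytes, detectors: Dict[str, Detector]) -> Dict[str, List[int]]:
    """Convert once, then hand the same in-memory frames to every detector."""
    frames = convert_media(stream_chunk)
    return {name: detector(frames) for name, detector in detectors.items()}


# Hypothetical detector registry for one process.
DETECTORS: Dict[str, Detector] = {
    "black": detect_black_frames,
    "frozen": detect_frozen_frames,
}
```

Calling analyze(chunk, DETECTORS) returns one list of flagged frame indexes per detector. In a distributed version of the same pipeline, the frames list would instead have to be written to and read back from shared storage for each detector, which is the kind of per-request cost the monolith avoids.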

The change came with a scaling trade-off. The initial design allowed several detectors to scale horizontally, while the new approach allows only vertical scaling, since all detectors run within the same instance. To overcome the capacity limit of a single instance, the team cloned the service multiple times, with each clone running a different subset of detectors. Though seemingly counterintuitive, the new design brought a significant improvement in cost and scalability.
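The cloning step can be sketched as a small configuration helper. This is an illustration under stated assumptions, not the team's implementation: the article does not say how detectors were assigned to clones, so the fixed-size chunking below, the per_instance_capacity parameter, the "vqa-clone-N" names, and the detector names are all hypothetical.

```python
from typing import Dict, List


def partition_detectors(detector_names: List[str], per_instance_capacity: int) -> List[List[str]]:
    """Split detectors into subsets that each fit on one instance.

    Assumption: capacity is expressed as a simple detector count per instance.
    """
    if per_instance_capacity < 1:
        raise ValueError("per_instance_capacity must be at least 1")
    return [
        detector_names[i:i + per_instance_capacity]
        for i in range(0, len(detector_names), per_instance_capacity)
    ]


def build_clone_configs(detector_names: List[str], per_instance_capacity: int) -> Dict[str, List[str]]:
    """Map each hypothetical clone name to the detector subset it should run."""
    subsets = partition_detectors(detector_names, per_instance_capacity)
    return {f"vqa-clone-{n}": subset for n, subset in enumerate(subsets)}
```

With five detectors and a capacity of two per instance, this yields three clones, the last one running a single detector. Each clone is still a complete monolith: it receives the stream, converts the media itself, and runs only its own subset of detectors in process.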

The outcome: reduced costs and enhanced scale

The move to a monolithic application reduced infrastructure costs by over 90%. It also increased the service’s scaling capacity, enabling the monitoring of thousands of streams with room for further expansion. Migrating to Amazon EC2 and Amazon ECS also allowed the Prime Video team to use Amazon EC2 compute savings plans, driving costs down even further.

The team discovered that replicating the computationally expensive media conversion process and placing it closer to the detectors was more cost-effective than running the process once and caching its outcome. Overall, the new architecture allowed Prime Video to monitor all streams viewed by customers, not just those with the highest number of viewers, resulting in even higher quality and a better customer experience.

Lessons learned

This journey underscores that microservices and serverless components can work well at a large scale, but the choice between them and a monolith must be made on a case-by-case basis. Sometimes the classic approach is the most efficient and cost-effective one. The story of Amazon Prime Video’s VQA tool is a reminder that the path to innovation and scalability can be the road less traveled.

Source of the article: Scaling up the Prime Video audio/video monitoring service and reducing costs by 90%

Last Update: 07/09/2024