AWS for M&E Blog
How to set up a resilient end-to-end live workflow using AWS Elemental products and services: Part 4
In this series:
- Part 1: Single-region reference architecture deployment walkthrough: The Fundamentals
- Part 2: Single-region reference architecture deployment walkthrough: Advanced Workflows
- Part 3: Multi-region reference architecture deployment walkthrough: Advanced Workflows
- Part 4: High-Availability Advanced Workflows with Automatic Failover (this post)
High-Availability Advanced Workflows with Automatic Failover
The first three parts of this blog series described best practices for building resilient, end-to-end media processing workflows using AWS Elemental Live, AWS Elemental MediaLive, and AWS Elemental MediaPackage combined with AWS Elemental Conductor and AWS Elemental MediaConnect.
The first two posts (Part 1 and Part 2) focused on workflows that use multiple Availability Zones within the AWS Region in which they are deployed. The third post covered situations in which customers might want to add another layer of resilience to their live video workflows by deploying AWS Media Services in multiple Regions, with failover across Regions.
In each of these progressively more resilient architectures, MediaLive processes live video using redundant encoding pipelines. Their redundant outputs feed redundant inputs in MediaPackage, which fails over automatically from one path to the other to keep the live video stream available.
This post expands on automatic failover for incoming source video signals using AWS Media Services, further enhancing the resiliency of an end-to-end video workflow with AWS.
Resilient input and transport with AWS Elemental MediaConnect
AWS Elemental MediaConnect is a reliable, secure, and flexible transport service for live video that enables broadcasters and content owners to build live video workflows and securely share live content with partners and customers.
MediaConnect supports redundant inputs and outputs, providing resiliency through primary and secondary flows as described in Part 2 of this blog series. MediaConnect now extends this resiliency with automatic failover between two live sources into a single flow: if one source fails, MediaConnect fails over to the other, so ingest and transport of the live video stream continue uninterrupted. If both sources are SMPTE 2022-7 compliant, MediaConnect uses SMPTE 2022-7 (seamless protection switching at the packet level); otherwise, it uses active/active failover.
Automatic source failover requires two redundant sources for a flow, and MediaConnect must receive content from both sources at the same time. When you enable source failover and specify two sources for a flow, MediaConnect treats both sources as primary; neither is considered a backup of the other. How the service uses the two sources depends on whether they are SMPTE 2022-7 compliant.
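To make the selection rule concrete, here is a minimal Python sketch that builds source failover settings for a flow from two sources. It assumes the field names `State`, `FailoverMode` (`MERGE` for SMPTE 2022-7, `FAILOVER` for active/active) and `RecoveryWindow` used by the MediaConnect API's source failover configuration, as we understand it; check them against the API reference before relying on them. The `Source` class, the helper name, and the 200 ms window are hypothetical illustrations.

```python
from dataclasses import dataclass


@dataclass
class Source:
    """Hypothetical description of one flow source."""
    name: str
    smpte_2022_7_compliant: bool


def source_failover_config(a: Source, b: Source, recovery_window_ms: int = 200) -> dict:
    """Return a dict shaped like MediaConnect's source failover settings (assumed field names).

    MERGE is the SMPTE 2022-7 packet-level mode; FAILOVER is the
    active/active mode used when the sources are not both compliant.
    """
    both_compliant = a.smpte_2022_7_compliant and b.smpte_2022_7_compliant
    config = {
        "State": "ENABLED",
        "FailoverMode": "MERGE" if both_compliant else "FAILOVER",
    }
    if both_compliant:
        # Latency buffer (ms) in which missing packets can be recovered
        # from the other source; 200 is an illustrative value only.
        config["RecoveryWindow"] = recovery_window_ms
    return config


if __name__ == "__main__":
    print(source_failover_config(Source("source-a", True), Source("source-b", True)))
    # {'State': 'ENABLED', 'FailoverMode': 'MERGE', 'RecoveryWindow': 200}
    print(source_failover_config(Source("source-a", True), Source("source-b", False)))
    # {'State': 'ENABLED', 'FailoverMode': 'FAILOVER'}
```

In a real deployment, a dict like this would be supplied together with the two flow sources when the flow is created or updated; the sketch only builds the data and makes no AWS call.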
More about SMPTE 2022-7
SMPTE 2022-7 is a standard developed by the Society of Motion Picture and Television Engineers (SMPTE). It defines a method for replacing missing packets in one stream with the corresponding packets from an identical, redundant stream. This type of failover requires a small latency buffer in your workflow so that MediaConnect has time to recover packets from the two streams.
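The role of the latency buffer can be illustrated with a small simulation. This is a simplified Python model under stated assumptions, not MediaConnect's implementation: packets are identified by sequence number, each path records an arrival time in milliseconds (or None if the packet was lost), and a packet is usable only if it arrives before its nominal send time plus the buffer. The function and the sample data are hypothetical.

```python
from typing import Dict, List, Optional

# Arrival time (ms) of each packet, keyed by sequence number; None = lost.
PathArrivals = Dict[int, Optional[float]]


def merge_2022_7(
    path_a: PathArrivals, path_b: PathArrivals, interval_ms: float, buffer_ms: float
) -> List[Optional[str]]:
    """Rebuild one packet stream from two identical, redundant streams.

    Packet n is nominally sent at n * interval_ms and must be released at
    n * interval_ms + buffer_ms. Path A is used by default; a packet missing
    (or too late) on A is taken from B if B delivered it before the deadline.
    Returns, per sequence number, the path that supplied it (None if lost).
    """
    def on_time(arrival: Optional[float], deadline: float) -> bool:
        return arrival is not None and arrival <= deadline

    result: List[Optional[str]] = []
    for n in sorted(set(path_a) | set(path_b)):
        deadline = n * interval_ms + buffer_ms
        if on_time(path_a.get(n), deadline):
            result.append("A")
        elif on_time(path_b.get(n), deadline):
            result.append("B")
        else:
            result.append(None)
    return result


if __name__ == "__main__":
    # Path A drops packet 2; path B is a longer route, 3 ms behind path A.
    a = {0: 0.5, 1: 1.5, 2: None, 3: 3.5}
    b = {0: 3.5, 1: 4.5, 2: 5.5, 3: 6.5}
    print(merge_2022_7(a, b, interval_ms=1.0, buffer_ms=5.0))  # ['A', 'A', 'B', 'A']
    print(merge_2022_7(a, b, interval_ms=1.0, buffer_ms=1.0))  # ['A', 'A', None, 'A']
```

With a 5 ms buffer, packet 2, lost on path A, is recovered from the slower path B; with a 1 ms buffer, the copy on path B arrives too late. This is the trade-off behind SMPTE 2022-7 merging: a little extra latency in exchange for packet-level protection.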
MediaConnect uses one of two failover methods behind the scenes:
- If the two sources are SMPTE 2022-7 compliant, MediaConnect uses content from both sources. The service randomly selects which source to start with. If that source is missing a packet, the service pulls the missing packet from the other source. For example, if the flow is using source A and packet 123 is missing, MediaConnect pulls in packet 123 from source B and continues using source A.
- If the sources are not SMPTE 2022-7 compliant, MediaConnect randomly uses one of the sources to provide content for the flow. If that source fails, the service switches to the other source. The service continues switching back and forth between sources as needed. This setup is sometimes referred to as active/active or hot/hot.
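The active/active (hot/hot) behavior in the second bullet can be sketched as a simple selector over time steps. This is a hypothetical Python model, assuming each source's health is known at every time step; the random starting source mirrors the description above, and MediaConnect's actual health detection is not modeled.

```python
import random
from typing import List, Optional


def active_active(
    health_a: List[bool], health_b: List[bool], seed: Optional[int] = None
) -> List[Optional[str]]:
    """Simulate active/active failover between two whole sources.

    health_x[t] is True if source x is delivering content at time step t.
    The starting source is chosen at random; the selector switches only when
    the active source fails and the other source is healthy, and it keeps
    switching back and forth as needed. None means neither source is usable.
    """
    rng = random.Random(seed)
    active = rng.choice(["A", "B"])
    other = {"A": "B", "B": "A"}
    out: List[Optional[str]] = []
    for ok_a, ok_b in zip(health_a, health_b):
        healthy = {"A": ok_a, "B": ok_b}
        if not healthy[active] and healthy[other[active]]:
            active = other[active]
        out.append(active if healthy[active] else None)
    return out


if __name__ == "__main__":
    # A fails at step 0 and B fails at steps 1-2: the flow switches B -> A.
    print(active_active([False, True, True], [True, False, False], seed=42))
    # ['B', 'A', 'A']
```

Unlike the SMPTE 2022-7 case, whole sources are swapped rather than individual packets, so any packets lost on the active source before the switch are not recovered.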
Resilient live video ingest and processing with AWS Elemental MediaLive
AWS Elemental MediaLive is a broadcast-grade live video processing service for creating high-quality live video streams for delivery to broadcast televisions and multiscreen devices.
MediaLive supports multiple inputs: up to two push live inputs can feed the primary ingest endpoint and another two can feed the secondary ingest endpoint. These endpoints serve as sources for the primary and secondary encoding pipelines of a standard channel. Like MediaConnect, MediaLive supports extended resiliency with automatic failover between two live push sources (from MediaConnect, other UDP/RTP transport stream sources, or RTMP sources) used as an input for an encoding pipeline.
This automatic, intelligent failover mechanism enables additional redundancy for MediaLive inputs and provides a higher level of resilience for live video streams. After a failover, you can select whether to return to the primary source when it is in a healthy state again or continue using the secondary source for the input. Alerts and logs are provided to help you identify actions taken and review status.
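As an illustration of how the return-to-primary choice might be expressed in a channel definition, the following Python sketch builds two entries for a channel's input attachments. It assumes the MediaLive CreateChannel request shape as we understand it: an `AutomaticInputFailoverSettings` object on the primary attachment with `SecondaryInputId` and `InputPreference` (`PRIMARY_INPUT_PREFERRED` to switch back, `EQUAL_INPUT_PREFERENCE` to stay on the current input). Verify these names against the MediaLive API reference; the input IDs, attachment names, and helper are hypothetical, and no AWS call is made.

```python
from typing import List


def failover_pair_attachments(
    primary_input_id: str, secondary_input_id: str, return_to_primary: bool
) -> List[dict]:
    """Build two input attachment entries forming an automatic input failover pair.

    return_to_primary=True  -> switch back to the primary once it is healthy again.
    return_to_primary=False -> keep using whichever input is currently active.
    """
    preference = "PRIMARY_INPUT_PREFERRED" if return_to_primary else "EQUAL_INPUT_PREFERENCE"
    return [
        {
            "InputId": primary_input_id,
            "InputAttachmentName": "primary-input",  # hypothetical name
            "AutomaticInputFailoverSettings": {
                "SecondaryInputId": secondary_input_id,
                "InputPreference": preference,
            },
        },
        # The secondary input is attached as well, so it can be ingested on standby.
        {"InputId": secondary_input_id, "InputAttachmentName": "secondary-input"},
    ]


if __name__ == "__main__":
    for attachment in failover_pair_attachments("1111111", "2222222", return_to_primary=True):
        print(attachment)
```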
When you set up the inputs for a channel, you can configure two push inputs as an input failover pair. This setup provides resiliency in case of a failure either in the upstream system or between the upstream system and the channel. Both inputs in the failover pair must be the same input type; for example, they must both be RTMP inputs. The two inputs must also contain identical content and characteristics: video, audio, and captions.
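The pairing rules above can be checked before a channel is created. This Python sketch is a hypothetical local pre-flight check, not a MediaLive feature: the `PushInput` class summarizes each input's type and characteristics as plain strings, and the check reports any mismatch. The input type strings follow MediaLive's input type names (such as `RTMP_PUSH`), while the characteristic summaries are illustrative.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class PushInput:
    """Hypothetical local description of a MediaLive push input."""
    input_id: str
    input_type: str            # e.g. "RTMP_PUSH", "RTP_PUSH", "UDP_PUSH"
    video: str                 # e.g. "1920x1080 H.264 29.97 fps"
    audio: Tuple[str, ...]     # e.g. ("eng AAC stereo",)
    captions: Tuple[str, ...]  # e.g. ("CEA-608",)


def failover_pair_problems(a: PushInput, b: PushInput) -> List[str]:
    """List the reasons why two inputs should not be used as a failover pair."""
    problems = []
    if a.input_type != b.input_type:
        problems.append(f"input types differ: {a.input_type} vs {b.input_type}")
    # Both inputs must carry identical video, audio, and captions.
    for name in ("video", "audio", "captions"):
        if getattr(a, name) != getattr(b, name):
            problems.append(f"{name} mismatch")
    return problems


if __name__ == "__main__":
    a = PushInput("1234", "RTMP_PUSH", "1080p H.264", ("eng AAC",), ("CEA-608",))
    b = PushInput("5678", "RTP_PUSH", "1080p H.264", ("eng AAC",), ())
    print(failover_pair_problems(a, b))
    # ['input types differ: RTMP_PUSH vs RTP_PUSH', 'captions mismatch']
```

An empty list means the two inputs satisfy the rules this sketch checks; it cannot confirm that the actual content is identical frame for frame.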
The input pair provides content to the same pipeline in the channel. One of the inputs is the active input and one is on standby. MediaLive ingests both inputs, in order to always be ready to switch, but it usually discards the standby input immediately. If the active input fails, MediaLive immediately fails over and starts processing from the standby input, instead of discarding it.
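The active/standby behavior, together with the return-to-primary option described earlier, can be modeled as a small state machine. This Python sketch is a simplification under assumptions: input health is known per time step and switching is instantaneous, whereas MediaLive applies its own failover conditions and timers. All names are hypothetical.

```python
from typing import List, Optional


def active_standby(
    primary_ok: List[bool], secondary_ok: List[bool], prefer_primary: bool
) -> List[Optional[str]]:
    """Return which input's content is processed at each time step.

    Both inputs are ingested at every step, but only the active one is
    processed; the standby content is discarded. If the active input fails,
    the standby input becomes active. With prefer_primary=True the channel
    switches back as soon as the primary is healthy again.
    """
    active = "primary"
    processed: List[Optional[str]] = []
    for p_ok, s_ok in zip(primary_ok, secondary_ok):
        healthy = {"primary": p_ok, "secondary": s_ok}
        standby = "secondary" if active == "primary" else "primary"
        if prefer_primary and active == "secondary" and p_ok:
            active = "primary"
        elif not healthy[active] and healthy[standby]:
            active = standby
        # The non-active input was ingested too, but its content is discarded here.
        processed.append(active if healthy[active] else None)
    return processed


if __name__ == "__main__":
    primary = [True, False, False, True, True]
    secondary = [True] * 5
    print(active_standby(primary, secondary, prefer_primary=True))
    # ['primary', 'secondary', 'secondary', 'primary', 'primary']
    print(active_standby(primary, secondary, prefer_primary=False))
    # ['primary', 'secondary', 'secondary', 'secondary', 'secondary']
```

The two runs differ only after the primary recovers at step 3: one policy returns to it, the other keeps using the secondary input.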
Automatic input failover and pipeline redundancy
You can implement both automatic input failover and pipeline redundancy (standard channel). If you implement both features, the source requirements are different: you need four sources from the upstream system. Each of the two inputs in the failover pair needs one source per pipeline, so each of the two pipelines receives two sources.
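The source count follows directly from multiplying inputs by pipelines. This hypothetical Python helper lists which upstream source feeds which pipeline, assuming one failover pair (two inputs) and the two pipelines of a standard channel; the source labels are illustrative only.

```python
from typing import Dict, List, Sequence


def sources_by_pipeline(inputs: Sequence[str], pipelines: Sequence[int]) -> Dict[int, List[str]]:
    """Map each pipeline to the upstream sources it receives.

    Every input in the failover pair needs one source per pipeline, so the
    total number of sources is len(inputs) * len(pipelines).
    """
    return {p: [f"{inp} -> pipeline {p}" for inp in inputs] for p in pipelines}


if __name__ == "__main__":
    plan = sources_by_pipeline(["input-A", "input-B"], [0, 1])
    for pipeline, sources in plan.items():
        print(pipeline, sources)
    # 0 ['input-A -> pipeline 0', 'input-B -> pipeline 0']
    # 1 ['input-A -> pipeline 1', 'input-B -> pipeline 1']
    print(sum(len(s) for s in plan.values()))  # 4
```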
Implementing both features provides more resiliency:
- With automatic input failover, when an input fails (or the Availability Zone fails), only one flow to the pipeline fails. MediaLive switches to the other flow.
- With a standard channel, when a pipeline fails, output continues on the other pipeline.
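The combined effect of the two features can be expressed as a simple availability rule: the channel keeps producing output as long as at least one pipeline is running and that pipeline still has at least one healthy source. The Python sketch below encodes that rule for a few failure scenarios; it is a simplified, hypothetical model that ignores failover timing and everything downstream of MediaLive.

```python
from typing import Dict, Tuple


def channel_has_output(
    pipeline_up: Dict[int, bool], source_up: Dict[Tuple[str, int], bool]
) -> bool:
    """True if some running pipeline still has at least one healthy source.

    source_up is keyed by (input name, pipeline index). A pipeline with one
    failed source keeps running on the other (input failover); if a whole
    pipeline fails, the other pipeline keeps producing output.
    """
    return any(
        running and any(ok for (_, pipe), ok in source_up.items() if pipe == p)
        for p, running in pipeline_up.items()
    )


if __name__ == "__main__":
    all_sources = {(i, p): True for i in ("input-A", "input-B") for p in (0, 1)}
    # One flow into pipeline 0 fails: input failover covers it.
    print(channel_has_output({0: True, 1: True}, {**all_sources, ("input-A", 0): False}))  # True
    # Pipeline 0 fails: pipeline 1 keeps producing output.
    print(channel_has_output({0: False, 1: True}, all_sources))  # True
    # Pipeline 0 fails and both sources into pipeline 1 fail: no output.
    failed_p1 = {**all_sources, ("input-A", 1): False, ("input-B", 1): False}
    print(channel_has_output({0: False, 1: True}, failed_p1))  # False
```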