AWS for M&E Blog
Ensuring media authenticity, traceability, and integrity by running C2PA on AWS
Tracking content provenance is a necessity for media companies, particularly with the advent of generative artificial intelligence (generative AI). Provenance metadata is critical because it helps media companies:
- Ensure that content is produced and distributed in accordance with digital rights
- Track ownership so that content is stored and monetized properly
- Combat the spread of misinformation by providing proof of authenticity
- Maintain public trust by labeling content created by generative AI
The Coalition for Content Provenance and Authenticity (C2PA) has developed a standard for tracking provenance that is gaining traction in the media industry. The standard provides a mechanism for creating digitally signed manifests that can be securely attached to assets throughout the content production pipeline.
As an industry leader, Sinclair Inc. (Sinclair) is interested in continually improving its internal processes and advancing the adoption of new technologies such as the C2PA standard. Sinclair is a diversified media company and the second-largest television station operator in the United States. It owns, operates, and provides services to 185 television stations in 86 markets. The company provides over-the-air (OTA) and over-the-top (OTT) services through NewsOn, the nation’s largest local news content provider. Its digital and OTT platforms include CHARGE!, Comet, Tennis Channel, TBD., and The Nest.
At the start of 2024, Sinclair engaged the Amazon Web Services (AWS) Worldwide Prototyping team to demonstrate how the C2PA standard could be applied to its video files. The Prototyping team helps customers adopt AWS services, expedites their path to production, and catalyzes innovation. This blog post describes the C2PA standard as well as the solution built for Sinclair, including performance and cost metrics.
C2PA overview
The C2PA standard defines a method to create digitally signed metadata manifests for videos, images, and other digital assets. Figure 1 depicts the manifest structure, which consists of assertions, claims, and a signature block. Publishers can embed manifests inside an asset or store them in a separate file, called a sidecar.
An assertion is a statement of fact about an asset. Every manifest must include an assertion containing a cryptographic hash of the digital file. This so-called hard binding links the manifest to a specific version of the asset. Depending on a media company’s requirements, common assertions may include:
- Creator
- Date of creation or update
- Description
- Rights and licensing
- Ingredients, such as a list of clips used in a final video
- Identification of any generative AI models used
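The hard binding mentioned above comes down to a cryptographic hash over the asset's bytes. The following Python sketch shows the idea with SHA-256. It streams the file in chunks so that large video files never need to fit in memory. It is a simplification: real C2PA hard-binding assertions can exclude the byte ranges where an embedded manifest is stored, and they record extra details about the hash. The function name `hard_binding_hash` and the chunk size are hypothetical.

```python
import hashlib
import os
import tempfile


def hard_binding_hash(path: str, chunk_size: int = 8 * 1024 * 1024) -> str:
    """Return the hex SHA-256 digest of the file at `path` (hypothetical helper).

    The file is read in fixed-size chunks so that multi-gigabyte videos are
    never loaded into memory all at once.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read() returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Demo: hash a small temporary file standing in for a video asset.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"stand-in for video bytes")
    try:
        print(hard_binding_hash(tmp.name))
    finally:
        os.remove(tmp.name)
```

Changing even one byte of the file produces a different digest. That is why the manifest is tied to one specific version of the asset.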
Assertions are encoded in JavaScript Object Notation (JSON) using standard vocabularies, such as those published by the International Press Telecommunications Council (IPTC). Companies may define their own vocabularies to support specific use cases.
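As an illustration, here is a hypothetical set of assertions for a news clip. It is built as a Python dictionary and serialized to JSON, loosely following the manifest definition files accepted by the CAI's `c2patool`: a `claim_generator`, a `title`, and a list of `assertions`, each with a `label` and `data`. The `c2pa.actions` label, the schema.org `CreativeWork` assertion, and the IPTC digital source type URI for generative AI output are existing vocabulary. The `com.example.rights` label, the generator name, the field values, and the helper name are assumptions standing in for a company-defined vocabulary.

```python
import json

# IPTC NewsCodes term that C2PA uses to mark media created by a trained
# generative model.
TRAINED_ALGORITHMIC_MEDIA = (
    "http://cv.iptc.org/newscodes/digitalsourcetype/trainedAlgorithmicMedia"
)


def build_manifest_definition(title: str, creator: str, generated_by_ai: bool) -> dict:
    """Return a hypothetical manifest definition for a video asset."""
    created = {"action": "c2pa.created"}
    if generated_by_ai:
        # Label the asset as generative AI output.
        created["digitalSourceType"] = TRAINED_ALGORITHMIC_MEDIA
    return {
        "claim_generator": "example-newsroom/0.1",  # hypothetical generator name
        "title": title,
        "assertions": [
            {"label": "c2pa.actions", "data": {"actions": [created]}},
            {
                "label": "stds.schema-org.CreativeWork",
                "data": {
                    "@context": "https://schema.org",
                    "@type": "CreativeWork",
                    "author": [{"@type": "Person", "name": creator}],
                },
            },
            {
                # Hypothetical company-defined vocabulary for rights data.
                "label": "com.example.rights",
                "data": {"license": "internal-broadcast-only", "territories": ["US"]},
            },
        ],
    }


if __name__ == "__main__":
    definition = build_manifest_definition("Evening news open", "Jane Doe", False)
    print(json.dumps(definition, indent=2))
```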
The C2PA standard protects the integrity of assertions through claims and signatures. A claim records a cryptographic hash of each assertion, so if a third party alters an assertion, its hash no longer matches the value recorded in the claim. Content creators then sign the claim with the private key associated with their digital certificate. The signature proves that the claim, the assertions it references, and the asset they are bound to have not been altered since signing.
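To make this integrity mechanism concrete, the sketch below builds a simplified claim that records a SHA-256 hash of each assertion. It then shows that editing an assertion afterward makes verification fail. It deliberately leaves out two things real C2PA implementations do:

- they serialize assertions according to the C2PA specification's own encoding rules, rather than the sorted-key JSON used here;
- they sign the claim with a COSE signature backed by an X.509 certificate.

All function names are hypothetical.

```python
from __future__ import annotations

import hashlib
import json


def assertion_hash(assertion: dict) -> str:
    # Serialize with sorted keys so identical content always hashes identically.
    encoded = json.dumps(assertion, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def build_claim(assertions: list[dict]) -> dict:
    """Simplified claim: one recorded hash per assertion, in order."""
    return {"assertions": [{"label": a["label"], "hash": assertion_hash(a)} for a in assertions]}


def verify_claim(claim: dict, assertions: list[dict]) -> bool:
    """True only if every assertion still matches the hash recorded in the claim."""
    if len(claim["assertions"]) != len(assertions):
        return False
    return all(
        ref["label"] == a["label"] and ref["hash"] == assertion_hash(a)
        for ref, a in zip(claim["assertions"], assertions)
    )


if __name__ == "__main__":
    assertions = [{"label": "com.example.creator", "data": {"name": "Jane Doe"}}]
    claim = build_claim(assertions)
    print(verify_claim(claim, assertions))  # True
    assertions[0]["data"]["name"] = "Someone Else"  # a third party edits the assertion
    print(verify_claim(claim, assertions))  # False
```

Hashes alone do not stop an attacker from recomputing them and rewriting the claim. The signature over the claim is what closes that gap, because any rewritten claim would no longer verify against the creator's certificate.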
Sinclair use case
Sinclair opted to track video metadata inside C2PA manifests so that it could produce, distribute, and monetize content in accordance with digital rights. C2PA manifests would also combat the spread of misinformation and maintain public trust by establishing the authenticity of assets and labeling those modified by generative AI.
Sinclair needed to support two different use cases: one for new videos that had no existing C2PA sidecar, and another for videos that already had a manifest. In the second use case, Sinclair needed to create a manifest that recorded the update action and referenced the original sidecar.
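The two use cases differ mainly in which actions are recorded and whether a prior manifest is referenced. Below is a hypothetical sketch of that branching logic. It uses the C2PA action names `c2pa.created`, `c2pa.opened`, and `c2pa.edited`, and C2PA's `parentOf` ingredient relationship to point at the earlier version. The dictionary layout, the `manifest` field, and the function name are assumptions, not the prototype's actual code.

```python
from typing import Optional


def plan_manifest(video_key: str, existing_manifest_key: Optional[str] = None) -> dict:
    """Return a hypothetical manifest plan for a new or an updated video."""
    if existing_manifest_key is None:
        # Use case 1: a new video with no prior provenance.
        return {
            "asset": video_key,
            "actions": [{"action": "c2pa.created"}],
            "ingredients": [],
        }
    # Use case 2: the video already has a sidecar. Record the update and
    # reference the earlier manifest as the parent ingredient.
    return {
        "asset": video_key,
        "actions": [{"action": "c2pa.opened"}, {"action": "c2pa.edited"}],
        "ingredients": [{"relationship": "parentOf", "manifest": existing_manifest_key}],
    }


if __name__ == "__main__":
    print(plan_manifest("videos/clip.mp4"))
    print(plan_manifest("videos/clip.mp4", "videos/clip.c2pa"))
```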
AWS solution for C2PA workloads
The AWS Worldwide Prototyping team’s solution allows Sinclair to generate C2PA manifests via REST APIs. Sinclair first places videos and existing C2PA manifests in an Amazon Simple Storage Service (Amazon S3) bucket. Sinclair can also store provenance metadata in Amazon S3 as a JSON file or post the JSON directly to the API. When Sinclair invokes the API, the request includes a pre-signed Amazon S3 URL for each file stored in the bucket.
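The post does not publish the API's request schema. The following is a hypothetical payload builder showing how a caller might bundle pre-signed URLs, and optionally inline metadata, into one JSON request. The field names `video_url`, `manifest_url`, `metadata_url`, and `metadata` are assumptions. In practice, each URL could come from boto3's S3 client, for example `s3.generate_presigned_url("get_object", Params={"Bucket": bucket, "Key": key})`.

```python
import json
from typing import Optional


def build_sign_request(
    video_url: str,
    manifest_url: Optional[str] = None,
    metadata_url: Optional[str] = None,
    metadata: Optional[dict] = None,
) -> str:
    """Serialize a hypothetical signing request as a JSON string."""
    if metadata_url and metadata:
        raise ValueError("Provide metadata either by URL or inline, not both")
    body = {"video_url": video_url}  # pre-signed URL for the video
    if manifest_url:
        body["manifest_url"] = manifest_url  # existing sidecar (update use case)
    if metadata_url:
        body["metadata_url"] = metadata_url  # provenance JSON stored in S3
    if metadata:
        body["metadata"] = metadata  # provenance JSON posted directly
    return json.dumps(body)
```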
The AWS solution relies on an open-source C2PA command-line tool developed by the Content Authenticity Initiative (CAI). The tool accepts a digital file and other inputs to produce a C2PA manifest. The Prototyping team wrapped the command-line tool in a Docker container so that it could be invoked through a REST API. To deploy the container, the team designed two architectures: one based on AWS Fargate and the other on AWS Lambda.
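Wrapping a command-line tool in a web service usually means translating each request into a subprocess call. The sketch below builds a `c2patool` argument list using flags described in the tool's documentation at the time of writing:

- `-m` for the manifest definition file;
- `-o` for the output path;
- `-p` for a parent file;
- `-s` to write a sidecar instead of embedding;
- `-f` to overwrite existing output.

Confirm these with `c2patool --help` for the version you install. The function names and the 840-second timeout are hypothetical, and the sketch does not claim to mirror how the Prototyping team's container invokes the tool.

```python
from __future__ import annotations

import subprocess
from typing import Optional


def build_c2patool_command(
    asset_path: str,
    manifest_definition_path: str,
    output_path: str,
    parent_path: Optional[str] = None,
    sidecar: bool = True,
) -> list[str]:
    """Build a c2patool argument list (flags assumed from the tool's docs)."""
    cmd = ["c2patool", asset_path, "-m", manifest_definition_path, "-o", output_path]
    if parent_path:
        cmd += ["-p", parent_path]  # earlier version of the asset, as parent ingredient
    if sidecar:
        cmd.append("-s")  # write a separate .c2pa sidecar instead of embedding
    cmd.append("-f")  # overwrite output left by a previous run
    return cmd


def run_c2patool(cmd: list[str], timeout_s: int = 840) -> str:
    """Run the tool and return its stdout.

    Raises subprocess.CalledProcessError on a non-zero exit and
    subprocess.TimeoutExpired if the hypothetical timeout is exceeded.
    """
    result = subprocess.run(cmd, capture_output=True, text=True, check=True, timeout=timeout_s)
    return result.stdout
```

Passing arguments as a list rather than a shell string avoids shell-quoting problems with file names that arrive in API requests.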
The Prototyping team offered two architectural options so that Sinclair could pick the best fit for its workload. AWS Lambda functions incur charges only while they run, but they can experience cold-start latency when a new execution environment is initialized. They are also limited to a maximum execution time of 15 minutes. AWS Fargate, in contrast, imposes no maximum execution time, and an already-running task avoids cold-start delays. However, AWS Fargate tasks incur charges for as long as they run, even when no assets are being processed.
For customers with large files or who need to process assets continuously, the AWS Fargate-based architecture may be the optimal choice. The Lambda-based architecture may be a better fit for customers who prioritize on-demand usage and cost efficiency, especially if they have small files or sporadic workloads.
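The trade-offs in the two preceding paragraphs can be condensed into a rough rule of thumb, encoded in the sketch below. The 900-second ceiling is Lambda's maximum timeout. The `busy_threshold` cut-over point is an arbitrary assumption and should be replaced with the result of your own cost comparison. The function name is hypothetical.

```python
LAMBDA_MAX_TIMEOUT_S = 900  # Lambda's maximum execution time (15 minutes)


def suggest_architecture(
    worst_case_processing_s: float,
    busy_fraction: float,
    busy_threshold: float = 0.5,  # hypothetical cut-over point
) -> str:
    """Return "fargate" or "lambda" for a C2PA signing workload.

    busy_fraction is the share of the month during which assets are
    actually being processed (0.0 to 1.0).
    """
    if worst_case_processing_s >= LAMBDA_MAX_TIMEOUT_S:
        return "fargate"  # the job might not finish before Lambda times out
    if busy_fraction >= busy_threshold:
        return "fargate"  # an always-on task stays busy, so idle cost matters less
    return "lambda"  # short, sporadic jobs: pay only while invocations run
```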
AWS Fargate option
Figure 2 depicts the architecture based on AWS Fargate, which provides serverless compute for containers. The Python-based container runs a web server built on the FastAPI framework. Because it sits behind an internal-facing Application Load Balancer (ALB), the container is accessible only to callers within the Amazon Virtual Private Cloud (VPC) in which it was launched. For this deployment, Sinclair’s production system ran in its own VPC and called the REST API through a cross-account VPC peering connection. To sign manifests, the container retrieves a digital certificate and private key stored in AWS Secrets Manager.
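Before it can sign, the container needs the certificate and private key in a form the command-line tool can read. One possible approach, sketched below, works as follows:

- parse a Secrets Manager secret string;
- write the certificate and key to files readable only by the current user;
- return the file paths as the `sign_cert` and `private_key` fields used in `c2patool` manifest definitions.

The secret's JSON keys (`certificate`, `private_key`), the file names, and the helper name are assumptions. In the container, the secret string would come from boto3's Secrets Manager client, for example `client.get_secret_value(SecretId=secret_id)["SecretString"]`.

```python
import json
import os
import tempfile


def materialize_signing_material(secret_string: str, workdir: str) -> dict:
    """Write the PEM certificate chain and private key from a JSON secret to
    files under `workdir`, and return manifest-definition fields pointing at them."""
    secret = json.loads(secret_string)
    paths = {}
    for field, filename in (("certificate", "sign_cert.pem"), ("private_key", "private_key.pem")):
        path = os.path.join(workdir, filename)
        # Create the file with owner-only permissions before writing key material.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
        with os.fdopen(fd, "w") as f:
            f.write(secret[field])
        paths[field] = path
    return {"sign_cert": paths["certificate"], "private_key": paths["private_key"]}


if __name__ == "__main__":
    demo = json.dumps({"certificate": "<PEM certificate chain>", "private_key": "<PEM private key>"})
    with tempfile.TemporaryDirectory() as workdir:
        print(materialize_signing_material(demo, workdir))
```

Using a temporary directory that is deleted after each request keeps the key on disk only for as long as signing takes.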
AWS Lambda option
Figure 3 depicts the architecture based on AWS Lambda. This architecture takes advantage of Lambda’s support for container images, and the container it launches is identical to the one used in the Fargate-based option. The Lambda function is exposed as a REST API through an AWS Lambda function URL, which is protected by AWS Identity and Access Management (IAM) authorization. In this deployment, Sinclair’s production system assumed an IAM role that had permission to invoke the Lambda function. As in the Fargate option, AWS Secrets Manager stores the certificate and private key used to sign C2PA manifests.
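A Lambda function URL passes each HTTP request to the function as an event. The request body is in the event's `body` field, base64-encoded when `isBase64Encoded` is true. The function replies with an object containing `statusCode`, `headers`, and `body`. With IAM authorization, Lambda rejects unsigned or unauthorized requests before invoking the function, so the handler itself does not check credentials. The post does not say how the shared container adapts to this event format. The handler below is only an assumption of how a thin adapter might unwrap the request and return JSON, and `sign_asset` is a hypothetical stand-in for the real download, hash, and sign logic.

```python
import base64
import json

JSON_HEADERS = {"Content-Type": "application/json"}


def sign_asset(request: dict) -> dict:
    """Hypothetical placeholder for downloading, hashing, and signing an asset."""
    return {"status": "signed", "video_url": request["video_url"]}


def handler(event: dict, context=None) -> dict:
    """Unwrap a function URL event, call the signing logic, and return JSON."""
    body = event.get("body") or "{}"
    if event.get("isBase64Encoded"):
        body = base64.b64decode(body).decode("utf-8")
    try:
        result = sign_asset(json.loads(body))
    except (json.JSONDecodeError, KeyError, TypeError) as err:
        # Malformed JSON or a missing field is reported as a client error.
        return {"statusCode": 400, "headers": JSON_HEADERS, "body": json.dumps({"error": str(err)})}
    return {"statusCode": 200, "headers": JSON_HEADERS, "body": json.dumps(result)}
```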
Performance and cost metrics
Processing time
Table 1 details the time needed to download video assets, generate digital hashes, and upload manifests to Amazon S3. For these tests, we used representative video files from Sinclair that were all under 1 GB in size. In the first batch of tests, the manifests did not contain any metadata. In the second batch, the manifests included metadata for every property defined in the IPTC Video Metadata Hub taxonomy.
Testing revealed that the download and hashing steps consumed most of the processing time. It also showed that, in this configuration, the AWS Lambda option completed the hashing step faster than the AWS Fargate option.
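To see where time goes in your own deployment, you can time each stage separately. The sketch below measures SHA-256 throughput on an in-memory buffer with `time.perf_counter`, which isolates hashing from download time. The buffer and chunk sizes are arbitrary assumptions, and the function name is hypothetical. Results will vary with the compute allocated to each option; Lambda, for example, allocates CPU in proportion to the configured memory.

```python
import hashlib
import time


def hash_throughput_mb_s(size_mb: int = 64, chunk_mb: int = 8) -> float:
    """Hash roughly `size_mb` MiB of zero bytes in chunks and return MiB per second."""
    chunk = bytes(chunk_mb * 1024 * 1024)
    chunks = max(1, size_mb // chunk_mb)
    digest = hashlib.sha256()
    start = time.perf_counter()
    for _ in range(chunks):
        digest.update(chunk)
    elapsed = time.perf_counter() - start
    # Report on the data actually hashed, which may round size_mb down.
    return chunks * chunk_mb / elapsed


if __name__ == "__main__":
    print(f"SHA-256 throughput: {hash_throughput_mb_s():.0f} MiB/s")
```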
Estimating cost
In our tests, the system required 15 seconds on average to process each asset. Tables 2 and 3 detail the estimated monthly cost of the AWS Fargate and AWS Lambda options, respectively, for a broadcaster that processes 10,000 assets each month.
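The main cost drivers reduce to simple arithmetic. Lambda bills per request and per GB-second of execution. A Fargate service bills per vCPU-hour and per GB-hour for as long as its tasks run. At 15 seconds per asset, 10,000 assets amount to 150,000 seconds, or about 41.7 hours, of processing a month.

The sketch below takes every price and resource size as a parameter instead of hard-coding them. Any values you pass in, such as Lambda memory size, vCPU count, or task count, are your own assumptions and should come from current AWS pricing. The sketch also ignores other charges, such as the Application Load Balancer, Amazon S3, AWS Secrets Manager, and data transfer. The function names are hypothetical.

```python
HOURS_PER_MONTH = 730  # common approximation used for monthly estimates


def lambda_monthly_cost(
    assets: int,
    seconds_per_asset: float,
    memory_gb: float,
    price_per_gb_s: float,
    price_per_million_requests: float,
) -> float:
    """Per-invocation billing: GB-seconds of execution plus request charges."""
    gb_seconds = assets * seconds_per_asset * memory_gb
    return gb_seconds * price_per_gb_s + assets / 1_000_000 * price_per_million_requests


def fargate_monthly_cost(
    tasks: int,
    vcpu: float,
    memory_gb: float,
    price_per_vcpu_hour: float,
    price_per_gb_hour: float,
    hours: float = HOURS_PER_MONTH,
) -> float:
    """Always-on billing: cost depends on running hours, not on asset count."""
    return tasks * hours * (vcpu * price_per_vcpu_hour + memory_gb * price_per_gb_hour)
```

The two functions have different shapes. Lambda cost scales linearly with the number of assets, while Fargate cost is flat for a fixed fleet of tasks. Comparing them for your own volumes shows where the break-even point lies.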
Disclaimer: The following estimates should be used only as guidance on cost drivers and take into consideration assumptions that may not map to your usage. Refer to the public AWS pricing of each service or the AWS Pricing Calculator for more up-to-date information about pricing.
Conclusion
C2PA gives media companies a powerful tool to track the provenance of assets and build trust and transparency. Broadcasters can use C2PA manifests to track digital rights and other metadata to streamline production pipelines. Consumers can use C2PA manifests to check the authenticity and provenance of content they find online.
C2PA is gaining traction in the media industry as a standards-based way of recording provenance. Leading media companies like Sinclair are testing how to incorporate the standard in their production workloads. Major technology and manufacturing companies, including AWS, are supporting this effort by developing tools to create and validate C2PA manifests.
To see how you can use the C2PA standard, review the C2PA technical specification. A wide range of tools is also available for experimenting with C2PA workloads. For example, you can deploy the AWS Fargate and AWS Lambda C2PA solutions discussed in this blog post in your own AWS account. To get started, visit our GitHub repository, which contains all the code needed to deploy a C2PA workload in your own environment.