AWS Machine Learning Blog
AWS Localization uses Amazon Translate to scale localization
The AWS website is currently available in 16 languages (12 for the AWS Management Console and for technical documentation): Arabic, Chinese Simplified, Chinese Traditional, English, French, German, Indonesian, Italian, Japanese, Korean, Portuguese, Russian, Spanish, Thai, Turkish, and Vietnamese. Customers all over the world gain hands-on experience with the AWS platform, products, and services in their native language. This is made possible by the AWS Localization team (AWSLOC).
AWSLOC manages the end-to-end localization process of digital content at AWS (webpages, consoles, technical documentation, e-books, banners, videos, and more). On average, the team manages 48,000 projects a year across all digital assets, which amounts to over 3 billion translated words. Given growing demand from global customers and new local cloud adoption journeys, AWS Localization needs to support content localization at scale, with the aim of making more content available and serving new markets. To do so, AWSLOC uses a network of over 2,800 linguists globally and supports hundreds of content creators across AWS. The team strives to continuously improve the language experience for customers by investing heavily in automation and building automated pipelines for all content types.
AWSLOC aspires to build a future where you can interact with AWS in your preferred language. To achieve this vision, they’re using AWS machine translation and Amazon Translate. The goal is to remove language barriers and make AWS content more accessible through consistent locale-specific experiences to help every AWS creator deliver what matters most to global audiences.
This post describes how AWSLOC uses Amazon Translate to scale localization and offer their services to new locales. Amazon Translate is a neural machine translation service that delivers fast, high-quality, cost-effective, and customizable language translation. Neural machine translation is a form of language translation that uses deep learning models to deliver accurate and natural-sounding translation. For more information about the languages Amazon Translate supports, see Supported languages and language codes.
How AWSLOC uses Amazon Translate
The implementation of machine translation allows AWSLOC to speed up the localization process for all types of content. AWSLOC chose AWS technical documentation to jumpstart their machine translation journey with Amazon Translate because it’s one of the pillars of AWS. Around 18% of all customers chose to view technical documentation in their local language in 2021, a 27% increase over 2020. In 2020 alone, over 1,435 features and 31 new services were added to the technical documentation, which drove a 353% increase in translation volume in 2021.
To cater to this demand for translated documentation, AWSLOC partnered with Amazon Translate to optimize the localization processes.
AWSLOC uses Amazon Translate to pre-translate strings that fall below a fuzzy matching threshold against the translation memory, across 10 supported languages. A dedicated Amazon Translate instance is configured with Active Custom Translation (ACT), and the corresponding parallel data is updated monthly. For most language pairs, Amazon Translate with ACT has shown a steady improvement in output quality. To raise the quality bar further, human post-editing is then performed on assets with higher customer visibility. AWSLOC established a governance process to monitor the migration of content between machine translation (MT) and machine translation post-editing (MTPE), which comes in two levels: MTPE-Light and MTPE-Premium. Human editors review MT output to correct translation errors, and these corrections are incorporated back into the engine through the ACT process. The engine is refreshed regularly (about once every 40 days on average), with contributions consisting mostly of bug submissions.
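The feedback loop above, where post-edited segments are folded back into the parallel data behind ACT, can be sketched in Python. This is a minimal illustration, not AWSLOC's pipeline: it assumes the CSV parallel data layout (a header row of language codes, then one translation example per row), and the function names, the `aws-docs-en-es` resource name, and the S3 URI are hypothetical. The code only prepares the file contents and the keyword arguments for boto3's `update_parallel_data` call; uploading to Amazon S3 and calling the API are left out because they need AWS credentials.

```python
import csv
import io


def build_parallel_data_csv(pairs, source_lang, target_lang):
    """Serialize (source, post-edited target) pairs as CSV parallel data.

    Assumes the CSV layout with a header row of language codes followed
    by one translation example per row.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([source_lang, target_lang])
    seen = set()
    for source, target in pairs:
        source, target = source.strip(), target.strip()
        # Skip empty segments and exact duplicates so each example appears once.
        if not source or not target or (source, target) in seen:
            continue
        seen.add((source, target))
        writer.writerow([source, target])  # csv quotes fields that contain commas
    return buffer.getvalue()


def build_update_request(name, s3_uri):
    """Keyword arguments for boto3.client("translate").update_parallel_data()."""
    return {
        "Name": name,
        "ParallelDataConfig": {"S3Uri": s3_uri, "Format": "CSV"},
    }


# Hypothetical post-edits collected from human editors during the month.
post_edits = [
    ("Open the Amazon S3 console.", "Abra la consola de Amazon S3."),
    ("Open the Amazon S3 console.", "Abra la consola de Amazon S3."),  # duplicate
    ("Choose Create bucket.", "Elija Crear bucket."),
]
csv_text = build_parallel_data_csv(post_edits, "en", "es")
request = build_update_request(
    "aws-docs-en-es", "s3://example-bucket/parallel-data/en-es.csv"
)
print(csv_text)
```

With credentials configured and the file uploaded to the S3 URI, passing `request` to `update_parallel_data` should point the existing parallel data resource at the refreshed examples; deduplicating before upload keeps repeated post-edits from being counted twice.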
AWSLOC follows best practices to maintain the ACT table, such as marking certain terms with the do-not-translate feature that Amazon Translate provides.
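For HTML input, Amazon Translate's documentation describes marking do-not-translate content with the `translate="no"` attribute, for example `<span translate="no">Amazon S3</span>`. The following Python sketch wraps every occurrence of a do-not-translate term in such a span before the text is sent with the `text/html` content type. The function name and the term list are illustrative assumptions, not part of AWSLOC's tooling; matching is whole-word and longest-first, so `Amazon S3 Glacier` is protected as one unit rather than leaving `Glacier` exposed.

```python
import html
import re


def mark_do_not_translate(text, terms):
    """Escape plain text as HTML and wrap do-not-translate terms in
    <span translate="no"> elements."""
    if not terms:
        return html.escape(text, quote=False)
    # Longest terms first so "Amazon S3 Glacier" wins over "Amazon S3".
    ordered = sorted(set(terms), key=len, reverse=True)
    # Lookarounds keep matches on word boundaries ("AWS" is not matched inside "AWSLOC").
    pattern = re.compile(
        r"(?<!\w)(?:" + "|".join(re.escape(t) for t in ordered) + r")(?!\w)"
    )
    parts, last = [], 0
    for match in pattern.finditer(text):
        parts.append(html.escape(text[last:match.start()], quote=False))
        parts.append(
            '<span translate="no">' + html.escape(match.group(0), quote=False) + "</span>"
        )
        last = match.end()
    parts.append(html.escape(text[last:], quote=False))
    return "".join(parts)


dnt_terms = ["AWS", "AWS Backup", "Amazon S3", "Amazon S3 Glacier"]
marked = mark_do_not_translate(
    "AWSLOC archives objects in Amazon S3 Glacier with AWS Backup.", dnt_terms
)
print(marked)
```

Escaping the surrounding text matters because the request is treated as HTML; a stray `<` or `&` in a documentation string would otherwise be parsed as markup.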
The following diagram illustrates the detailed workflow.
The main components in the process are as follows:
- Translation memory – The database that stores sentences, paragraphs, or bullet points that have been previously translated, to help human translators. This database stores each source text with its corresponding translation as a language pair, called a translation unit.
- Language quality service (LQS) – The accuracy check that an asset goes through after the Language Service Provider (LSP) completes its pass. Unless otherwise specified, 20% of the asset is spot-checked.
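- Parallel data – A collection of translation examples, each pairing a segment of source-language text with its translation in one or more target languages. Amazon Translate uses parallel data with Active Custom Translation (ACT) to adapt its output to the style, tone, and terminology of the examples.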
- Fuzzy matching – A technique used in computer-assisted translation as a special case of record linkage. It finds correspondences between segments of a text and entries in a database of previous translations even when the match is less than 100% exact.
- Do-not-translate terms – A list of phrases and words that don’t require translation, such as brand names and trademarks.
- Pre-translation – The initial application of do-not-translate terms, translation memory, and machine translation or human translation engines against a source text before it’s presented to linguists.
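The pre-translation step described in these components can be sketched in Python: each source segment is checked against the translation memory, exact and high-scoring fuzzy matches are reused, and only segments below the threshold go to machine translation. This is an illustration under assumptions, not AWSLOC's configuration. It uses the standard library's `difflib.SequenceMatcher` ratio as a stand-in for a CAT tool's fuzzy-match score, and the 0.75 threshold, function names, and route labels are hypothetical because the post does not state them.

```python
from difflib import SequenceMatcher

FUZZY_THRESHOLD = 0.75  # hypothetical; the post does not give AWSLOC's threshold


def best_tm_match(segment, memory):
    """Return (score, target) for the translation unit whose source is closest."""
    best_score, best_target = 0.0, None
    for source, target in memory.items():
        score = SequenceMatcher(None, segment, source).ratio()
        if score > best_score:
            best_score, best_target = score, target
    return best_score, best_target


def pre_translate(segments, memory, threshold=FUZZY_THRESHOLD):
    """Assign each segment a pre-translation route before linguists see it."""
    plan = []
    for segment in segments:
        if segment in memory:
            plan.append((segment, "tm_exact", memory[segment]))
            continue
        score, target = best_tm_match(segment, memory)
        if score >= threshold:
            # Fuzzy match: reuse the stored translation as a draft for the linguist.
            plan.append((segment, "tm_fuzzy", target))
        else:
            # Below the threshold: send the segment to machine translation.
            plan.append((segment, "machine_translation", None))
    return plan


# Hypothetical translation memory with one English-Spanish translation unit.
memory = {"Open the Amazon S3 console.": "Abra la consola de Amazon S3."}
segments = [
    "Open the Amazon S3 console.",
    "Open the Amazon EC2 console.",
    "Delete the stack.",
]
for segment, route, draft in pre_translate(segments, memory):
    print(f"{route:<20} {segment}")
```

Production CAT tools typically score matches on tokens and apply penalties for tags or numbers; the character-level ratio here only illustrates where the threshold sits in the flow.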
MTPE-Light produces understandable but not stylistically perfect text. The following table summarizes the differences between MTPE-Light and MTPE-Premium.
| MTPE-Light | MTPE-Premium |
|---|---|
| Additions and omissions | Punctuation |
| Accuracy | Consistency |
| Spelling | Literalness |
| Numbers | Style |
| Grammar | Preferential terminology |
| | Formatting errors |
Multi-faceted impacts
Amazon Translate lets AWSLOC run localization projects at scale because project turnaround time is no longer tied to translation volume. Amazon Translate can deliver more than 50,000 words within 1 hour, whereas traditional localization cycles complete 10,000-word projects in 7–8 days and 50,000-word projects in 30–35 days. Amazon Translate is also 10 times cheaper than standard translation, and it makes the localization budget easier to track and manage. Compared to human translation, AWSLOC observed savings of up to 40% on projects that use MTPE-Premium and up to 60% on projects that use MTPE-Light. Additionally, projects that use machine translation exclusively incur only a monthly flat fee: the technology cost of the translation management system that AWSLOC uses to process machine translation.
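The savings figures above reduce to simple arithmetic. The sketch below applies the reported maximum savings (up to 40% for MTPE-Premium and up to 60% for MTPE-Light, relative to human translation) to a baseline cost, which gives the lowest cost each workflow could reach. The 10,000 baseline is a hypothetical figure in arbitrary currency units, and because the post reports upper bounds, real projects may save less.

```python
# Upper-bound savings relative to human translation, as reported in the post.
MAX_SAVINGS = {"MTPE-Premium": 0.40, "MTPE-Light": 0.60}


def cost_floor(human_cost, workflow):
    """Lowest expected cost of a workflow if it achieves the maximum reported savings."""
    return human_cost * (1 - MAX_SAVINGS[workflow])


baseline = 10_000  # hypothetical cost of a fully human-translated project
for workflow in MAX_SAVINGS:
    print(f"{workflow}: at least {cost_floor(baseline, workflow):.2f}")
# prints:
# MTPE-Premium: at least 6000.00
# MTPE-Light: at least 4000.00
```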
Lastly, thanks to Amazon Translate, AWSLOC is now able to go from monthly to weekly refresh cycles for technical documentation.
All in all, machine translation is the most cost-effective and time-saving option for global localization teams that need to keep up with a growing volume of content to localize over the long term.
Conclusion
Amazon Translate benefits both Amazon and our customers by generating cost savings and delivering localized content faster and in multiple languages. For more information about the capabilities of Amazon Translate, see the Amazon Translate Developer Guide. If you have any questions or feedback, feel free to contact us or leave a comment.
About the authors
Marie-Alice Daniel is a Language Quality Manager at AWS, based in Luxembourg. She leads a variety of efforts to monitor and improve the quality of localized AWS content, especially marketing content, with a focus on customer social outreach. She also helps stakeholders address quality concerns and ensure that localized content consistently meets the quality bar.
Ajit Manuel is a Senior Product Manager (Tech) at AWS, based in Seattle. Ajit leads the localization product management team that builds solutions centered on language analytics services, translation automation, and language research and design. The solutions that Ajit’s team builds help AWS scale its global footprint while staying locally relevant. Ajit is passionate about building innovative products, especially in niche markets, and has pioneered solutions that augmented digital transformation in the insurance-tech and media-analytics spaces.