AWS Big Data Blog
Amazon Redshift announces general availability of support for JSON and semi-structured data processing
At AWS re:Invent 2020, we announced the preview of native support for JSON and semi-structured data in Amazon Redshift. This includes a new data type, SUPER, which allows you to store JSON and other semi-structured data in Amazon Redshift tables, and support for the PartiQL query language, which allows you to seamlessly query and process the semi-structured data. SUPER and PartiQL together enable you to achieve advanced analytics that combine classic structured SQL data (such as strings, numerics, and timestamps) and semi-structured data with superior performance, flexibility, and ease of use.
Today, we’re excited to announce the general availability of the SUPER data type and PartiQL support in Amazon Redshift.
Customer use cases
During the past four months of preview availability, customers from a broad range of industries have used JSON and semi-structured data processing with Amazon Redshift. Here are some of the ways they are taking advantage of it.
Yelp
Yelp is a local-search service powered by a crowd-sourced review forum, with more than 148 million reviews from around the world. Many Yelp microservices generate JSON-based logs that feed downstream data mining.
“We are excited to see the Amazon Redshift SUPER data type general availability. Native JSON data often appears in our infrastructure edge environment. To avoid intensive data engineering to flatten their schema, we may store the JSON document as varchar. That only allows us to use json_extract to access the stored JSON document for analytics. SUPER is a game changer for us to use a PartiQL-style SQL accessor to query semi-structured data seamlessly with a much better developer experience and sometimes even better access speed.” —Steven Moy, Software Engineer at Yelp
Sony
Sony Corporation (Sony) is a creative entertainment company with a solid foundation of technology. From game and network services to music, pictures, electronics, image sensors, and financial services, Sony’s purpose is to fill the world with emotion through the power of creativity and technology. Sony uses data to accelerate their creation and enhancement of products and services.
“We use Redshift as our data warehouse for analysis of various products and services. Previously, when loading data to Redshift, we had to examine the data structure and predefine the schema. Because of the wide variety of formats used by various products and services, schema changes and additions were frequent and time-consuming. Since the SUPER type frees us from multiple data schema updates, we can reduce operational cost by 60% and improve ingestion performance for semi-structured data without schema definition. This allows us to immediately start exploratory analysis.” —Keiko Hara, Software Engineer at Sony.
Livesense
Livesense is one of the leading Japan-based companies operating internet media for human resources, real estate, and more. Livesense solutions include Machbaito, Tenshoku Kaigi, and IESHIL, which help users make critical decisions on finding temporary jobs, next jobs, and real estate, respectively.
“We use Redshift as our data analytics platform. By leveraging this platform, we find insights to assess and enhance our internet media from data. Before storing data in Redshift, we had to transform semi-structured data included in each record with other services and technologies. Now, this SUPER type allows us to directly store semi-structured data into one column and one record and start data analytics immediately. Since we started using the SUPER type, we’ve reduced our operational cost by approximately 30%.” —Masashi Yoshitake, leader of Livesense Analytics team.
Learn more and get started today
You can learn more about the SUPER data type and PartiQL support in Amazon Redshift in Ingesting and querying semistructured data in Amazon Redshift.
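To give a feel for how SUPER and PartiQL fit together, here is a minimal sketch. The table name event_logs, its columns, and the JSON document are hypothetical and chosen only for illustration; the statements follow the SUPER and PartiQL syntax described in the Amazon Redshift documentation, so check them against that guide before relying on them.

```sql
-- Hypothetical table: one ordinary column plus one SUPER column
-- that holds the semi-structured document as-is.
CREATE TABLE event_logs (
    event_id INT,
    payload  SUPER
);

-- JSON_PARSE converts JSON text into a SUPER value at load time,
-- so no schema has to be defined for the document up front.
INSERT INTO event_logs VALUES (
    1,
    JSON_PARSE('{"app": "search", "customer": {"id": 42, "city": "Seattle"}, "topics": ["mobile", "beta"]}')
);

-- PartiQL navigation: dot notation for nested fields,
-- bracket notation for array elements.
SELECT e.event_id,
       e.payload.app,
       e.payload.customer.city,
       e.payload.topics[0]
FROM event_logs e;

-- PartiQL unnesting: iterate over the array in the FROM clause
-- to get one output row per array element.
SELECT e.event_id, t AS topic
FROM event_logs e, e.payload.topics t;
```

By contrast, when the same document is stored in a VARCHAR column, each field is typically read with a function such as JSON_EXTRACT_PATH_TEXT, which works on the JSON text for every access.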
About the Author
Joe Yong is a relentless pursuer of making complex database technologies easy and intuitive for the masses. In 20 years of building database engines and large-scale data management systems, Joe has shipped dozens of features for on-premises and cloud-native databases. His work spans both SMP and MPP systems ranging from IoT devices to petabyte-sized cloud data warehouses. Joe is currently a Product Manager with Amazon Web Services helping make Amazon Redshift – the world’s most popular, highest performance and most scalable cloud data warehouse – even better. Off keyboard, Joe tries to onsight 5.11s, hunt for good eats and seek a cure for his Australian Labradoodle’s obsession with squeaky tennis balls.