AWS Partner Network (APN) Blog
Building a No-Code Business Intelligence Platform for Custom Data Analytics at Scale with Shipsy
By Himanshu Gupta, CTO – Shipsy
By Gaurav Malhotra, Sr. Solutions Architect – AWS
Founded in 2015, Shipsy is an artificial intelligence (AI)-powered logistics management platform that enables businesses to accelerate growth, optimize costs, and enhance customer experiences while simplifying end-to-end logistics and supply chain operations.
Shipsy is an AWS Partner that’s been at the forefront of innovation, with an unwavering focus on customer experience, rapid adaptation, platform evolution, and global relevance. Its cutting-edge solutions enable businesses to drastically reduce operational costs, enhance customer experience, eliminate manual inefficiencies, achieve real-time visibility, and make transportation greener.
Shipsy’s product portfolio comprises intuitive, comprehensive, scalable, and no-code solutions that target customer pain points. When it comes to offering data analytics to customers, Shipsy focuses on scalability, ease of use, and responsiveness.
In this post, we will share how Shipsy built a scalable, intuitive, and no-code analytics platform for getting near real-time data insights from operational data using Amazon Web Services (AWS).
We’ll also discuss how Shipsy migrated data securely, quickly, and reliably from its existing analytics infrastructure, and how it optimized the analytics platform for fast responses, easy configuration, and data visualization.
Self-Service Custom Data Analytics at Scale
As Shipsy scaled, the data downloads that clients requested for regular analytics, such as annual reports, month-end data, and day-end data, grew to terabyte scale.
Most of these downloads ran concurrently, and serving requests from multiple clients with the existing analytics framework, built on Amazon Relational Database Service (Amazon RDS), became impossible as the organization grew.
Every business has different expectations when it comes to data analytics. Many of Shipsy’s clients had to request analytics from the engineering team because they could not “query” the data or create rich visualizations themselves. Shipsy aimed to build a powerful, intuitive analytics dashboard where clients could explore data in a self-service manner.
Finding the Right Data Warehouse Solution
As Shipsy began evaluating the data warehouse solutions available in the market, its primary requirements were a columnar database with horizontal scalability that would stay cost-effective at terabyte scale. Shipsy also wanted a SQL-compatible setup to minimize the learning curve for its team.
After a thorough evaluation across platforms, Shipsy chose Amazon Redshift, which uses SQL to analyze structured and semi-structured data across data warehouses, operational databases (such as Shipsy’s existing Amazon RDS databases), and data lakes. Amazon Redshift uses AWS-designed hardware and machine learning (ML) to deliver the best price performance at any scale.
Data Replication from Amazon RDS to Amazon Redshift
With the right data warehouse in place, the next step was to synchronize data between the existing Amazon RDS framework and Amazon Redshift, and then onboard existing customer data for analytical queries with minimal disruption.
Shipsy had to address the following challenges while building its data replication strategy:
- Transforming and enriching data between the existing Amazon RDS databases and Amazon Redshift.
- Migrating terabytes of data reliably from the existing platform to Amazon Redshift.
- Keeping the Shipsy analytics platform available during the migration.
- Avoiding data inconsistencies and maintaining data integrity during the migration.
Shipsy addressed these challenges with a fine-tuned extract, transform, load (ETL) pipeline that extracts data from the transactional database, transforms it to fit the internal data structure, and stages it in Amazon Simple Storage Service (Amazon S3), from where an upsert (update-or-insert) operation updates the Amazon Redshift tables.
Figure 1 – Functional overview of data flow in Shipsy.
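The post does not show the upsert step itself. A common way to upsert into Amazon Redshift is the staging-table merge pattern described in the Redshift documentation: COPY a batch from Amazon S3 into a temporary table, delete the target rows that the batch replaces, and insert the whole batch. The minimal Python sketch below only builds that SQL. The table name, S3 prefix, IAM role ARN, Parquet file format, and key column are hypothetical placeholders, and this is an illustration of the pattern rather than Shipsy’s actual pipeline.

```python
"""Hypothetical sketch: build the SQL for a staged upsert into Amazon Redshift.

Table names, the S3 prefix, the IAM role ARN, and the key columns are
placeholders, not Shipsy's actual configuration.
"""
from typing import List, Sequence


def build_upsert_sql(
    target: str,
    s3_prefix: str,
    iam_role_arn: str,
    key_columns: Sequence[str],
) -> List[str]:
    """Return Redshift statements that load a batch from S3 and upsert it.

    Assumes an unqualified table name (temp tables cannot take a schema).
    Rows absent from the batch are never touched, so rows deleted in the
    source system are not deleted from the warehouse.
    """
    if not key_columns:
        raise ValueError("at least one key column is required")
    stage = f"{target}_stage"
    match = " AND ".join(f"{target}.{c} = {stage}.{c}" for c in key_columns)
    return [
        # Temporary staging table with the same columns as the target.
        f"CREATE TEMP TABLE {stage} (LIKE {target});",
        # Bulk-load the transformed batch that the ETL job wrote to S3.
        f"COPY {stage} FROM '{s3_prefix}' IAM_ROLE '{iam_role_arn}' FORMAT AS PARQUET;",
        "BEGIN TRANSACTION;",
        # Remove the old versions of rows that appear in the batch ...
        f"DELETE FROM {target} USING {stage} WHERE {match};",
        # ... then insert the new versions (updated and brand-new rows alike).
        f"INSERT INTO {target} SELECT * FROM {stage};",
        "END TRANSACTION;",
        f"DROP TABLE {stage};",
    ]


if __name__ == "__main__":
    for stmt in build_upsert_sql(
        "consignments",
        "s3://example-bucket/etl/consignments/",
        "arn:aws:iam::123456789012:role/ExampleRedshiftRole",
        ["consignment_id"],
    ):
        print(stmt)
```

Wrapping the delete and insert in one transaction means queries running against the target never see a batch that is half applied.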
Shipsy optimized how data is stored across the nodes of its multi-node Amazon Redshift cluster based on how each dataset is queried. This tuning lets Shipsy process data faster while keeping a smaller memory and disk footprint and using resources efficiently.
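The post does not list Shipsy’s table definitions. In Amazon Redshift, query-driven storage tuning is usually expressed through each table’s distribution style (which node holds which rows) and sort key (how rows are ordered on disk). The hypothetical sketch below renders CREATE TABLE statements with these attributes. The `TableSpec` structure and all table and column names are illustrative assumptions.

```python
"""Hypothetical sketch: render Redshift DDL with distribution and sort keys."""
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class TableSpec:
    name: str
    columns: List[Tuple[str, str]]       # (column name, Redshift type)
    diststyle: str = "AUTO"              # AUTO, EVEN, KEY, or ALL
    distkey: Optional[str] = None        # required exactly when diststyle is KEY
    sortkey: List[str] = field(default_factory=list)
    interleaved: bool = False            # INTERLEAVED instead of COMPOUND sort key


def render_ddl(spec: TableSpec) -> str:
    if spec.diststyle not in {"AUTO", "EVEN", "KEY", "ALL"}:
        raise ValueError(f"unknown distribution style: {spec.diststyle}")
    if (spec.diststyle == "KEY") != (spec.distkey is not None):
        raise ValueError("DISTKEY must be set exactly when DISTSTYLE is KEY")
    cols = ",\n  ".join(f"{name} {ctype}" for name, ctype in spec.columns)
    ddl = f"CREATE TABLE {spec.name} (\n  {cols}\n)\nDISTSTYLE {spec.diststyle}"
    if spec.distkey:
        ddl += f"\nDISTKEY ({spec.distkey})"
    if spec.sortkey:
        kind = "INTERLEAVED" if spec.interleaved else "COMPOUND"
        ddl += f"\n{kind} SORTKEY ({', '.join(spec.sortkey)})"
    return ddl + ";"


if __name__ == "__main__":
    # Small master-data table: a full copy on every node, so joins need no broadcast.
    hub = TableSpec("hub", [("hub_id", "BIGINT"), ("hub_name", "VARCHAR(256)")],
                    diststyle="ALL")
    # Large fact table: distributed on its join key, sorted for time-range filters.
    consignment = TableSpec(
        "consignment",
        [("consignment_id", "BIGINT"), ("hub_id", "BIGINT"), ("created_at", "TIMESTAMP")],
        diststyle="KEY", distkey="consignment_id", sortkey=["created_at", "consignment_id"],
    )
    print(render_ddl(hub))
    print(render_ddl(consignment))
```

DISTSTYLE ALL trades extra storage and slower loads for joins that need no data redistribution, which is why it suits small, slowly changing master-data tables.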
Shipsy’s ETL Pipeline Features
Shipsy’s fine-tuned ETL pipeline has the following features:
- Near real-time data transfer: Shipsy delivers near real-time analytics to customers regardless of the size of the requested data dump or their visualization needs. Even for very large datasets, processing takes no more than a few minutes, which is a crucial operational advantage.
- No delete propagation: Data deleted in the client app is not deleted from the Redshift cluster, so customers can still use it at any point in the future if required.
- Reliable and robust data updates: Shipsy maintains an internal register that stores the most recent value of every data column that is changed or updated. Data is transferred only after it has been checked against this register and confirmed to be the most recent value.
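The register’s internals are not described in the post. One way to read it is as a version check: before a change is shipped, compare its timestamp with the latest value already recorded for that row and column, and transfer only changes that are newer. The sketch below is a hypothetical in-memory illustration. The `ChangeRegister` class, the (table, row_id, column) key, and the `updated_at` timestamp are assumptions, and a production register would live in durable storage. It also drops delete events, mirroring the no-delete-propagation behavior.

```python
"""Hypothetical sketch of a change register that gates data transfer."""
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, List, Tuple

Key = Tuple[str, str, str]  # (table, row_id, column)


@dataclass(frozen=True)
class Change:
    table: str
    row_id: str
    column: str
    value: Any
    updated_at: datetime
    op: str = "update"  # "insert", "update", or "delete"


class ChangeRegister:
    def __init__(self) -> None:
        # Latest accepted (timestamp, value) for each (table, row, column).
        self._latest: Dict[Key, Tuple[datetime, Any]] = {}

    def accept(self, change: Change) -> bool:
        """Record the change and return True only if it should be transferred."""
        if change.op == "delete":
            return False  # deletions are not propagated to the warehouse
        key = (change.table, change.row_id, change.column)
        current = self._latest.get(key)
        if current is not None and change.updated_at <= current[0]:
            return False  # stale or duplicate: a same-or-newer value is already recorded
        self._latest[key] = (change.updated_at, change.value)
        return True


def select_for_transfer(register: ChangeRegister, changes: List[Change]) -> List[Change]:
    # Keep only the changes that pass the register's recency check.
    return [c for c in changes if register.accept(c)]
```

Because the check compares timestamps rather than arrival order, a change that arrives late with an older timestamp cannot overwrite a newer value that was already transferred.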
Underlying Tech for Smart, Scalable, and No-Code Analytics
Shipsy’s business intelligence (BI) platform offers easy-to-use, self-service analytics, search across all master data, and several types of visualization. Clients can drill down through multiple levels of data and get granular insights through pivot tables, big-number displays, charts, and graphs.
The core technical features of the Shipsy BI platform include the following:
- Smart queuing: Separate queues based on the type and size of each query reduce response times for all three query types: analytical, data dump, and ETL. The queues are configured so that priority can be allocated dynamically based on the current queue length.
- Concurrency scaling: If the high- or medium-priority queue exceeds a set threshold, Shipsy BI can start burst mode for up to one hour to clear the backlog. Burst mode is enabled only for the priority queues, to speed up critical tasks.
- Intelligent keys management: Because data is distributed horizontally across the compute nodes, any data broadcast between nodes increases both the amount of data processed and the query time. Shipsy BI therefore uses two types of sort keys, compound and interleaved, and uses distribution keys to optimize query joins. Master data tables, such as the worker, vehicle, and hub tables, use distribution style ALL: every node in the Redshift cluster holds a full copy of the table, which avoids data broadcasts.
- Customizable visualization plugins: Shipsy BI offers rich dashboards where users can create reports.
- Templatized report library: A tagged library of user-created and out-of-box reports enables instant dashboard creation.
- Machine learning-based caching: Shipsy BI caches results at the chart level to minimize redundant queries, which makes charts and dashboards more responsive. Cache settings are determined by historical usage patterns.
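The post does not say how the queues are implemented. In Amazon Redshift, the comparable controls are workload management (WLM) queues and per-queue concurrency scaling, which are configured on the cluster rather than written as application code. To illustrate the routing and escalation logic described above, here is a hypothetical Python sketch: queries go to one of three queues by type, the next queue to serve is chosen by a base priority boosted by its current length, and burst mode is flagged when the high- or medium-priority queue crosses a threshold. Queue names, priorities, the threshold, and the boost formula are assumptions, and the one-hour cap on burst mode is not modeled.

```python
"""Hypothetical sketch of type-based query queues with dynamic priority and burst."""
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, Optional

# Assumed base priorities: analytical queries are the most latency-sensitive.
BASE_PRIORITY: Dict[str, int] = {"analytical": 3, "data_dump": 2, "etl": 1}
PRIORITY_QUEUES = {"analytical", "data_dump"}  # the high- and medium-priority queues
BURST_THRESHOLD = 10  # assumed queue length above which burst mode is requested


@dataclass
class Query:
    query_id: str
    kind: str  # "analytical", "data_dump", or "etl"
    sql: str


class QueryScheduler:
    def __init__(self) -> None:
        self.queues: Dict[str, Deque[Query]] = {k: deque() for k in BASE_PRIORITY}
        self.burst_active = False

    def submit(self, query: Query) -> None:
        if query.kind not in self.queues:
            raise ValueError(f"unknown query type: {query.kind}")
        self.queues[query.kind].append(query)
        self._check_burst()

    def _effective_priority(self, kind: str) -> float:
        # Base priority plus a boost that grows with the backlog, so a long
        # low-priority queue is not starved indefinitely.
        return BASE_PRIORITY[kind] + 0.1 * len(self.queues[kind])

    def _check_burst(self) -> None:
        # Burst mode (extra capacity) is considered only for the priority queues.
        self.burst_active = any(
            len(self.queues[k]) > BURST_THRESHOLD for k in PRIORITY_QUEUES
        )

    def next_query(self) -> Optional[Query]:
        waiting = [k for k, q in self.queues.items() if q]
        if not waiting:
            return None
        kind = max(waiting, key=self._effective_priority)
        query = self.queues[kind].popleft()
        self._check_burst()
        return query
```

Keeping burst mode tied to the priority queues caps extra capacity to the work that is most visible to clients, while ETL still progresses through the length-based boost.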
Components of the Shipsy BI Platform
Shipsy’s BI platform lets clients run advanced data queries without writing code or creating custom tables, custom queries, and custom indices. It has four major components:
Query Builder
This section presents data as tables and columns and provides three core functions:
- Report creation: Users choose the table from which to create a report, or start from one of the available sample reports.
- Data setup: Users apply data filters for any time frame or other quantifiable parameter and set the “focal point of data analysis.”
- Chart configuration: Users create data visualizations in the form of charts and configure them in multiple ways. The resulting visualizations can be saved or downloaded as needed.
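A no-code query builder ultimately has to turn the user’s choices (table, time frame, filters, and chart grouping) into SQL. The post does not describe Shipsy’s generator, so the following is a hypothetical sketch. It whitelists tables and columns, passes user-supplied values as bind parameters instead of interpolating them to prevent SQL injection, and groups by the chart dimension. The `ReportSpec` structure, the `ALLOWED_COLUMNS` catalog, and the `%s` placeholder style (used by DB-API drivers such as psycopg2) are assumptions.

```python
"""Hypothetical sketch: compile a no-code report definition into parameterized SQL."""
from dataclasses import dataclass, field
from datetime import date
from typing import Any, Dict, List, Optional, Set, Tuple

# Assumed catalog of the tables and columns the query builder exposes.
ALLOWED_COLUMNS: Dict[str, Set[str]] = {
    "consignment": {"consignment_id", "hub_id", "status", "created_at", "weight_kg"},
}
OPERATORS = {"eq": "=", "neq": "<>", "gt": ">", "lt": "<"}
AGGREGATES = {"count": "COUNT", "sum": "SUM", "avg": "AVG"}


@dataclass
class ReportSpec:
    table: str
    date_column: str
    start: date
    end: date  # exclusive upper bound of the time frame
    filters: List[Tuple[str, str, Any]] = field(default_factory=list)  # (column, op, value)
    group_by: Optional[str] = None  # chart dimension, e.g. one bar per hub
    aggregate: str = "count"
    measure: str = "*"


def compile_report(spec: ReportSpec) -> Tuple[str, List[Any]]:
    """Return (sql, params); user values only ever travel as bind parameters."""
    columns = ALLOWED_COLUMNS.get(spec.table)
    if columns is None:
        raise ValueError(f"unknown table: {spec.table}")

    def check(col: str) -> str:
        # Identifiers cannot be bound, so they must come from the whitelist.
        if col not in columns:
            raise ValueError(f"unknown column: {col}")
        return col

    if spec.aggregate not in AGGREGATES:
        raise ValueError(f"unknown aggregate: {spec.aggregate}")
    agg = AGGREGATES[spec.aggregate]
    if spec.measure == "*" and agg != "COUNT":
        raise ValueError(f"{agg} needs a column, not *")
    measure = "*" if spec.measure == "*" else check(spec.measure)

    date_col = check(spec.date_column)
    where = [f"{date_col} >= %s", f"{date_col} < %s"]
    params: List[Any] = [spec.start, spec.end]
    for col, op, value in spec.filters:
        if op not in OPERATORS:
            raise ValueError(f"unknown operator: {op}")
        where.append(f"{check(col)} {OPERATORS[op]} %s")
        params.append(value)

    dim = check(spec.group_by) if spec.group_by else None
    select = f"{agg}({measure}) AS metric"
    if dim:
        select = f"{dim}, {select}"
    sql = f"SELECT {select} FROM {spec.table} WHERE {' AND '.join(where)}"
    if dim:
        sql += f" GROUP BY {dim} ORDER BY {dim}"
    return sql, params
```

For example, a report of delivered consignments per hub for January 2024 would be `ReportSpec("consignment", "created_at", date(2024, 1, 1), date(2024, 2, 1), filters=[("status", "eq", "delivered")], group_by="hub_id")`, and the returned `(sql, params)` pair is handed to the database driver as-is.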
Dashboard
This is the default console landing page of Shipsy BI and has a collection of reports and dashboards. There are multiple types of dashboards, such as default, custom, and shared dashboards. Users can create rich dashboards using customizable visualization plugins.
Users can also interact with and modify reports, and access to these report functions can be restricted for controlled access. Custom settings, such as filters, are another advantage over standard folder-based data processing systems.
Reports
This section keeps track of all the reports created using the Shipsy BI and offers different filters, such as default, user, all, shared, and last updated. Clients can check who created which report and enable or disable the report actions and filters for controlled access.
It has a templatized and tagged report library of user-created and out-of-box reports that can enable instant dashboard creation.
Alerts and Event Management
Users can create and manage alerts and events in an easy, intuitive way. For alerts, they can specify custom criteria and outcomes; for events, they can specify the event type and time.
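Alert criteria and outcomes lend themselves to being stored as data rather than code. The sketch below uses a hypothetical rule format, not Shipsy’s: each alert names a KPI, a comparison, a threshold, and an outcome, and `evaluate_alerts` returns the outcomes of the rules that the latest KPI values meet. The KPI names, the `AlertRule` fields, and the outcome strings are illustrative assumptions, and events (type and time) are not modeled.

```python
"""Hypothetical sketch: evaluate user-defined alert rules against KPI values."""
import operator
from dataclasses import dataclass
from typing import Callable, Dict, List

COMPARATORS: Dict[str, Callable[[float, float], bool]] = {
    ">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le,
}


@dataclass(frozen=True)
class AlertRule:
    name: str
    kpi: str          # e.g. "failed_delivery_rate"
    comparison: str   # one of the keys in COMPARATORS
    threshold: float
    outcome: str      # e.g. "email:ops-team"


def evaluate_alerts(rules: List[AlertRule], kpis: Dict[str, float]) -> List[str]:
    """Return "<rule name> -> <outcome>" for every rule whose criterion is met.

    Rules whose KPI has no current value are skipped rather than fired.
    """
    fired = []
    for rule in rules:
        value = kpis.get(rule.kpi)
        if value is None:
            continue
        if COMPARATORS[rule.comparison](value, rule.threshold):
            fired.append(f"{rule.name} -> {rule.outcome}")
    return fired
```

Storing rules as plain records is what lets users add or change alerts through a form without any code deployment.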
Conclusion
Shipsy’s business intelligence (BI) platform is a built-in product offering available to clients at no additional cost. It is highly scalable and efficient, and it supports concurrent use. The platform sends real-time alerts on key performance indicator (KPI) breaches and incidents, forecasts future demand with predictive analytics, and helps users optimize logistics operations and anticipate and address potential issues before they occur.
Businesses can easily evaluate the performance of all their third-party logistics partners, vehicles, drivers, capacity utilization, failed deliveries, and more. They can query organizational data without writing code, analyze it, and visualize it in easy-to-read reports that can be shared, exported, and downloaded.
The Shipsy BI platform offers highly configurable data charts and multiple table formatting options to support better-informed decisions. Users can create interactive dashboards and use the role-based access control (RBAC) feature to keep data secure across multiple users.