What is Data Management?

Data management is the process of collecting, storing, securing, and using an organization’s data. Organizations today draw on many different data sources, and they have to integrate and analyze that data to derive business intelligence for strategic planning. Data management includes all the policies, tools, and procedures that improve data usability within the bounds of laws and regulations.

Why is data management important?

Data is a valuable resource for modern organizations. With access to large volumes and different data types, organizations invest significantly in data storage and management infrastructure. They use data management systems to run business intelligence and data analytics operations more efficiently. The following are some benefits of data management.

Increase revenue and profit

Data analysis gives deeper insights into all aspects of a business. You can act on these insights to optimize business operations and reduce costs. Data analysis can also predict the future impact of decisions, improving decision making and business planning. Hence, organizations can increase revenue and profit by improving their data management techniques.

Reduce data inconsistency

A data silo is a collection of raw data within an organization that only one department or group can access. Data silos create inconsistencies that reduce the reliability of data analysis results. Data management solutions integrate data and create a centralized data view for improved collaboration between departments.

Meet regulatory compliance

Laws like the General Data Protection Regulation (GDPR) and California Consumer Privacy Act (CCPA) give consumers control over their data. Individuals can seek legal recourse if they perceive that organizations:

  • Capture data without consent
  • Exercise poor control over data location and use
  • Store data in spite of erasure requests

Hence, organizations require a data management system that is fair, transparent, and confidential while still maintaining accuracy.

What are the areas of focus for data management?

The practice of data management spans the collection and distribution of high-quality data, as well as data governance to control access to the data.

Data quality management

Users of data expect the data to be sufficiently reliable and consistent for each use case.

Data quality managers measure and improve an organization's data quality. They review both existing and new data and verify that it meets standards. They might also set up data management processes that block low-quality data from entering the system. Data quality standards typically measure the following:

  • Is key information missing or is the data complete? (for example, customer leaves out key contact information)
  • Does the data meet basic data check rules? (for example, a phone number should be 10 digits)
  • How often does the same data appear in the system? (for example, duplicate data entries of the same customer)
  • Is the data accurate? (for example, customer enters the wrong email address)
  • Is data quality consistent across the system? (for example, date of birth is dd/mm/yyyy format in one dataset but mm/dd/yyyy format in another dataset)
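The checks in the list above can be sketched in a few lines of Python. This is a minimal illustration, not a production data quality tool: the record layout, the field names, and the helper functions (`is_complete`, `has_valid_phone`, `duplicate_keys`, `quality_report`) are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical customer records; the field names are assumptions.
records = [
    {"id": 1, "email": "ana@example.com", "phone": "5551234567"},
    {"id": 2, "email": "", "phone": "555-1234"},
    {"id": 3, "email": "ana@example.com", "phone": "5551234567"},
]

REQUIRED_FIELDS = ("email", "phone")


def is_complete(record):
    """Completeness: every required field is present and non-empty."""
    return all(record.get(field) for field in REQUIRED_FIELDS)


def has_valid_phone(record):
    """Validity: the phone number is exactly 10 digits."""
    return re.fullmatch(r"\d{10}", record.get("phone") or "") is not None


def duplicate_keys(rows, key_fields=("email", "phone")):
    """Uniqueness: return the key tuples that appear more than once."""
    counts = Counter(tuple(row.get(f) for f in key_fields) for row in rows)
    return [key for key, count in counts.items() if count > 1]


def quality_report(rows):
    """Collect the ids or keys that fail each check."""
    return {
        "incomplete_ids": [r["id"] for r in rows if not is_complete(r)],
        "invalid_phone_ids": [r["id"] for r in rows if not has_valid_phone(r)],
        "duplicates": duplicate_keys(rows),
    }


report = quality_report(records)
print(report)
```

A report like this could feed a process that blocks low-quality records before they enter the system. Accuracy checks, such as whether an email address really belongs to the customer, usually need an external source of truth and are not covered here.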

Data distribution and consistency

Endpoints for data distribution

For most organizations, data has to be distributed to (or near) the various endpoints where the data is needed. These include operational systems, data lakes, and data warehouses. Data distribution is necessary because of network latencies. When data is needed for operational use, network latency might be too high to deliver it in a timely manner. Storing a copy of the data in a local database resolves the network latency issue.

Data distribution is also necessary for data consolidation. Data warehouses and data lakes consolidate data from various sources to present a consolidated view of information. Data warehouses are used for analytics and decision making, whereas data lakes are a consolidated hub from which data can be extracted for various use cases.

Data replication mechanisms and impact on consistency

Data distribution mechanisms have a potential impact on data consistency, and this is an important consideration in data management.

Strong consistency results from synchronous replication of data. In this approach, when a data value is changed, all applications and users see the changed value. If the new value has not yet been replicated, access to the data is blocked until all the copies are updated. Synchronous replication prioritizes consistency over performance and access to data. Synchronous replication is most often used for financial data.

Eventual consistency results from asynchronous replication of data. When data is changed, the copies are eventually updated (usually within seconds), but access to outdated copies is not blocked. For many use cases, this is not an issue. For example, social media posts, likes, and comments do not require strong consistency. As another example, if a customer changes their phone number in one application, this change can be cascaded asynchronously.
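The difference between the two replication modes can be shown with a toy in-memory model. This is a sketch under simplifying assumptions: `ReplicatedStore` and its methods are hypothetical names, there is no network or concurrency, and a real system would block or coordinate reads rather than loop over dictionaries.

```python
class ReplicatedStore:
    """Toy primary with replicas; not a real database client."""

    def __init__(self, replica_count=2):
        self.primary = {}
        self.replicas = [{} for _ in range(replica_count)]
        self.pending = []  # changes not yet applied to the replicas

    def write_sync(self, key, value):
        # Synchronous: the write returns only after every copy is updated.
        self.primary[key] = value
        for replica in self.replicas:
            replica[key] = value

    def write_async(self, key, value):
        # Asynchronous: update the primary now and queue the change.
        self.primary[key] = value
        self.pending.append((key, value))

    def replicate_pending(self):
        # Stands in for the background process that catches replicas up.
        for key, value in self.pending:
            for replica in self.replicas:
                replica[key] = value
        self.pending.clear()

    def read_replica(self, index, key):
        return self.replicas[index].get(key)


store = ReplicatedStore()

store.write_sync("balance", 100)
sync_read = store.read_replica(0, "balance")  # replica already has the value

store.write_async("phone", "5551234567")
stale_read = store.read_replica(0, "phone")   # replica has not caught up yet
store.replicate_pending()
fresh_read = store.read_replica(0, "phone")   # replica is now consistent

print(sync_read, stale_read, fresh_read)
```

The stale read is the trade-off of eventual consistency: the replica stays available, but it can briefly return an outdated (here, missing) value.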

Comparing streaming with batch updates

Data streams cascade data changes as they occur. This is the preferred approach if access to near real-time data is required. Data is extracted, transformed, and delivered to its destination as soon as it is changed.

Batch updates are more appropriate when data has to be processed in batches before delivery. Summarizing or performing statistical analysis of the data and delivering only the result is an example of this. Batch updates can also preserve the point-in-time internal consistency of data if all the data is extracted at a specific point in time. Batch updates through an extract, transform, load (ETL) or extract, load, transform (ELT) process are typically used for data lakes, data warehousing, and analytics.
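The contrast between the two delivery styles can be sketched as follows. The event shape, the temperature conversion used as the transform step, and the function names `stream` and `batch_summary` are assumptions chosen for illustration, not part of any specific streaming or ETL product.

```python
from statistics import mean

# Hypothetical sensor change events; names and values are illustrative.
events = [
    {"sensor": "a", "reading_c": 20.0},
    {"sensor": "b", "reading_c": 22.0},
    {"sensor": "a", "reading_c": 24.0},
]


def transform(event):
    """Shared transform step: convert Celsius to Fahrenheit."""
    return {
        "sensor": event["sensor"],
        "reading_f": event["reading_c"] * 9 / 5 + 32,
    }


def stream(source, deliver):
    """Streaming: transform and deliver each change as it arrives."""
    for event in source:
        deliver(transform(event))


def batch_summary(source):
    """Batch: process a point-in-time snapshot and deliver only the result."""
    snapshot = list(source)  # everything extracted at one point in time
    readings = [transform(event)["reading_f"] for event in snapshot]
    return {"count": len(readings), "mean_reading_f": mean(readings)}


delivered = []
stream(events, delivered.append)   # one delivery per change
summary = batch_summary(events)    # one delivery for the whole batch

print(len(delivered), summary)
```

In the streaming path the destination sees three separate records. In the batch path it sees a single summary computed from a consistent snapshot.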

Big data management

Big data refers to the large volumes of data that an organization collects at high speed over a short period of time. Video news feeds on social media and data streams from smart sensors are examples of big data. Both the scale and complexity of operations create challenges in big data management. For instance, a big data system stores data such as:

  • Structured data that represents well in tabular format
  • Unstructured data like documents, images, and videos
  • Semistructured data that combines the preceding two types

Big data management tools have to process and prepare the data for analytics. The tools and techniques required for big data typically perform the following functions: data integration, data storage, and data analysis.

Data architecture and data modeling

Data architecture

Data architecture describes an organization’s data assets, and provides a blueprint for creating and managing data flow. The data management plan includes technical details, such as operational databases, data lakes, data warehouses, and servers, that are best suited to implementing the data management strategy.

Data modeling

Data modeling is the process of creating conceptual and logical data models that visualize the workflows and relationships between different types of data. Data modeling typically begins by representing the data conceptually and then representing it again in the context of the chosen technologies. Data managers create several different types of data models during the data design stage.

Data governance

Data governance includes the policies and procedures that an organization implements to manage data security, integrity, and responsible data use. It defines data management strategy and determines who can access what data. Data governance policies also establish accountability in the way teams and individuals access and use data. Data governance functions typically include:

Regulatory compliance

Data governance policies reduce the risk of regulatory fines or actions. They focus on employee training so that adherence to laws happens at all levels. For example, an organization collaborates with an external development team to improve its data systems. Data governance managers verify that all personal data is removed before passing it to the external team to use for testing purposes.
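The testing example above can be sketched as a small filtering step. The list of personal fields and the function names are assumptions. In practice, a governance policy defines which fields count as personal data, and removing named fields alone might not be enough if the remaining values can still identify a person.

```python
# Assumed set of personal fields; a real policy would define these.
PERSONAL_FIELDS = frozenset({"name", "email", "phone", "date_of_birth"})


def strip_personal_data(record, personal_fields=PERSONAL_FIELDS):
    """Return a copy of the record without any personal fields."""
    return {k: v for k, v in record.items() if k not in personal_fields}


def prepare_test_dataset(rows):
    """Build the dataset that is handed to the external team."""
    return [strip_personal_data(row) for row in rows]


# Hypothetical production rows.
production_rows = [
    {"order_id": 101, "name": "Ana Silva", "email": "ana@example.com", "total": 42.5},
    {"order_id": 102, "name": "Li Wei", "phone": "5551234567", "total": 17.0},
]

test_rows = prepare_test_dataset(production_rows)
print(test_rows)
```

The original rows are left unchanged, so the production data keeps its full detail while the shared copy contains only non-personal fields.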

Data security and access control

Data governance prevents unauthorized access to data and protects it from corruption. It includes all aspects of protection, such as the following:

  • Preventing accidental data movement or deletion
  • Securing network access to reduce the risk of network attacks
  • Verifying that the physical data centers that store data meet security requirements
  • Keeping data secure even when employees access it from personal devices
  • Authenticating and authorizing users, and setting and enforcing access permissions for data
  • Ensuring that the stored data complies with the laws in the country where the data is stored
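Enforcing access permissions, one of the items in the list above, is often implemented as a role-based check. The sketch below is a minimal illustration. The roles, users, dataset names, and functions are hypothetical, and a real deployment would rely on the access control features of its database or cloud platform rather than an in-process dictionary.

```python
# Hypothetical role definitions: each role maps to allowed (dataset, action) pairs.
ROLE_PERMISSIONS = {
    "analyst": {("sales", "read")},
    "data_engineer": {("sales", "read"), ("sales", "write")},
}

# Hypothetical user-to-role assignments.
USER_ROLES = {"maria": "analyst", "li": "data_engineer"}


def is_allowed(user, dataset, action):
    """Authorization: check the user's role against the requested action."""
    role = USER_ROLES.get(user)
    return (dataset, action) in ROLE_PERMISSIONS.get(role, set())


def read_dataset(user, dataset, store):
    """Enforcement: refuse the read unless the permission check passes."""
    if not is_allowed(user, dataset, "read"):
        raise PermissionError(f"{user} may not read {dataset}")
    return store[dataset]


store = {"sales": [{"order_id": 101, "total": 42.5}]}

rows = read_dataset("maria", "sales", store)      # allowed: analysts can read
can_write = is_allowed("maria", "sales", "write")  # denied: analysts cannot write
unknown_user = is_allowed("sam", "sales", "read")  # denied: no role assigned

print(rows, can_write, unknown_user)
```

Denying by default, as `is_allowed` does for unknown users and roles, keeps a missing assignment from silently granting access.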
     

What are some data management challenges?

The following are common data management challenges.

Scale and performance

Organizations require data management software that performs efficiently even at scale. They have to continually monitor and reconfigure data management infrastructure to maintain peak response times even as data grows exponentially.

Changing requirements

Compliance regulations are complex and change over time. Similarly, customer requirements and business needs also change rapidly. Although organizations have more choice in the data management platforms they can use, they have to constantly evaluate infrastructure decisions to maximize IT agility, maintain legal compliance, and lower costs.

Employee training

Getting the data management process started in any organization can be challenging. The sheer volume of data can be overwhelming and interdepartmental silos might also exist. Planning a new data management strategy and getting employees to accept new systems and processes takes time and effort.

What are some data management best practices?

Data management best practices form the basis of a successful data strategy. The following are common best practices.

Team collaboration

Business users and technical teams must collaborate to ensure that an organization's data requirements are met. All data processing and analysis should prioritize business intelligence requirements. Otherwise, collected data will remain unused, with resources wasted in poorly planned data management projects.

Automation

A successful data management strategy incorporates automation in most of the data processing and preparation tasks. Performing data transformation tasks manually is tedious and can also introduce errors in the system. Even a limited number of manual tasks, such as running weekly batch jobs, can cause system bottlenecks. Data management software can support faster and more efficient scaling.

Cloud computing

Businesses require modern data management solutions that provide them with a broad set of capabilities. A cloud solution can manage all aspects of data management at scale without compromising on performance. For example, AWS offers a wide range of functionalities, such as databases, data lakes, analytics, data accessibility, data governance, and security, from within a single account.

How can AWS help with data management?

AWS is a global data management platform that you can use to build a modern data strategy. With AWS, you can choose the right purpose-built database, achieve performance at scale, run fully managed databases, and rely on high availability and security.

Get started with data management on AWS by creating an AWS account today.
