What is Data Quality?
What do I need to know about data quality?
Quality data is useful data. To be of high quality, data must be consistent and unambiguous. Data quality issues are often the result of database merges or systems/cloud integration processes in which data fields that should be compatible are not due to schema or format inconsistencies. Data that is not high quality can undergo data cleansing to raise its quality.
What are the benefits of data quality?
When data is of excellent quality, it can be easily processed and analyzed, leading to insights that help the organization make better decisions. High-quality data is essential to cloud analytics, AI initiatives, business intelligence efforts, and other types of data analytics.
In addition to helping your organization extract more value from its data, the process of data quality management improves organizational efficiency and productivity, along with reducing the risks and costs associated with poor quality data. Data quality is, in short, the foundation of the trusted data that drives digital transformation—and a strategic investment in data quality will pay off repeatedly, in multiple use cases, across the enterprise.
What activities are involved in data quality management?
Data quality activities involve data rationalization and validation. Data quality efforts are often needed while integrating disparate applications that occur during merger and acquisition activities, but also when siloed data systems within a single organization are brought together for the first time in a cloud data warehouse or data lake. Data quality is also critical to the efficiency of horizontal business applications such as enterprise resource planning (ERP) or customer relationship management (CRM).
The Foundational Components of Data Quality
The success of data quality management is measured by how confident you are in the accuracy of your analysis, how well the data supports various initiatives, and how quickly those initiatives deliver tangible strategic value. To achieve all those goals, your data quality tools must be able to:
- Support all use cases: Data migration requires different data quality metrics than next-gen analytics. Avoid a one-size-fits-all approach in favor of one integrated solution that lets you choose the right capabilities for your particular use cases. For example, if you are migrating data, you first need to understand what data you have (profiling) before moving it. For an analytics use case, you want to cleanse, parse, standardize, and de-duplicate data (a simple sketch of these steps follows this list).
- Accelerate and scale: Data quality is equally critical for web services, batch, big data, and real-time workloads. Data needs to be trusted, secure, governed, and fit for use regardless of where it resides (on-premises, cloud) or its velocity (batch, real-time, sensor/IoT, and so on). Look for a solution that scales to fit any workload across all departments. You may want to start by focusing on the quality of data within one application or process, using out-of-the-box business rules and accelerators plus role-based self-service tools to profile, prepare, and cleanse your data. Then, when you’re ready to expand the program, you can deploy the same business rules and cleansing processes across all applications and data types at scale.
- Deliver a flexible user experience: Data scientists, data stewards, and data consumers all have specific capabilities, skill sets, and interests in working with data. Choose a data quality solution that tailors the user experience by role so all your team members can achieve their goals without IT intervention.
- Automate critical tasks: The volume, variety, and velocity of today’s enterprise data makes manual data quality management impossible. An AI-powered solution can automatically assess data quality and make intelligent recommendations that streamline key tasks like data discovery and data quality rule creation across the organization.
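To make the cleanse, parse, standardize, and de-duplicate steps mentioned above concrete, here is a minimal pandas sketch. The column names, country mappings, and sample records are illustrative assumptions, not a prescribed workflow:

```python
# A minimal sketch of cleansing, standardizing, and de-duplicating customer
# records with pandas. Column names and rules are illustrative assumptions.
import pandas as pd

records = pd.DataFrame({
    "customer_id": [101, 102, 102, 103],
    "email": [" Alice@Example.COM ", "bob@example.com", "bob@example.com", None],
    "country": ["us", "USA", "usa", "U.S."],
})

# Cleanse: trim whitespace and normalize case on the email field
records["email"] = records["email"].str.strip().str.lower()

# Standardize: map country variants onto a single canonical code
country_map = {"us": "US", "usa": "US", "u.s.": "US"}
records["country"] = (
    records["country"].str.lower().map(country_map).fillna(records["country"])
)

# De-duplicate: keep one row per customer_id + email combination
records = records.drop_duplicates(subset=["customer_id", "email"])

print(records)
```

The same rules could then be reused unchanged when the program expands to other applications and data types.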
Dimensions of Data Quality
Data quality operates in six core dimensions:
- Accuracy: The data reflects the real-world objects and/or events it is intended to model. Accuracy is often measured by how the values agree with an information source that is known to be correct.
- Completeness: The data makes all required records and values available.
- Consistency: Data values drawn from multiple locations do not conflict with each other, either across a record or message, or along all values of a single attribute. Note that consistent data is not necessarily accurate or complete.
- Timeliness: Data is updated as frequently as necessary, including in real time, to ensure that it meets user requirements for accuracy, accessibility and availability.
- Validity: The data conforms to defined business rules and falls within allowable parameters when those rules are applied.
- Uniqueness: No record exists more than once within the data set, even if it exists in multiple locations. Every record can be uniquely identified and accessed within the data set and across applications.
All six of these dimensions of data quality are important, but your organization may need to emphasize some more than others to support specific use cases. For example, the pharmaceuticals industry requires accuracy, while financial services firms must prioritize validity.
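As a simple illustration of how several of these dimensions can be measured in practice, the sketch below computes completeness, uniqueness, and validity scores for a small sample dataset. The field names, allowed values, and sample rows are illustrative assumptions:

```python
# A minimal sketch of per-dimension quality checks on a small dataset.
# Field names, allowed values, and sample data are illustrative assumptions.
import pandas as pd

orders = pd.DataFrame({
    "order_id": [1, 2, 2, 4],
    "status":   ["shipped", "pending", "pending", "unknown"],
    "amount":   [19.99, None, 42.00, -5.00],
})

checks = {
    # Completeness: every required value is present
    "completeness_amount": orders["amount"].notna().mean(),
    # Uniqueness: no record appears more than once
    "uniqueness_order_id": orders["order_id"].is_unique,
    # Validity: values conform to business rules and allowable parameters
    "validity_status": orders["status"].isin(["shipped", "pending", "cancelled"]).mean(),
    "validity_amount_nonnegative": (orders["amount"].dropna() >= 0).mean(),
}

for name, score in checks.items():
    print(f"{name}: {score}")
```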
Examples of Data Quality Metrics
Some data quality metrics are consistent across organizations and industries – for example, that customer billing and shipping information is accurate, that a website provides all the necessary details about products and services, and that employee records are up-to-date and correct.
Here are some examples related to different industries:
- Healthcare data quality metrics: Healthcare organizations need complete, correct, unique patient records to drive proper treatment, fast and accurate billing, risk management, and more effective product pricing and sales.
- Public sector data quality metrics: Public sector agencies need complete, consistent, accurate data about constituents, proposed initiatives, and current projects to understand how well they’re meeting their goals.
- Financial services data quality metrics: Financial services firms must identify and protect sensitive data, automate reporting processes, and monitor and remediate regulatory compliance.
- Manufacturing data quality metrics: Manufacturers need to maintain accurate customer and vendor records, be notified in a timely way of QA issues and maintenance needs, and track overall supplier spend for opportunities to reduce operational costs.
Data Quality Issues
The potential ramifications of poor data quality range from minor inconvenience to business failure. Data quality issues waste time, reduce productivity, and drive up costs. They can also erode customer satisfaction, damage brand reputation, expose an organization to heavy penalties for regulatory noncompliance, or even threaten the safety of customers or the public. Here are a few examples of companies that faced the consequences of data quality issues and found a way to address them:
- Poor data quality conceals valuable cross-sell and upsell opportunities and leaves a company struggling to identify gaps in its offerings that might inspire innovative products and services or allow it to tap into new markets. Nissan Europe’s customer data was unreliable and scattered across various disconnected systems, which made it difficult for the company to generate personalized offers and target them effectively. By improving data quality, the company now has a better understanding of its current and prospective customers, which has helped it improve customer communications and raise conversion rates while reducing marketing costs.
- Poor data quality wastes time and forces rework when manual processes fail or have to be checked repeatedly for accuracy. CA Technologies faced the prospect of spending months manually correcting and enhancing customer contact data for a major Salesforce migration. By incorporating automated email verification and other data quality measures into the migration and integration process, the company was able to use a smaller migration team than expected and complete the project in a third of the allotted time with measurably better data.
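A production email verification service does far more than check formatting (for example, confirming that a mailbox can actually receive mail), but a format-level filter like the sketch below conveys the idea of automating such a check during migration. The pattern and sample values are illustrative assumptions, not the process CA Technologies used:

```python
# A minimal sketch of a format-level email check. A real verification service
# would also test deliverability; this regex is only a rough illustrative filter.
import re

EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def looks_like_email(value: str) -> bool:
    """Return True if the value matches a basic email shape."""
    return bool(EMAIL_PATTERN.match(value.strip()))

contacts = ["jane.doe@example.com", "not-an-email", "ops@sub.example.org "]
flagged = [c for c in contacts if not looks_like_email(c)]
print(flagged)  # ['not-an-email']
```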
Four Steps to Start Improving Your Data Quality
1. Discover
You can only plan your data quality journey once you understand your starting point. To do that, you’ll need to assess the current state of your data: what you have, where it resides, its sensitivity, data relationships, and any quality issues it has.
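A basic profile answers those discovery questions field by field. The sketch below summarizes data types, completeness, and distinct values; in practice the frame would come from your actual source system, and the inline sample here is an illustrative assumption:

```python
# A minimal profiling sketch: what data exists, how complete each field is,
# and how many distinct values it holds. Sample data is illustrative.
import pandas as pd

customers = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "email": ["a@example.com", None, "c@example.com", "c@example.com"],
    "country": ["US", "US", None, "DE"],
})

profile = pd.DataFrame({
    "dtype": customers.dtypes.astype(str),
    "non_null": customers.notna().sum(),
    "null_pct": customers.isna().mean().round(2),
    "distinct": customers.nunique(),
})
print(profile)
```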
2. Define rules
The information you gather during the discovery phase shapes your decisions about the data quality measures you need and the rules you’ll create to achieve the desired end state. For example, you may need to cleanse and deduplicate data, standardize its format, or discard data from before a certain date. Note that this is a collaborative process between business and IT.
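One way to keep this collaboration manageable is to express rules as plain data that business and IT can review side by side. The rule names, columns, and thresholds below are illustrative assumptions, not a fixed rule syntax:

```python
# A minimal sketch of capturing agreed-upon quality rules as plain data,
# then evaluating them against a sample. Rules and sample data are illustrative.
import pandas as pd

rules = [
    {"name": "email_present",      "column": "email",  "check": lambda s: s.notna()},
    {"name": "amount_nonnegative", "column": "amount", "check": lambda s: s.fillna(0) >= 0},
]

sample = pd.DataFrame({"email": ["a@example.com", None], "amount": [10.0, -2.5]})

for rule in rules:
    passed = rule["check"](sample[rule["column"]])
    print(rule["name"], f"{passed.mean():.0%} passing")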
3. Apply rules
Once you’ve defined rules, you will integrate them into your data pipelines. Don’t get stuck in a silo; your data quality tools need to be integrated across all data sources and targets so you can remediate data quality issues across the entire organization.
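Integrated into a pipeline, a rule typically acts as a gate: rows that pass continue to the target, and rows that fail are routed for remediation. The table split, rule logic, and sample data below are illustrative assumptions:

```python
# A minimal sketch of enforcing a rule inside a pipeline step: valid rows
# continue to the target, failing rows are quarantined for review.
import pandas as pd

def apply_quality_gate(batch: pd.DataFrame) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Split a batch into rows that pass the rule and rows that need review."""
    ok = batch["email"].notna() & (batch["amount"] >= 0)
    return batch[ok], batch[~ok]

batch = pd.DataFrame({"email": ["a@example.com", None], "amount": [10.0, 5.0]})
clean, quarantine = apply_quality_gate(batch)

# Clean rows would be loaded to the warehouse; quarantined rows go to stewards.
print(len(clean), "loaded,", len(quarantine), "quarantined")
```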
4. Monitor and manage
Data quality is not a one-and-done exercise. To maintain it, you need to be able to monitor and report on all data quality processes continuously, on-premises and in the cloud, using dashboards, scorecards, and visualizations.
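Continuous monitoring can be as simple as recomputing each metric on every run and appending the result to a history that feeds a dashboard or scorecard. The metric, file name, and inline sample data in this sketch are illustrative assumptions:

```python
# A minimal monitoring sketch: recompute a quality score on each run and
# append it to a history file that a dashboard or scorecard can read.
import csv
from datetime import datetime

import pandas as pd

customers = pd.DataFrame({"email": ["a@example.com", None, "c@example.com"]})
email_completeness = round(customers["email"].notna().mean(), 3)

with open("dq_scorecard.csv", "a", newline="") as f:
    csv.writer(f).writerow(
        [datetime.now().isoformat(), "email_completeness", email_completeness]
    )
```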
Data Quality Customer Success Stories
This storied Major League Baseball team relies on data to deliver richer ballpark experiences, maximize marketing opportunities for branded merchandise, and decide how best to invest in players, staff, and infrastructure. Using Informatica Data Quality lets the team cleanse and improve data from 24 on-premises and cloud systems as well as third parties so it can drive new revenue, make faster decisions, and build lifelong relationships with millions of fans around the world.
One of Singapore’s leading financial services and insurance firms, AIA Singapore deployed Informatica Data Quality to profile its data, track key performance indicators (KPIs), and perform remediation. Higher-quality data creates a deeper understanding of customer information and other critical business data, which in turn is helping the firm optimize sales, decision-making, and operational costs.
Start Unlocking the Value of Your Data
Data is everywhere and data quality is critical to making the most of it for everyone, everywhere. Keep these principles in mind as you work to improve your data quality:
- Make it an enterprise-wide strategic initiative.
- Emphasize the importance of data quality to data governance.
- Integrate data quality into your operations.
- Collaborate with business users to contextualize data and assess its value.
- Extend data quality to new areas (data lakes, AI, IoT) and new data sources.
- Leverage AI/machine learning to automate repetitive tasks like merging records and pattern matching.
All of these become much easier with Informatica’s integrated Intelligent Data Platform, which incorporates data quality into a broader infrastructure that touches all enterprise data.