The Data Quality Assessment: Does Your Data Measure Up?
In this article, we’ll review the importance of data quality assessment and walk through four metrics for measuring data quality.
Businesses today are increasingly dependent on an ever-growing flood of information. Whether it is sales records, financial and accounting data, or sensitive customer information, the accuracy and adequacy of a company’s data are critical. If portions of that information are inaccurate or incomplete, the effect on the organization can range from embarrassing to catastrophic.
That’s why you, as an IT professional, should be committed to ensuring that the information your company relies on meets the highest data quality standards.
Measuring data quality: The data quality assessment
The term “data quality” refers to the suitability of data to serve its intended purpose. So, measuring data quality involves performing data quality assessments to determine the degree to which your data adequately supports the business needs of the company.
A data quality assessment is done by measuring particular features of the data to see if they meet defined standards. Each such feature is called a “data quality dimension,” and is rated according to a relevant metric that provides an objective assessment of quality.
The industry hasn’t yet settled on a standard set of data quality dimensions, but the following four are a representative group: completeness, validity, timeliness, and consistency.
Four metrics of data quality
Let’s take a brief look at each of these and at the metrics used in assessing them.
1. Completeness
Completeness relates to whether all required information is present in the dataset. For example, if the customer information in a database is required to include both first and last names, any record in which the first name or last name field is not populated is marked as incomplete. The metric used in assessing this dimension is the percentage of records that are complete.
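As a rough illustration, the completeness metric can be sketched in a few lines of Python. The record layout, the field names (first_name, last_name), and the rule that an empty or whitespace-only string counts as unpopulated are assumptions made for this example, not part of any particular tool.

```python
def completeness_pct(records, required_fields):
    """Return the percentage of records in which every required field is populated."""
    if not records:
        return 100.0  # assumption: an empty dataset contains no incomplete records

    def is_populated(value):
        # Assumption: None, empty, and whitespace-only values count as missing.
        return value is not None and str(value).strip() != ""

    complete = sum(
        1 for rec in records
        if all(is_populated(rec.get(field)) for field in required_fields)
    )
    return 100.0 * complete / len(records)


# Hypothetical customer records for illustration only.
customers = [
    {"first_name": "Ada", "last_name": "Lovelace"},
    {"first_name": "", "last_name": "Turing"},       # first name missing
    {"first_name": "Grace", "last_name": None},      # last name missing
    {"first_name": "Edsger", "last_name": "Dijkstra"},
]

print(completeness_pct(customers, ["first_name", "last_name"]))  # 50.0
```

Two of the four sample records have both names populated, so the dataset scores 50 percent on this dimension.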
2. Validity
Data is characterized as valid if it matches the rules specified for it. Those rules typically include specifications such as format (number of digits, etc.), allowable types (integer, floating-point, string, etc.), and range (minimum and maximum values). For example, a telephone number field that contains the string ‘1809 Oak Street’ is not valid. The metric for this dimension is the percentage of records in which all values are valid.
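The validity metric could be computed along the following lines. This is only a sketch: the phone format (NNN-NNN-NNNN), the age field, and its 0 to 120 range are hypothetical rules chosen to illustrate format, type, and range checks.

```python
import re

# Assumed format rule for this example: three digits, three digits, four digits.
PHONE_PATTERN = re.compile(r"\d{3}-\d{3}-\d{4}")


def is_valid_record(rec):
    """Check one record against the illustrative format, type, and range rules."""
    phone = rec.get("phone")
    age = rec.get("age")
    # Format rule: the phone value must be a string that matches the pattern exactly.
    phone_ok = isinstance(phone, str) and PHONE_PATTERN.fullmatch(phone) is not None
    # Type and range rules: age must be an integer (not a bool) between 0 and 120.
    age_ok = isinstance(age, int) and not isinstance(age, bool) and 0 <= age <= 120
    return phone_ok and age_ok


def validity_pct(records):
    """Return the percentage of records in which all checked values are valid."""
    if not records:
        return 100.0  # assumption: an empty dataset contains no invalid records
    valid = sum(1 for rec in records if is_valid_record(rec))
    return 100.0 * valid / len(records)


# Hypothetical records for illustration only.
records = [
    {"phone": "555-867-5309", "age": 34},
    {"phone": "1809 Oak Street", "age": 41},   # fails the format rule
    {"phone": "555-010-2000", "age": 250},     # fails the range rule
    {"phone": "555-010-3000", "age": 19},
]

print(validity_pct(records))  # 50.0
```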
3. Timeliness
Timeliness relates to whether the information is up to date for the intended use. In other words, is the correct information available when needed?
For example, if a customer has notified the company of an address change, but the new address is not in the database at the time billing statements are processed, that entry fails the timeliness test. The metric used to measure timeliness is the time difference between when data is needed and when it is available.
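A minimal sketch of that measurement, using Python's standard datetime module, might look like this. The function name and the timestamps are invented for the example; a positive result means the data arrived after it was needed.

```python
from datetime import datetime


def timeliness_lag(needed_at, available_at):
    """Return the time difference between when data was needed and when it was available.

    A positive timedelta means the data arrived late; zero or negative means on time.
    """
    return available_at - needed_at


# Hypothetical timestamps: the billing run starts before the new address is loaded.
billing_run = datetime(2024, 3, 1, 2, 0)
address_loaded = datetime(2024, 3, 2, 14, 30)

lag = timeliness_lag(billing_run, address_loaded)
print(lag)                         # 1 day, 12:30:00
print(lag.total_seconds() > 0)     # True, so this entry fails the timeliness test
```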
4. Consistency
A data item is consistent if all representations of that item across data stores match.
If, for example, a birth date is entered in one system using the U.S. format (mm/dd/yyyy), but is imported unconverted into another system that reads dates in the European format (dd/mm/yyyy), that data lacks consistency.
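One way to check for this kind of mismatch is to interpret each stored value the way its own system would and compare the results. The sketch below assumes two hypothetical stores (crm and billing) keyed by customer ID, and it scores consistency as the percentage of shared items whose interpreted dates match; that scoring rule is an assumption made for illustration.

```python
from datetime import datetime

US_FORMAT = "%m/%d/%Y"  # mm/dd/yyyy
EU_FORMAT = "%d/%m/%Y"  # dd/mm/yyyy


def parse_or_none(text, fmt):
    """Parse a date string with the given format, or return None if it does not fit."""
    try:
        return datetime.strptime(text, fmt).date()
    except ValueError:
        return None


def consistency_pct(store_a, store_b, fmt_a, fmt_b):
    """Return the percentage of shared items whose dates match across both stores."""
    shared_ids = store_a.keys() & store_b.keys()
    if not shared_ids:
        return 100.0  # assumption: nothing shared means nothing inconsistent
    matches = 0
    for item_id in shared_ids:
        date_a = parse_or_none(store_a[item_id], fmt_a)
        date_b = parse_or_none(store_b[item_id], fmt_b)
        # An unparseable value on either side counts as a mismatch.
        if date_a is not None and date_a == date_b:
            matches += 1
    return 100.0 * matches / len(shared_ids)


# Hypothetical data: billing received the CRM strings without any conversion.
crm = {"c1": "03/04/1990", "c2": "12/25/1985", "c3": "07/07/1977", "c4": "01/01/2000"}
billing = dict(crm)

print(consistency_pct(crm, billing, US_FORMAT, EU_FORMAT))  # 50.0
```

Here c1 reads as March 4 in one system and April 3 in the other, c2 cannot be read as a European date at all, and only the two dates whose day and month are equal survive the import intact.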
Add data integrity to the mix to complete the picture
When critical linkages between data elements are missing, that data is said to lack integrity. The four pillars of data integrity are data integration, data quality, location intelligence, and data enrichment.
An example of data integrity would be a Sales Transactions table in which the customer ID points to a record in the Customer table. If a customer record is deleted without updating related tables, records in the Sales Transactions table that point to that particular customer become “orphans” because their parent record no longer exists. This represents a loss of referential integrity. An appropriate metric for data integrity would be the number of orphan records present in a database.
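Counting orphans can be sketched as a simple set lookup. The in-memory rows and the column names (customer_id, sale_id) below are hypothetical stand-ins for the Customer and Sales Transactions tables; in a real database the same check would normally be run as a query.

```python
def count_orphans(child_rows, parent_rows, foreign_key, parent_key):
    """Count child rows whose foreign key has no matching parent row."""
    parent_ids = {row[parent_key] for row in parent_rows}
    return sum(1 for row in child_rows if row[foreign_key] not in parent_ids)


# Hypothetical tables: customer 3 was deleted, but its sales rows remain.
customers = [{"customer_id": 1}, {"customer_id": 2}]
sales_transactions = [
    {"sale_id": 100, "customer_id": 1},
    {"sale_id": 101, "customer_id": 3},  # orphan
    {"sale_id": 102, "customer_id": 2},
    {"sale_id": 103, "customer_id": 3},  # orphan
]

print(count_orphans(sales_transactions, customers, "customer_id", "customer_id"))  # 2
```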
How to get started with your data quality assessment
If you’ve never done a data quality assessment before, it can look a bit daunting. But it needn’t be. Sophisticated automated data quality solutions such as those provided by Precisely can make the process straightforward.
Check out our eBook, 4 Ways to Measure Data Quality, to learn more.