How to create and implement a robust data quality framework (part two)
In the first part of this series, I discussed what data quality is, why it’s important, and what causes data quality issues. I argued that the quality of data is closely intertwined with the purpose it is intended to serve. In essence, the use case is a key driver of data quality.
This blog explores how you can leverage that approach to solve data quality problems.
The data quality framework
The framework we recommend is hierarchical and framed from the data users’ perspective. As the diagram below shows, there are two dimensions to data quality here:
Baseline of data quality requirements: the framework begins with defining the baseline rules of data quality. These form the defaults, or common-denominator quality rules, for every use case this data can enable. Data type validation (for example, catching alphabetic characters in numeric fields like price or sales) and data format validation (for example, date format) are checks needed irrespective of use case (see the sketch after this list).
Fit-for-purpose data quality requirements: this is the quality required for the data to serve its intended purpose. You cover a critical path that includes just enough quality checks for that use case, and then extend those checks to other use cases iteratively, according to the delivery or business plan.
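To make the baseline concrete, here is a minimal sketch in Python using pandas. The column names (price, sale_date, email) and the sample data are hypothetical and only illustrate the kinds of baseline checks described above: data type validation, date format validation, and structural (format) validation of an email field.

```python
import re
import pandas as pd

# Hypothetical sample data; the columns are illustrative, not from a real dataset.
df = pd.DataFrame({
    "price":     ["19.99", "abc", "5.00"],
    "sale_date": ["2023-01-15", "15/01/2023", "2023-02-01"],
    "email":     ["a@example.com", "not-an-email", "b@example.com"],
})

def check_numeric(series: pd.Series) -> pd.Series:
    """Baseline data type check: flag values that are not valid numbers."""
    return pd.to_numeric(series, errors="coerce").isna()

def check_date_format(series: pd.Series, fmt: str = "%Y-%m-%d") -> pd.Series:
    """Baseline format check: flag dates that do not match the expected format."""
    return pd.to_datetime(series, format=fmt, errors="coerce").isna()

def check_email_format(series: pd.Series) -> pd.Series:
    """Baseline structural check: flag strings that do not look like an email address."""
    pattern = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
    return ~series.str.match(pattern)

baseline_failures = {
    "price_not_numeric": check_numeric(df["price"]),
    "bad_date_format":   check_date_format(df["sale_date"]),
    "bad_email_format":  check_email_format(df["email"]),
}

for rule, failed in baseline_failures.items():
    print(rule, "-> failing rows:", list(df.index[failed]))
```

These baseline rules apply to the data regardless of which use case eventually consumes it; the fit-for-purpose layer then adds (or deliberately omits) checks on top of them.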
In most cases, the effort, time and resources required to fix all data quality issues are not justifiable, given we may not even be using the data that way. So, it is important to identify the critical path. For instance, if you’re looking to use data for a retailer’s pricing decisions, validating customers’ email addresses to check they belong to the correct recipient might not be required, as that does not factor into the dynamic pricing algorithm. Note that email structure or format validation is still a required data quality check; it belongs to the baseline data quality requirements.
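Building on the earlier sketch, the snippet below illustrates one possible way to express this critical-path idea: each use case registers only the checks it needs. The "pricing" profile reuses the baseline rules and adds a hypothetical pricing-specific range check, while deliberately omitting any recipient-verification check on emails. The function and column names are assumptions for illustration, not part of the original framework.

```python
def check_price_range(series: pd.Series, low: float = 0.01, high: float = 10_000) -> pd.Series:
    """Fit-for-purpose check: flag prices outside a plausible range for pricing decisions."""
    prices = pd.to_numeric(series, errors="coerce")
    return ~prices.between(low, high)

# Checks registered per use case: baseline rules plus use-case-specific rules.
USE_CASE_CHECKS = {
    "pricing": {
        # Baseline rules always apply.
        "price_not_numeric":  lambda d: check_numeric(d["price"]),
        "bad_date_format":    lambda d: check_date_format(d["sale_date"]),
        "bad_email_format":   lambda d: check_email_format(d["email"]),
        # Fit-for-purpose rule on the pricing critical path.
        "price_out_of_range": lambda d: check_price_range(d["price"]),
        # Note: no "email reaches the correct recipient" check here,
        # because it is not needed for pricing decisions.
    },
}

def run_checks(data: pd.DataFrame, use_case: str) -> dict:
    """Run only the checks registered for the given use case."""
    return {rule: list(data.index[fn(data)]) for rule, fn in USE_CASE_CHECKS[use_case].items()}

print(run_checks(df, "pricing"))
```

Keeping the baseline and fit-for-purpose layers separate like this makes it easy to add a new use case later: you register its extra checks without re-validating what the baseline already guarantees.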