About the Dimensions of Data Quality | Conformed Dimensions of Data Quality
Generally speaking, the Dimensions of Data Quality are categories used to characterize data and its fitness for use. People need them to define and communicate expectations of information and data quality to one another.
Here are a few reasons that the dimensions of data quality are used:
- Act as a quick reference, checklist, and guide to quality standards
- Can be used as a framework to segment DQ efforts across a business unit, or even a company
- Enable people to communicate the current and desired state of data
- Reuse of existing categories and definitions enables faster implementation times
- Understand what you will (and will not) get from assessing each dimension [1]
- Match dimensions against a business need and prioritize which assessments to complete first
Usage of the Dimensions of Data Quality:
A 2015 survey of data management professionals found that 35% of organizations use the dimensions of data quality to classify data-related defects (see chart at right). Many formal data quality assessment methodologies are framed around the dimensions of data quality, or at the very least include a step that incorporates them [2].
Brief History:
The concept of the dimensions of data quality isn’t new. In fact, the earliest published work in this area that we are aware of is the 1996 paper by Professors Richard Wang and Diane Strong, Beyond Accuracy: What Data Quality Means to Data Consumers, which identified 15 dimensions within 4 categories. Many other authors, such as Thomas Redman, Larry English, David Loshin, and Danette McGilvray, have also provided their own versions of the dimensions of data quality. A few organizations, such as The Data Warehousing Institute (TDWI) and the Data Administration Management Association (DAMA), have identified the dimensions and provided their own definitions as well. In 2013, the principal contributor/steward of the Conformed Dimensions, Dan Myers, compared six authoritative information quality authors' and organizations' definitions of the dimensions of data quality as part of a series on this topic in Information-Management.com. This was the groundwork for what we now call the Conformed Dimensions of Data Quality.
Relationship with Data Quality Tools:
Data can be characterized for fitness for use through human review, but this process can be slow and can introduce errors caused by human oversight, forgetfulness, and data entry. Because some of the dimensions of data quality describe the characteristics of data in a formulaic manner, software tools can leverage these definitions to automate the assessment of data quality. As described in the Conformed Dimensions of Data Quality, there are underlying concepts within each dimension of data quality. Each of these concepts and its associated metrics offers formulas that computers can use to profile data or to produce reports measuring adherence to the definition. Various technology firms offer advisory services to help you choose which tool best fits your organizational requirements. Gartner provides a list of the companies offering data quality tools in its publication titled Critical Capabilities for Data Quality Tools.
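To make the idea of a formulaic metric concrete, here is a minimal Python sketch of how a tool might score two commonly named dimensions, completeness and validity, as simple ratios. The formulas, the sample records, the field name, and the email pattern are all assumptions made for illustration; they are not taken from the Conformed Dimensions definitions or from any particular data quality product.

```python
import re

# Hypothetical sample records; None or "" stands in for a missing value.
records = [
    {"id": 1, "email": "ana@example.com"},
    {"id": 2, "email": ""},
    {"id": 3, "email": "not-an-email"},
    {"id": 4, "email": None},
]

# Deliberately loose pattern, assumed for illustration only.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def is_populated(value):
    """Treat None and the empty string as missing."""
    return value not in (None, "")


def completeness(rows, field):
    """Share of rows in which `field` holds a non-empty value."""
    if not rows:
        return 0.0
    populated = sum(1 for row in rows if is_populated(row.get(field)))
    return populated / len(rows)


def validity(rows, field, pattern):
    """Share of populated values of `field` that match `pattern`."""
    values = [row.get(field) for row in rows if is_populated(row.get(field))]
    if not values:
        return 0.0
    return sum(1 for value in values if pattern.match(value)) / len(values)


print(f"completeness: {completeness(records, 'email'):.2f}")
print(f"validity: {validity(records, 'email', EMAIL_PATTERN):.2f}")
```

A real assessment would likely take its rules from agreed business definitions rather than a hard-coded pattern, and would report results per dimension so they can be compared against the desired state.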
Citations:
- [1] Danette McGilvray, 2008, pp. 30-31
- [2] Various
- Thomas Redman, Data Quality: Management and Technology. 1992.
- Larry English, Improving data warehouse and business information quality: methods for reducing costs and increasing profits. 1999.
- Danette McGilvray, Executing data quality projects: ten steps to quality data and trusted information. 2008.
- Laura Sebastian-Coleman, Measuring data quality for ongoing improvement: a data quality assessment framework. 2013.