What Is Data Quality? – DATAVERSITY
The Data Management Body of Knowledge (DMBoK) defines Data Quality (DQ) as “the planning, implementation, and control of activities that apply quality management techniques to data, in order to assure it is fit for consumption and meets the needs of data consumers.”
Because expectations about DQ are not always stated or widely known, they require ongoing discussion. DQ depends on context and on the data consumer’s requirements. Effective DQ management and DQ tools help organizations maintain DQ at an acceptable level and improve it over time. Business leaders use Data Quality dimensions to measure DQ and build more trust in the data.
A Short List of Data Quality Dimensions:
- Accuracy
- Completeness
- Consistency
- Integrity
- Reasonability
- Timeliness
- Uniqueness/Deduplication
- Validity
- Accessibility
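As a rough illustration of how a few of these dimensions can be turned into measurements, the sketch below scores completeness, uniqueness, and validity on a handful of made-up records. The field names, the sample data, and the simple email pattern are assumptions for the example, not part of any standard; real programs would define each metric against the data consumer's own requirements.

```python
import re

# Hypothetical sample records; field names and values are illustrative only.
records = [
    {"id": 1, "email": "ana@example.com", "country": "US"},
    {"id": 2, "email": None, "country": "US"},
    {"id": 3, "email": "not-an-email", "country": "DE"},
    {"id": 3, "email": "li@example.org", "country": ""},
]

# Deliberately simple pattern (an assumption); it is not a full email validator.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def is_filled(value):
    """Treat None and the empty string as missing."""
    return value not in (None, "")


def completeness(rows, field):
    """Share of rows where the field is present and non-empty."""
    if not rows:
        return 0.0
    return sum(1 for r in rows if is_filled(r.get(field))) / len(rows)


def uniqueness(rows, field):
    """Share of distinct values among all values of the field."""
    values = [r.get(field) for r in rows]
    if not values:
        return 0.0
    return len(set(values)) / len(values)


def validity(rows, field, pattern):
    """Share of non-empty values that match the expected pattern."""
    values = [r.get(field) for r in rows if is_filled(r.get(field))]
    if not values:
        return 0.0
    return sum(1 for v in values if pattern.match(v)) / len(values)


scores = {
    "email_completeness": completeness(records, "email"),
    "id_uniqueness": uniqueness(records, "id"),
    "email_validity": validity(records, "email", EMAIL_PATTERN),
}

for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

Each score is a ratio between 0 and 1, which makes it easy to compare against an agreed threshold. Dimensions such as accuracy or reasonability usually need a trusted reference or business rules and are not covered by this sketch.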
Other Definitions Include:
- “Fit for a purpose. Meets the requirements of its authors, users and administrators.” (Dr. Peter Aiken, adapted from Martin Eppler)
- “Reliance on accuracy, consistency, and completeness of data to be useful across the enterprise.” (Michelle Knight)
- Tools and processes used for parsing and standardization, generalized “cleansing,” matching, profiling, monitoring, and enrichment (Gartner)
- Strong-Wang framework (Wang and Strong, MIT, and DAMA DMBoK):
  - Intrinsic DQ:
    - Accuracy
    - Objectivity
    - Believability
    - Reputation
  - Contextual DQ:
    - Value-added
    - Relevancy
    - Completeness
    - Appropriate amount of data
  - Representational DQ:
    - Interpretability
    - Ease of understanding
    - Representational consistency
    - Concise representation
  - Accessibility DQ:
    - Accessibility
    - Access Security
A Few Uses Include:
- Increasing the value of organizational data and the opportunities to use it
- Reducing risk and cost associated with poor-quality data
- Improving organizational efficiency and productivity
- Protecting and enhancing the organization’s reputation
- Data profiling (to establish trends and discover inconsistencies in the data)
- Data standardization (to ensure data uses the same, consistent format)
- Data monitoring (to alert data administrators when DQ thresholds are not met)
- Data parsing (to discover if the data conforms to recognizable patterns)
- Data cleansing
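The profiling, parsing, standardization, and monitoring activities in the list above can be sketched on a single column of values. The example below is a minimal illustration using only the Python standard library; the phone-number data, the 10-digit rule, the target format, and the 0.9 threshold are all assumptions chosen for the example rather than recommended settings.

```python
import re
from collections import Counter

# Hypothetical raw values for one column; the data is made up for illustration.
raw_phones = ["(555) 123-4567", "555.123.4567", "5551234567", "12345", None]


def shape(value):
    """Profiling helper: reduce a value to its pattern (digits -> 9, letters -> A)."""
    if value is None:
        return "<missing>"
    return re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", value))


def parse_phone(value):
    """Parsing: return the digits if the value has exactly 10 of them, else None.

    The 10-digit rule is an assumption for this example.
    """
    if value is None:
        return None
    digits = re.sub(r"\D", "", value)
    return digits if len(digits) == 10 else None


def standardize_phone(digits):
    """Standardization: render 10 digits in one consistent format."""
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"


# Profiling: count how often each pattern occurs to expose inconsistencies.
profile = Counter(shape(v) for v in raw_phones)

# Parsing and standardization: keep only values that conform, in one format.
parsed = [parse_phone(v) for v in raw_phones]
standardized = [standardize_phone(d) for d in parsed if d is not None]

# Monitoring: flag the column when the share of usable values is too low.
VALID_SHARE_THRESHOLD = 0.9  # hypothetical threshold
valid_share = len(standardized) / len(raw_phones)
alert = valid_share < VALID_SHARE_THRESHOLD

print(profile)
print(standardized)
print(f"valid share: {valid_share:.2f}, alert: {alert}")
```

In this sketch the profile reveals several competing formats for the same kind of value, and the monitoring check raises an alert because only part of the column can be parsed. A production setup would typically rely on dedicated DQ tooling and thresholds agreed with data consumers.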