Data Integrity vs. Data Quality

By SIMBA Chain

In the digital world we live in, if your business relies on the collection, storage, and use of data (and it probably does), then keeping that data safe, secure, and accurate is essential to your business's ability to survive and thrive. When it comes to your data, knowing the difference between related terms helps you communicate your data security needs. Two terms that are often confused are data integrity and data quality.

What is Data Quality?

Data quality describes how fit your information is for its intended use. We consider six aspects to be a part of data quality: completeness, uniqueness, validity, timeliness, accuracy, and consistency. If any one of these criteria is not properly met, it could compromise any data-driven initiatives you may have planned. How useful is the data for your business's needs? Data quality seeks to answer this question.
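As a rough illustration, several of these aspects can be checked mechanically. The sketch below uses plain Python with made-up customer records; the field names, the five-digit ZIP rule, and the one-year freshness threshold are all assumptions chosen for the example, not a standard. Accuracy and consistency are left out because they usually require comparing against a trusted reference or a second system.

```python
from datetime import date

# Hypothetical customer records; field names are illustrative only.
records = [
    {"id": 1, "email": "ana@example.com", "zip": "16801", "updated": date(2024, 5, 1)},
    {"id": 2, "email": None, "zip": "16801", "updated": date(2024, 4, 1)},
    {"id": 2, "email": "bo@example.com", "zip": "1680", "updated": date(2019, 1, 1)},
]

def quality_report(rows, today, max_age_days=365):
    """Count rows failing simple completeness, uniqueness, validity, and timeliness checks."""
    seen_ids = set()
    report = {"incomplete": 0, "duplicate": 0, "invalid": 0, "stale": 0}
    for row in rows:
        # Completeness: every field must have a value.
        if any(value is None for value in row.values()):
            report["incomplete"] += 1
        # Uniqueness: an id should appear only once.
        if row["id"] in seen_ids:
            report["duplicate"] += 1
        seen_ids.add(row["id"])
        # Validity: assume a ZIP code is exactly five digits.
        zip_code = row["zip"]
        if not (isinstance(zip_code, str) and len(zip_code) == 5 and zip_code.isdigit()):
            report["invalid"] += 1
        # Timeliness: assume records older than max_age_days are stale.
        if (today - row["updated"]).days > max_age_days:
            report["stale"] += 1
    return report

print(quality_report(records, today=date(2024, 6, 1)))
```

A report like this does not fix anything on its own, but it shows which aspect of quality is failing and how often.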

What is Data Integrity?

Data integrity is a subset of data quality. It's a simple concept to imagine, but much more difficult to execute. Simply put, data integrity is the overall validity and accuracy of your data; it is the opposite of data corruption. How trustworthy is your data? The answer lies in your data's integrity.

Data integrity refers to the characteristics that determine the reliability of the information: whether the data has been properly secured and is accurate and consistent. It is also the absence of unintended changes or modifications to the data. In other words, data with integrity is not corrupt. After all, corrupt data is ineffective and unreliable, and it spells trouble for the near- and long-term future of your company.

Now, it is easy to confuse data integrity with data security. Security is a series of measures to keep data from being corrupted: it uses systems, processes, and procedures as preventative measures. Data integrity is more concerned with keeping the information intact and accurate for as long as it is needed: it implements a set of rules and processes that dictate how data is entered, stored, and transferred.
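One common way to detect unintended changes is to store a cryptographic fingerprint of the data and compare it later. The sketch below uses SHA-256 from Python's standard hashlib module; the helper names and the sample address are hypothetical, and this is only one possible approach rather than a description of any particular product. Note that a fingerprint detects a change but does not prevent it.

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()

def is_intact(data: bytes, expected_digest: str) -> bool:
    """True if the data still matches the digest recorded earlier."""
    return fingerprint(data) == expected_digest

# Record the fingerprint when the data is first stored.
original = b"123 Main St, Springfield"
stored_digest = fingerprint(original)

# Later, a single changed character produces a different digest.
tampered = b"124 Main St, Springfield"

print(is_intact(original, stored_digest))  # True
print(is_intact(tampered, stored_digest))  # False
```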

There are two varieties of data integrity: physical and logical. Physical integrity is the protection of the data's completeness and accuracy, which can be compromised either by mistake or by something a bit more malicious. Logical integrity is further broken down into four types: entity integrity, referential integrity, domain integrity, and user-defined integrity.
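In a relational database, the four types of logical integrity are typically expressed as constraints. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the table names, columns, and the 1-to-100 quantity rule are invented for illustration. Here a primary key stands in for entity integrity, a foreign key for referential integrity, a format check for domain integrity, and a business rule for user-defined integrity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when enabled
conn.executescript("""
CREATE TABLE customers (
    id  INTEGER PRIMARY KEY,                       -- entity integrity: unique row identity
    zip TEXT NOT NULL CHECK (length(zip) = 5)      -- domain integrity: allowed values
);
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),        -- referential integrity
    quantity    INTEGER NOT NULL CHECK (quantity BETWEEN 1 AND 100) -- user-defined rule
);
""")

def try_insert(sql, params):
    """Run one insert and report whether the database accepted or rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"

outcomes = [
    try_insert("INSERT INTO customers VALUES (?, ?)", (1, "16801")),  # valid row
    try_insert("INSERT INTO customers VALUES (?, ?)", (1, "16802")),  # duplicate key
    try_insert("INSERT INTO customers VALUES (?, ?)", (2, "1680")),   # malformed ZIP
    try_insert("INSERT INTO orders VALUES (?, ?, ?)", (1, 99, 1)),    # unknown customer
    try_insert("INSERT INTO orders VALUES (?, ?, ?)", (1, 1, 500)),   # breaks quantity rule
    try_insert("INSERT INTO orders VALUES (?, ?, ?)", (1, 1, 2)),     # valid row
]
for outcome in outcomes:
    print(outcome)
```

Only the first and last inserts should be accepted; the database rejects the other four before bad data is ever stored.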

A letter that is incorrectly addressed is an example of poor data integrity, whereas an out-of-date address due to a move is an example of poor data quality.

Now, it may seem as if we're splitting hairs in this explanation, so let's look at one high-profile, real-world example of how data quality and integrity affected millions of people. In 1993, the USPS found that 23% of all mail was being incorrectly addressed. If the street number on the mail is off by one digit, the mail will not arrive at the right house. This is an example of poor data integrity because the data itself is inaccurate and invalid, and therefore untrustworthy. But also consider the possibility that someone moves and hasn't updated their address yet. This would be an example of poor data quality because the data lacks the timeliness and consistency needed to be accurate.

Data integrity is meant to ensure your data's quality. In this way, the two concepts are related, but clearly not synonymous. Both of these concepts ask simple questions: does your data have quality or does it not? Does your data have integrity or does it not? Now, we think we know what you want the answer to both questions to be. After all, the role of data quality and integrity cannot be overemphasized, so if you need help ensuring the integrity and quality of your data, the SIMBA Chain team is ready to assist.