Data Quality Dimensions: What are They? And How to Use Them

Without quality data, businesses cannot perform at their full potential and risk falling behind the competition. Considering how valuable data can be, it’s vital to ensure proper data quality management. But how is this measured?

With data quality dimensions.

Data quality dimensions outline a set of 6 key attributes that measure the quality and usability of a dataset. These attributes are intended to help data managers and businesses assess, interpret, and improve the quality of their data.

This article discusses the 6 data quality dimensions in detail, explaining how to identify them, and how to meet better data quality standards.

What are the 6 data quality dimensions?

There are six dimensions of data quality:

  1. accuracy
  2. completeness
  3. consistency
  4. timeliness
  5. validity
  6. uniqueness

Each of these data quality dimensions assesses a unique aspect of the data, testing how well the data can serve its intended purpose. Here is a summary of how each of the 6 data quality dimensions measures data:

  1. Accuracy – To what extent does the data reflect reality?
  2. Completeness – Is there any key information missing from the data?
  3. Consistency – Do all the data values follow consistent formatting and correspond correctly?
  4. Timeliness – Is the data available when it is needed and expected?
  5. Validity – Is the data presented in the expected format? For instance, do emails have an ‘@’ symbol?
  6. Uniqueness – Are there any duplicate entries in the dataset?

Data quality dimensions explained

1. Accuracy

Data is accurate when it reflects reality. Data accuracy refers to having correct data entries, such as names, numbers, addresses, and other information. As an example, if a customer’s address information says they live at number 9, but they actually live at number 19, the data is inaccurate. 

It is very common for data to contain inaccuracies. 

Inaccuracies can exist for a number of reasons. For example, customers might input their data incorrectly when signing up through a website, with spelling errors, typos and other mistakes all arising from human error. Data inaccuracies may also be intentional in cases of customer identity fraud. Read more about how to prevent customer identity fraud.

Additionally, information can change over time. Customers might update their address, email or phone number, for example. This makes it very difficult to assess and monitor data accuracy, since information is ever-changing.

This data quality dimension should be monitored regularly, as it is perhaps the most likely to change over time. Monitoring data accuracy ensures that business decisions are made on reliable information, and that any changes are identified with the opportunity to correct them.

2. Completeness

Data completeness is achieved when all the required data is present in order for the dataset to fulfil its intended purpose. Completeness does not mean that 100% of all data fields have to be complete; it is about ensuring that the relevant, meaningful fields are complete with the right data for the job.

For instance, running an email campaign would require a complete set of email data in order to return the maximum results. If there were several emails missing, the data would be incomplete.

Businesses often only ask customers to provide their first and last names, with the middle name being an optional field. If customers provide their first and last names but not a middle name, the data is still complete, as it is fit for purpose.
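As a rough illustration, here is a minimal sketch of how completeness could be checked against only the fields a task actually needs. It assumes a small pandas DataFrame with hypothetical column names; the middle name is treated as optional, so it is left out of the required fields.

```python
import pandas as pd

# Hypothetical customer records; the column names are illustrative only.
customers = pd.DataFrame({
    "first_name":  ["Ada", "Grace", None],
    "last_name":   ["Lovelace", "Hopper", "Turing"],
    "middle_name": [None, "Brewster", None],          # optional field
    "email":       ["ada@example.com", None, "alan@example.com"],
})

# Completeness is judged against the fields the job needs, not every column.
required_fields = ["first_name", "last_name", "email"]

# Share of populated values per required field, and share of fully usable rows.
print(customers[required_fields].notna().mean())
print(customers[required_fields].notna().all(axis=1).mean())
```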

3. Consistency

Data consistency refers to how well the values and formatting in a dataset correspond with one another, and whether any values conflict across the dataset. For instance, the formatting of postcodes should be consistent throughout. There are several ways that postcodes can be formatted, which makes them a good example when thinking about data consistency.

Postcodes could be formatted as ‘AB1 2CD’, ‘AB12CD’ or ‘AB 12CD’, depending on how they are entered. It’s important to settle on a single consistent format, for example one where the first block of characters refers to the locality of the address.
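To show what enforcing one format could look like, here is a minimal sketch of a normalisation step. It assumes UK-style postcodes, where the inward code is always the final three characters, and the function name is illustrative only.

```python
import re

def normalise_postcode(raw: str) -> str:
    # Strip all whitespace and uppercase, then re-insert a single space
    # before the final three characters (the UK inward code).
    compact = re.sub(r"\s+", "", raw).upper()
    return f"{compact[:-3]} {compact[-3:]}"

for value in ["AB1 2CD", "ab12cd", "AB 12CD"]:
    print(normalise_postcode(value))   # every variant prints 'AB1 2CD'
```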

Adhering to this data quality dimension ensures that data can be linked to and from multiple sources, and that data tools interpret the information correctly. As a result, the data becomes more usable – saving time and costs.

4. Timeliness

Data timeliness refers to how readily the data is available as and when it is needed. This data quality dimension can have different meanings depending on the context in which the data is required.

For instance, quick access to data in a hospital setting is critical, as the patient’s health could otherwise be at risk. An example of timely data in action can be seen in this data case study with Marie Curie, who were given quick access to COVID-19 data during the pandemic to make adaptive business decisions. 

In a less critical environment, such as quarterly business reporting, data timeliness would require less immediate access to information.

5. Validity

Data validity refers to how much the data conforms to the correct format, type, or range. Data must exist within the appropriate boundaries in order to be considered valid. For instance, a month should be between one and twelve, and anything else would be considered invalid.

Similarly, postcodes must appear in the Royal Mail postcode database in order to be valid as an existing postal address.
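As a minimal sketch, the sort of rule-based checks described above could look like the following. The rules are illustrative only: the postcode pattern checks the general UK format, whereas a real validity check would look the value up against the Royal Mail postcode data.

```python
import re

# Approximate UK postcode shape; a format check only, not a database lookup.
UK_POSTCODE = re.compile(r"^[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}$")

def month_is_valid(month: int) -> bool:
    return 1 <= month <= 12                      # must fall in the 1-12 range

def postcode_is_valid(postcode: str) -> bool:
    return bool(UK_POSTCODE.match(postcode.strip().upper()))

def email_is_valid(email: str) -> bool:
    return "@" in email                          # the simple shape check mentioned earlier

print(month_is_valid(13))                        # False: out of range
print(postcode_is_valid("AB1 2CD"))              # True: matches the expected format
print(email_is_valid("user.example.com"))        # False: no '@' symbol
```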

If data is not valid, it cannot serve its purpose. It’s important to ensure validity with regular data cleansing schedules.

Data validity and data accuracy are similar, but they should not be confused as the same data quality dimension. Just because a data entry is valid does not mean that it is accurate. For example, a customer could input a valid postcode that does not reflect their real address.

6. Uniqueness

Data uniqueness is achieved when information in the dataset only appears once. This data quality dimension measures the extent of duplication. For example, data uniqueness would identify instances where there are multiple data entries for the same contact.

Even if certain fields differ between two records for the same contact, they are still considered duplicate data. For instance, a contact might be in the database twice, with two different email addresses. The chances are that only one of these addresses is accurate, making it important to ensure data uniqueness throughout.
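As a minimal sketch, duplicate contacts like this can be flagged by comparing only the fields that identify a person, so that records differing in other fields (such as email) are still caught. The DataFrame and column names below are hypothetical.

```python
import pandas as pd

# The same contact appears twice, with two different email addresses.
contacts = pd.DataFrame({
    "first_name": ["Jane", "Jane", "Sam"],
    "last_name":  ["Doe", "Doe", "Smith"],
    "postcode":   ["AB1 2CD", "AB1 2CD", "EF3 4GH"],
    "email":      ["jane@example.com", "j.doe@example.com", "sam@example.com"],
})

# Judge uniqueness on identifying fields, not on every column.
identity_fields = ["first_name", "last_name", "postcode"]
duplicates = contacts[contacts.duplicated(subset=identity_fields, keep=False)]
print(duplicates)   # both 'Jane Doe' rows are flagged, despite the differing emails
```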

Using data quality dimensions

Data quality dimensions can be used to identify data quality issues. They offer set criteria that let businesses check the quality of their data and identify areas for improvement.

Some data quality dimensions, such as validity, are simple to measure. It is usually fairly straightforward to assess whether data is presented in the correct format.

However, some dimensions are trickier to assess. Accuracy, for example, can be difficult to measure, since it is sometimes impossible to know whether information is correct without getting in touch with the contact. It can be hard to tell, for instance, whether a phone number is outdated.

One way that businesses save time is to partner with a data insight company, which can screen, validate and verify the data, as well as perform data enrichment to fill in any missing information.

We help businesses do exactly that, with a range of data quality services.

Data Quality Tools

Another way that businesses save time and effort is by using data quality tools to measure and identify issues. For example, data management platforms (DMPs) are used to help businesses manage their own data, auditing many of the data quality dimensions to improve quality. 

Online, our online data management platform, helps businesses audit and improve their data quality, as well as enhance it. Why not take a look at how your data measures up?

Get in contact to learn more about data quality