Data quality management: What you need to know
How important is data quality management for big data?
Big data has been, and will continue to be, a disruptive influence on businesses. Consider the massive volumes of streaming data from connected devices in the Internet of Things, or the numerous shipment tracking points that flood business servers and must be combed through for analysis. With all that big data come bigger data quality management problems, which can be summed up in three main points.
Repurposing
The same data sets are now routinely repurposed in different contexts. As a result, the same data can take on different meanings in different settings, which raises questions about its validity and consistency. Good data quality is needed to interpret these structured and unstructured big data sets reliably.
Validating
When using the externally created data sets that are commonplace in big data, it can be hard to embed controls for validation. Correcting errors makes the data inconsistent with its original source, while preserving that consistency can mean making some concessions on quality. Balancing oversight against the realities of big data sets calls for data quality management features designed for the task.
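One way to ease the tension between correcting errors and staying consistent with the source is to keep both versions side by side. The sketch below is only an illustration under stated assumptions: the names `ValidatedRecord`, `normalize_country` and `validate` are hypothetical, and the single country-code rule stands in for whatever checks a real data quality tool would apply.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical rule signature: takes a value, returns (is_valid, corrected_value).
Rule = Callable[[Any], Tuple[bool, Any]]


@dataclass
class ValidatedRecord:
    original: Dict[str, Any]    # untouched copy, still consistent with the source
    corrected: Dict[str, Any]   # working copy with fixes applied
    issues: List[str] = field(default_factory=list)  # names of fields that failed


def normalize_country(value: Any) -> Tuple[bool, Any]:
    """Illustrative rule (an assumption): expect a two-letter upper-case code."""
    if isinstance(value, str) and len(value.strip()) == 2:
        fixed = value.strip().upper()
        return fixed == value, fixed
    # Not repairable by this rule: mark invalid and leave the value empty.
    return False, None


def validate(record: Dict[str, Any], rules: Dict[str, Rule]) -> ValidatedRecord:
    """Apply each rule, recording issues without overwriting the original."""
    result = ValidatedRecord(original=dict(record), corrected=dict(record))
    for field_name, rule in rules.items():
        ok, fixed = rule(record.get(field_name))
        if not ok:
            result.issues.append(field_name)
            result.corrected[field_name] = fixed
    return result


checked = validate({"id": 1, "country": " us"}, {"country": normalize_country})
print(checked.original["country"], checked.corrected["country"], checked.issues)
```

Because the original value is never overwritten, downstream users can choose between the source-consistent copy and the corrected one, and the list of issues gives a simple audit trail.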
Rejuvenation
Data rejuvenation extends the lifetime of historical information that previously may have been left in storage, but it also increases the need for validation and governance. New insights can be extracted from old data – but first, that data must be correctly integrated into newer data sets.
Where and when should data quality happen?
You can best observe data quality management in action through the lens of a modern-day data problem. In real-life applications, different data problems call for different latencies.
For example, data quality checks must run in real time when you're processing a credit card transaction. Real-time checks can flag fraudulent purchases, which helps both customers and businesses. But if you're updating loyalty cards and reward points for that same customer, overnight processing is sufficient for this less pressing task. In both cases, you're applying the principles of data quality management in the real world while recognizing your customers' needs and handling each task as efficiently as possible.
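The credit card and loyalty point example can be sketched as a small router that checks some events immediately and defers others to a batch run. Everything here is an assumption made for illustration: the event labels, the `FRAUD_AMOUNT_LIMIT` threshold and the function names are hypothetical, and a real fraud check would involve far more than an amount limit.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict


@dataclass
class Event:
    kind: str          # "card_transaction" or "loyalty_update" (hypothetical labels)
    customer_id: str
    amount: float      # purchase amount, or points to add for loyalty updates


FRAUD_AMOUNT_LIMIT = 5000.0   # assumed threshold, purely illustrative

batch_queue: Deque[Event] = deque()   # low-urgency events wait here


def looks_suspicious(event: Event) -> bool:
    """Real-time check: flag non-positive or unusually large amounts."""
    return event.amount <= 0 or event.amount > FRAUD_AMOUNT_LIMIT


def handle(event: Event) -> str:
    """Check card transactions immediately; defer everything else."""
    if event.kind == "card_transaction":
        return "flagged" if looks_suspicious(event) else "approved"
    batch_queue.append(event)
    return "queued"


def run_overnight_batch() -> Dict[str, float]:
    """Drain the queue and total the deferred points per customer."""
    totals: Dict[str, float] = {}
    while batch_queue:
        event = batch_queue.popleft()
        totals[event.customer_id] = totals.get(event.customer_id, 0.0) + event.amount
    return totals


print(handle(Event("card_transaction", "c1", 9000.0)))
print(handle(Event("loyalty_update", "c1", 10.0)))
print(run_overnight_batch())
```

The design choice is simply to match the cost of each check to its urgency: the transaction path returns a decision on the spot, while the loyalty path accumulates work that can be processed whenever capacity is cheapest.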