Discovering the Keys to Solving for Data Quality Analysis in Streaming Time Series Datasets


Focus on Data Quality for High Value IIoT and IoT Business Outcomes

Data Quality for Time Series is Challenging

Ok, I’ll frankly admit it: working with and analyzing time series datasets to achieve high-value business outcomes presents substantial challenges and can at times be extremely frustrating.

What Issues Will I Run Into?

There are numerous potential sources of issues with acquired time series data. We will start by looking at four key problem dimensions of data quality, which are enumerated in the infographic below:

Manufacturing Use Case Example for Time Series Data Quality

To take a high-level view of the problem, let’s assume that we are attempting to correlate data from a pricing system with a stream of data showing reduced output from one of our manufacturing lines.

The key issue here is how to ensure that the data being received has not been contrived, altered, or rendered inaccurate, whether through human error or sensor malfunction.

We will dive into methods for assessing relative quality and beginning to remediate it. Note, however, that the order in which this high-level situation was posed matters and should not be missed: the analysis type was given before the required data was identified.

Driving Data Quality Correction with a Use Case

Definition of Real Time

At this point we need to define what real-time means. The term has multiple meanings depending on the context, but in data flow it has a concrete definition:

Definition of “Good”

Many assume that automatically fixing data quality issues is the correct way to proceed; however, perceived fixes do not always improve data quality.

There is a high chance that a “Good” dataset that was automatically cleansed will result in the output analysis (the end result) being derived from predominantly manufactured data.

This is an example of introducing interpolation bias into the sampling system. The choice of interpolation method and stride shapes the results, so without knowing why each was chosen, anyone relying on the analysis cannot tell how far the results have been altered.
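To make the risk concrete, here is a minimal sketch of how an automatic cleanse can leave a dataset that is mostly manufactured. It assumes linear interpolation onto a fixed stride, which is only one of many possible methods; the function names `resample_linear` and `manufactured_fraction` and the sample readings are hypothetical and exist only for illustration.

```python
from bisect import bisect_left


def resample_linear(samples, start, stop, stride):
    """Resample sorted (timestamp, value) pairs onto a regular grid.

    Returns (timestamp, value, observed) triples. `observed` is False for
    every value that was manufactured by linear interpolation.
    """
    times = [t for t, _ in samples]
    values = [v for _, v in samples]
    out = []
    t = start
    while t <= stop:
        i = bisect_left(times, t)
        if i < len(times) and times[i] == t:
            # A real reading exists at this grid point.
            out.append((t, values[i], True))
        elif 0 < i < len(times):
            # Estimate from the two surrounding real readings.
            t0, t1 = times[i - 1], times[i]
            v0, v1 = values[i - 1], values[i]
            estimate = v0 + (v1 - v0) * (t - t0) / (t1 - t0)
            out.append((t, estimate, False))
        else:
            # Outside the observed range: leave empty rather than extrapolate.
            out.append((t, None, False))
        t += stride
    return out


def manufactured_fraction(resampled):
    """Share of filled grid points that were not actually observed."""
    filled = [row for row in resampled if row[1] is not None]
    return sum(1 for row in filled if not row[2]) / len(filled)


# Hypothetical sensor readings: (seconds since start, value).
readings = [(0, 10.0), (10, 20.0), (60, 20.0)]
grid = resample_linear(readings, start=0, stop=60, stride=10)
fraction = manufactured_fraction(grid)
print(f"{fraction:.0%} of the cleansed series is interpolated")
```

In this made-up case three real readings become a seven-point series in which four of the points are estimates. Carrying an `observed` flag alongside each value is one way to keep that provenance visible to whoever runs the downstream analysis.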

Is There a Data Quality Issue?

A key step, after determining the analysis type that will be performed, is identifying that there is a data quality issue at all.
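As one illustration of what that identification step might look like, the sketch below scans a list of arrival timestamps for gaps, duplicates, and out-of-order readings. The function `find_timing_issues`, the `expected_interval` and `tolerance` parameters, and the sample data are all assumptions made for this example; the checks that matter in practice depend on the analysis being performed.

```python
def find_timing_issues(timestamps, expected_interval, tolerance=0.5):
    """Report gaps, duplicates, and out-of-order arrivals.

    Compares each timestamp with the one before it. A spacing larger than
    expected_interval * (1 + tolerance) is reported as a gap.
    """
    issues = []
    for prev, curr in zip(timestamps, timestamps[1:]):
        delta = curr - prev
        if delta < 0:
            issues.append(("out_of_order", prev, curr))
        elif delta == 0:
            issues.append(("duplicate", prev, curr))
        elif delta > expected_interval * (1 + tolerance):
            issues.append(("gap", prev, curr))
    return issues


# Hypothetical arrival times in seconds; a reading is expected every 10 s.
arrivals = [0, 10, 20, 20, 50, 40, 60]
issues = find_timing_issues(arrivals, expected_interval=10)
for kind, prev, curr in issues:
    print(f"{kind}: {prev} -> {curr}")
```

A check like this only reports what it finds and changes nothing, which keeps the decision about whether and how to remediate with someone who understands the use case.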

Distilling Quality Down to Key Dimensions

To analyze the relative quality of a dataset effectively, a key first step is to distill the potential problems down to a few key issues. For this, we will use the dimensions that were detailed in the introduction section:

Final Thoughts on Data Quality for Time Series

Data quality is not a simple IT task with global rules, and no packaged application can fix all data quality problems. Data quality is as much a domain issue as it is a technology one.

Your Feedback is Always Welcome!

Please let me know if you have any questions or comments as you look to drive business outcomes with higher-quality time series data analysis. Please also reach out with any questions or use case discussions on our Tempus IIoT/IoT Framework, which provides a quick path to high-value outcomes and rapid application creation when dealing with time series datasets.