Data Quality Assessment
Purpose
Provides a systematic, business-driven approach to measure and evaluate data quality employing data quality dimensions, to ensure fitness for purpose and establish targets and thresholds for quality.
Introductory Notes
The business owns the data it creates and manages. No organization’s information technology staff can single-handedly improve the quality of its data. Business representatives across the patient demographic lifecycle must be engaged to: determine patient demographic data’s fitness for purpose across the lifecycle; define the level of quality desired; and define the level of quality acceptable.
The data quality assessment processes consist of making decisions about the data and acting on those decisions. Only those who create, modify, and delete patient demographic data, across every phase of its lifecycle, can decide:
- If the data set is sufficiently complete and accurate to support business process needs (i.e., “fit for purpose”);
- What is the desired state of specific attributes (i.e., “targets”);
- What is the minimum level of quality acceptable (i.e., “thresholds”);
- What are the best measures and metrics to track improvements.
To take an example of fitness for purpose, an organization may discover, based on data profiling results, that it does not capture a sufficient set of attributes to maximize the efficacy of its record matching algorithm. The data set is not fit for purpose because it is incomplete with respect to the business objective of preventing and reconciling duplicates. Business representatives would need to consider which attributes are most useful to add to constitute the minimum set that is required, for instance, mother’s maiden name, previous address, previous phone number, etc.
An example of a target might be: 100% population of all attributes (each specified) needed for matching. An example of a threshold might be: 95% of first names must contain more than one character (the rationale here might be that later in the patient lifecycle this could be modified). An example of a metric might be: the baseline profiling effort revealed that 10% of the street addresses did not have a street suffix (RD, ST, BLVD, etc.). This metric, the percentage of records without street suffixes, could be monitored to assess improvement as data profiling, monitoring, and data cleansing activities were conducted over time.
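The street-suffix metric described above can be computed directly from a profiling pass. A minimal sketch in Python (the field values and the suffix list are illustrative assumptions; a real implementation would use the full USPS suffix abbreviation list):

```python
import re

# Common street suffixes (illustrative subset only).
STREET_SUFFIXES = {"RD", "ST", "BLVD", "AVE", "DR", "LN", "CT", "WAY"}

def has_street_suffix(street_address: str) -> bool:
    """Return True if the address ends in a recognized street suffix."""
    tokens = re.split(r"\s+", street_address.strip().upper().rstrip("."))
    return bool(tokens) and tokens[-1] in STREET_SUFFIXES

def pct_missing_suffix(addresses: list[str]) -> float:
    """Metric: percentage of records whose street address lacks a suffix."""
    if not addresses:
        return 0.0
    missing = sum(1 for a in addresses if not has_street_suffix(a))
    return 100.0 * missing / len(addresses)

addresses = [
    "123 MAIN ST",
    "45 OAK",         # missing suffix
    "9 ELM AVE",
    "700 WILLOW",     # missing suffix
]
print(pct_missing_suffix(addresses))  # 50.0
```

Re-running this measurement after each cleansing cycle yields the trend line the metric is meant to track.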
The most effective mechanism to assist the business in assessing data quality and establishing useful targets, thresholds, and metrics is the consideration and application of data quality dimensions to each attribute. A “dimension” is a criterion against which data quality is measured. A number of different dimensions of quality can be measured. A sample set often used is presented below:
- Accessibility – the data is available when needed;
- Accuracy – affinity with original intent, veracity as compared to an authoritative source, correlation of data elements (e.g., an insurance card and a driver’s license), and measurement precision (e.g., a patient’s last name is correctly spelled);
- Completeness – availability of required data attributes (e.g., a ZIP code is missing);
- Coverage – availability of required data records (e.g., some returning patient’s records are missing);
- Conformity – alignment of content with required standards (e.g., birth date formatted as MMDDYYYY);
- Consistency – compliance with required patterns and uniformity rules (e.g., “Street” must be abbreviated to “ST”), supported by data entry standards, workflow management, and technical design standards;
- Integrity – accuracy of data relationships (parent and child linkage, e.g., patient is correctly represented as the mother of another patient);
- Timeliness – the currency of content (e.g., the patient’s name change is recorded as soon as it is known, and automatically updated across all relevant data stores); and
- Uniqueness – each record can be unambiguously identified (e.g., patient lookup and other reports provide a unique ID, versus Last Name, First Name, and Date of Birth) – uniqueness also includes checks for redundancy of records (e.g., duplicate patient records).
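Several of these dimensions lend themselves to simple automated checks. The sketch below applies completeness, conformity, and consistency rules to a single record; the field names and specific rules are illustrative assumptions, not a prescribed rule set:

```python
import re

def check_completeness(record: dict) -> list[str]:
    """Completeness: required attributes must be present and non-empty."""
    required = ["first_name", "last_name", "birth_date", "zip_code"]
    return [f"missing {f}" for f in required if not record.get(f)]

def check_conformity(record: dict) -> list[str]:
    """Conformity: content matches the required standard format."""
    issues = []
    if record.get("birth_date") and not re.fullmatch(r"\d{8}", record["birth_date"]):
        issues.append("birth_date not MMDDYYYY")
    if record.get("zip_code") and not re.fullmatch(r"\d{5}(-\d{4})?", record["zip_code"]):
        issues.append("zip_code not a valid ZIP format")
    return issues

def check_consistency(record: dict) -> list[str]:
    """Consistency: content follows required patterns, e.g. 'Street' -> 'ST'."""
    addr = record.get("street_address", "")
    if re.search(r"\bSTREET\b", addr.upper()):
        return ["street_address uses 'Street' instead of 'ST'"]
    return []

record = {"first_name": "ANA", "last_name": "DIAZ",
          "birth_date": "07041990", "zip_code": "3021",
          "street_address": "12 Main Street"}
issues = check_completeness(record) + check_conformity(record) + check_consistency(record)
print(issues)
```

Dimensions such as accuracy and coverage generally cannot be checked this mechanically, since they require comparison against an authoritative source.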
Performing a data quality assessment is based on the predefined quality expectations and criteria set by stakeholders and approved by governance. It is advisable to start by measuring data quality for a small set of key attributes supporting one or more primary business processes, i.e., patient demographic data. Profiling the data is the recommended first step. For each attribute identified, the organization should convene a working group (e.g., data stewards) representing all relevant stakeholders to determine targets, set thresholds, and define the quality dimensions that are most important.
Once the criteria are determined and the data evaluated, metrics can be developed and published in a scorecard or dashboard format. Assessment results facilitate root cause analysis and are key inputs into the organization’s data quality improvement plans. Periodic assessments should be conducted to determine if acceptable thresholds and targets are being met, and metrics should be updated accordingly.
To support these efforts and track improvements over time, it is helpful to conduct an impact analysis of the overall data quality effort, as well as specific impacts of improvements regarding individual data elements, as part of the assessment process. Categorizing impacts of poor data quality, such as cost, risk, compliance, productivity, etc. also assists in prioritizing data cleansing and quality improvement plans.
Effective governance is important to implementing this process (See Data Governance). Assignment of specific responsibilities and data ownership deepens business engagement, which is important because improving data quality is truly a team effort. For example, an organization may decide that the Billing department should own ZIP code because it is critical for mailing patient bills, whereas it is not critical for clinical care delivery. Under the supervision of the data quality coordinator, if ZIP codes were found to be missing or inaccurate, Billing would initiate root cause analysis and sponsor the resulting improvements for remediation and defect prevention.
The data quality assessment process and accompanying mechanisms and metrics provide the following benefits:
- Anchors responsibility in the departments for the quality of their data;
- Results in tangible improvements for each line of business;
- Deepens stakeholder knowledge of the data and refines quality rules;
- Creates a “quality culture” via a collaborative effort sustained over time;
- Proves that the organization’s data assets are becoming more trustworthy;
- Supports realistic cost estimates and better planning with an impact analysis; and
- Published data quality metrics:
  - Inform internal and external consumers;
  - Improve population health analytics, quality reporting, and other reporting such as Meaningful Use; and
  - Foster interoperability.
Additional Information
When an organization establishes agreement on high-level objectives, such as assuring unique records in patient demographic data stores, it is better able to bring different perspectives into alignment around shared data assets.
Practically every step along the care continuum benefits from unambiguous patient records. However, the priority placed on that uniqueness may vary, as may the required level of accuracy for the attributes used to ensure it. Initial priorities are often driven by the primary purposes for which the data is collected. If the collective needs of all relevant suppliers and consumers are not addressed, there is a risk of negative downstream impacts.
For example, the accuracy of insurance information is very important for billing, while registration staff may consider it less important for the purpose of delivering quality care to the patient. However, since patient registration also benefits from patient record uniqueness to meet its objectives for patient safety, registration staff can be trained to be aware of the importance of capturing accurate patient and insurance information.
Example Work Products
- Data quality responsibilities
- Data quality criteria
Additional Information
The data quality assessment is the application of business-approved data quality requirements to a selected data set. Data quality requirements should be expressed in terms of data quality dimensions and should be aligned with organizational objectives. Targets and thresholds should be established for each dimension. Examples of quantitative documentation of targets and thresholds are illustrated in the following table:
| Dimension | Definition | Threshold | Target |
| --- | --- | --- | --- |
| Accuracy | Affinity of data with original intent; veracity of the data as compared to an authoritative source; measurement precision. | 85% | 100% |
| Conformity | Alignment of data with the required standard. | 75% | 99.9% |
| Uniqueness | Unambiguous records in the data set. | 80% | 98% |
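Once measured scores are available for each dimension, status against thresholds and targets can be reported mechanically, for example in a scorecard. A minimal sketch using the values from the table above (the measured scores are invented sample data):

```python
# Thresholds and targets from the table above, as percentages of records passing.
criteria = {
    "Accuracy":   {"threshold": 85.0, "target": 100.0},
    "Conformity": {"threshold": 75.0, "target": 99.9},
    "Uniqueness": {"threshold": 80.0, "target": 98.0},
}

def assess(dimension: str, measured_pct: float) -> str:
    """Classify a measured score against its threshold and target."""
    c = criteria[dimension]
    if measured_pct >= c["target"]:
        return "target met"
    if measured_pct >= c["threshold"]:
        return "above threshold, below target"
    return "below threshold"

# Invented sample measurements for one scorecard period.
for dim, score in {"Accuracy": 91.2, "Conformity": 72.0, "Uniqueness": 98.5}.items():
    print(f"{dim}: {score}% -> {assess(dim, score)}")
```

A "below threshold" result is the trigger for root cause analysis; "above threshold, below target" indicates the improvement plan is working but not finished.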
Additional Information
Having a predetermined set of key attributes is essential to keeping the scope of data quality efforts manageable. Governance representatives should agree on the scope of attributes based on priorities that support the organization's goals; for example, agreeing on a standard set of patient demographic attributes that will improve the ability to match duplicate patient records to support patient safety. While the industry has not reached consensus on the optimum set of attributes, there is a minimum set on which many healthcare organizations agree:
- First/Given Name;
- Current Last/Family Name;
- Middle/Second Given Name;
- Date of Birth;
- Current Address (street address, city, state, ZIP code);
- Current Phone Number; and
- Gender.
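A common use of this minimum set is to build a deterministic match key for flagging potential duplicate records. A minimal sketch (the normalization rules and field names are illustrative assumptions; production matching typically combines deterministic and probabilistic techniques):

```python
from collections import defaultdict

def match_key(rec: dict) -> tuple:
    """Deterministic key built from a subset of the minimum attribute set."""
    norm = lambda s: " ".join(str(s or "").upper().split())
    return (norm(rec.get("first_name")), norm(rec.get("last_name")),
            norm(rec.get("birth_date")), norm(rec.get("zip_code")))

def find_potential_duplicates(records: list[dict]) -> list[list[dict]]:
    """Group records sharing the same match key for data steward review."""
    groups = defaultdict(list)
    for rec in records:
        groups[match_key(rec)].append(rec)
    return [g for g in groups.values() if len(g) > 1]

records = [
    {"first_name": "Maria", "last_name": "Lopez", "birth_date": "01021985", "zip_code": "30309"},
    {"first_name": "MARIA", "last_name": "lopez", "birth_date": "01021985", "zip_code": "30309"},
    {"first_name": "John",  "last_name": "Smith", "birth_date": "12311970", "zip_code": "60601"},
]
print(len(find_potential_duplicates(records)))  # -> 1
```

This also illustrates why completeness targets on these attributes matter: a record missing its date of birth or ZIP code cannot participate reliably in the match.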
Data quality assessments should be conducted periodically according to an approved frequency, per the data quality assessment policy.
Additional Information
Data quality assessments typically result in the implementation of data quality rules that are informed by business knowledge of the data. These rules are needed to properly handle data and define required data elements, formats, and timeliness parameters through either manual entry or automated ingestion from bulk data sources (e.g., .TXT, .CSV, .XLS). It is important to specify rules before data migrations, connections to external systems such as Health Information Exchanges, and extractions to repositories for analytics. As assessments are conducted, rules are added and refined until the quality of the data set surpasses the required threshold expressed by the relevant stakeholders and until the quality approaches or reaches the targets.
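Rules of this kind can be enforced at the point of automated ingestion. A minimal sketch for a CSV feed (the column names and rules are illustrative assumptions):

```python
import csv
import io
import re

# Illustrative rules: required columns and format checks applied on ingestion.
RULES = {
    "birth_date": re.compile(r"\d{8}"),          # MMDDYYYY
    "zip_code":   re.compile(r"\d{5}(-\d{4})?"),
}
REQUIRED = ["first_name", "last_name", "birth_date", "zip_code"]

def ingest(csv_text: str) -> tuple[list[dict], list[str]]:
    """Split rows into accepted records and rejection messages."""
    accepted, rejected = [], []
    for i, row in enumerate(csv.DictReader(io.StringIO(csv_text)), start=1):
        problems = [c for c in REQUIRED if not row.get(c)]
        problems += [c for c, rx in RULES.items()
                     if row.get(c) and not rx.fullmatch(row[c])]
        if problems:
            rejected.append(f"row {i}: {', '.join(problems)}")
        else:
            accepted.append(row)
    return accepted, rejected

feed = ("first_name,last_name,birth_date,zip_code\n"
        "Ana,Diaz,07041990,30309\n"
        "Bo,Lee,1990-07-04,60601\n")
accepted, rejected = ingest(feed)
print(len(accepted), rejected)
```

Rejection messages like these are also a useful input to root cause analysis, since they identify which rule failed on which record.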
It is important that high-level information in data quality assessment reports can be traced back to individual records to ensure that current thresholds and targets are accurately met. The organization should work with vendors to ensure availability and access to data, as well as timely and accurate reports.
Example Work Products
- Documentation of targets and thresholds
- Documented data quality dimensions and attributes
- Documented baseline measures and metrics
- Documented analysis of business and technical impacts
- Recommendations for remediation
Additional Information
A data quality assessment policy should be developed after creating the organization’s data quality plan (See Data Quality Planning). The policy should provide guidance on the selection of data sets, availability of data, alignment of data across systems, the types of methods and measures by which data quality should be assessed, the frequency and/or event triggers for conducting assessments, and the conditions for ensuring alignment with organizational objectives (i.e., governance sign-off) as well as compliance with the policy.
Example Work Products
- Published and accessible data quality rules
- Standard data quality assessment processes
- Standard reporting template
- An organization-level data quality assessment policy