9

Methods of Quality Assessment and Assurance

A quality assurance program can have several purposes, each of which may be emphasized to varying degrees. In working toward its goals, a quality assurance program can try to prevent problems from occurring, detect and correct those problems that do occur, and encourage higher standards of care. It can attempt to remove or rehabilitate poor practitioners and providers, improve the average level of practice, reward excellence, or pursue some combination of those goals. The methods used in a quality assurance program may be as sharply focused as finding and reacting to isolated events involving a single patient and practitioner, such as a surgical mishap. They may be as broad as conducting continuing education, disseminating practice guidelines, initiating institution-wide “continuous improvement,” designing management information systems with uniform clinical data elements, and conducting research on effectiveness at a national level. Ideally, the choice of methods for a Medicare quality review and assurance strategy should be based on an assessment of the burdens of harm from different quality problems (Chapter 7), an understanding of important features of health delivery systems that affect our current ability to measure care and effect change (Chapter 8), and an appreciation of the strengths and limitations of major methods of quality assessment.

INTRODUCTION

In this chapter we describe selected methods of quality assessment and assurance and discuss how well they meet the criteria for successful quality assurance efforts outlined in Chapter 2. (Chapter 6 in Volume II goes into more detail and differentiates methods by purpose, agent, and setting. It includes methods in use, methods derived from research studies, and methods described during site visits.) We have here focused on methods for preventing, detecting, and correcting quality problems for three settings of care: hospital-based care, office-based care, and home health care. Some approaches are directed at individuals; others are directed at institutions. Some are used primarily by health care organizations; others principally by external regulatory groups. Some have been developed for research projects; others have evolved in clinical and administrative departments in health care facilities.

Methods of preventing problems described in this chapter include accreditation and licensure for health care organizations and licensure and board certification for individual practitioners. Other methods include patient management guidelines and clinical reminders.

Approaches to detecting problems include analysis of administrative data bases, retrospective chart review, nonintrusive outcome measures, generic screening for adverse events using medical records, clinical indicators, and assessments of patient outcomes (such as health status and satisfaction). Detection methods based on aggregate data include the use of administrative data bases for analyzing outcomes such as mortality and complication rates. Individual-case sources of information about quality include autopsy, case conferences, and patient complaints.

Our discussion of methods of correcting problems emphasizes factors that are thought to impede or enhance the effectiveness of interventions intended to change behavior. Some interventions may be quite informal, for example telephone conversations with individual practitioners. Others, such as financial sanctions, are more formal. Interventions based on poor practice patterns include remedial education, restrictions on practice, and penalties.

The final section of this chapter reviews current thinking about the advantages and disadvantages of educational approaches, incentives (including rewards), and disincentives (including penalties) for individual physicians, for improving both average and outlier practitioners. Because there has been little evaluation of methods of intervention, this section does not lend itself well to discussions of known strengths and limitations. Therefore, we confine our discussion to a description of factors and variables that are thought to influence ways of changing behavior in health care organizations.

Important Attributes of Methods

When considering the strengths and limitations of quality assurance methods, one should consider several features. Among these are reliability and validity. Reliability refers to consistency in results, that is the degree to which measures of quality agree either when repeated over time or when applied by different people or in different settings. To assess the reliability of a credentialing system, for instance, one might evaluate the consistency of information obtained about applicants for hospital privileges or how often review committees agree in their recommendations. To determine the reliability of a method to detect quality problems, one might calculate how often chart reviewers identify the same adverse events. Reliability in correcting problems is more theoretical, but one might envision measuring whether comparable corrective action plans (e.g., continuing education courses or reading designated literature) consistently improve tested knowledge.
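To make the notion of agreement concrete, the sketch below (a hypothetical illustration in Python, not drawn from any study cited here) computes percent agreement and Cohen's kappa, a chance-corrected agreement statistic, for two chart reviewers judging the same set of charts for an adverse event.

```python
# Hypothetical sketch: quantifying inter-rater reliability for chart review.
# Two reviewers independently judge the same charts for an adverse event;
# percent agreement and Cohen's kappa (chance-corrected agreement) are computed.

def percent_agreement(a, b):
    """Proportion of charts on which the two reviewers gave the same judgment."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Chance-corrected agreement for two raters making binary judgments."""
    n = len(a)
    observed = percent_agreement(a, b)
    # Expected agreement if each rater judged independently at his or her own base rate.
    p_a, p_b = sum(a) / n, sum(b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    return (observed - expected) / (1 - expected)

if __name__ == "__main__":
    # 1 = adverse event identified, 0 = no adverse event (invented judgments).
    reviewer_1 = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0]
    reviewer_2 = [1, 0, 1, 1, 0, 0, 0, 0, 1, 0]
    print(f"Percent agreement: {percent_agreement(reviewer_1, reviewer_2):.2f}")
    print(f"Cohen's kappa:     {cohens_kappa(reviewer_1, reviewer_2):.2f}")
```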

Validity in this context refers to whether a method acts as intended. For one to consider board specialization a valid method of ensuring high quality, for instance, one would look for proof that those who are board certified provide a demonstrably higher quality of care than those who are not. Likewise, the validity of an outcome measure of quality could be assessed by determining whether patients with poor outcomes received deficient care and whether the deficiency produced the poor outcome. To demonstrate the validity of a method of correcting problems one would look for evidence that a specific intervention brought about the desired change. For instance, required consultation with a colleague before treating certain cases should result in fewer problem cases.

Assessment methods may be valid in that they detect real problems in quality, but even valid tools may be inefficient (if they detect a great many events that are not quality problems) or ineffective (if they fail to detect many important quality problems).

For virtually no method of assessment do we know the effect on provider behavior or the effect of practitioner change in behavior on patient outcomes. These are the ultimate tests of validity. Although methods may be accurate at identifying problems, they are valuable for quality assurance only if, or to the extent that, identification leads to changed behavior and to improved patient outcomes. Measures of these two demanding but critical factors are almost nonexistent, and this shortfall must temper any recommendations for specific approaches.

Assessment methods have important attributes other than reliability and validity. These include their practicality, ease of application, lack of unintended negative effects, inclusion of patient views and preferences, and ability to detect poor technical quality, overuse, and underuse. It is also useful to consider whether various methods of assessment provide timely information to improve performance and whether they yield information that accords with ideas about how professionals learn.

PREVENTION OF PROBLEMS

Accreditation and Licensure for Organizations

Hospitals, ambulatory care facilities, managed care organizations, and home health agencies can be accredited on a voluntary basis by the Joint Commission on Accreditation of Healthcare Organizations (Joint Commission). About 77 percent of the approximately 7,000 Medicare-participating hospitals have received Joint Commission accreditation. The remaining 1,600 hospitals that are not accredited are, for the most part, small rural institutions with 50 or fewer beds (see Chapter 5 in this volume and Chapter 7, Volume II for an extensive discussion of the evolution of the Joint Commission’s accreditation process).

The accreditation manuals for each type of facility are designed for facility use in self-assessment and for the Joint Commission to use for on-site surveys. For hospitals in “substantial compliance,” such a survey occurs every three years. Scheduled at least four weeks in advance, the survey is conducted by a physician, nurse, and administrative surveyor over a three-day period using explicit scoring guidelines. After a concluding educational exit interview, the facility may receive full accreditation or may be notified that accreditation is contingent on its carrying out a plan of correction. A hospital with contingencies may submit written evidence or may undergo a return site visit. It may then be fully accredited or, in due course, nonaccredited.

In 1981, the Joint Commission replaced its prescriptive, structure-oriented standards and numerical audit requirements with a standard requiring ongoing, facility-wide monitoring of care. Monitoring was intended to permit the identification of problems and ways to improve the delivery of care and to promote solutions to any problems identified. Nevertheless, structural standards designed to prevent problems and to ensure the capacity of the hospital to operate safely are still in effect. Three such areas of emphasis are (1) a standard specifying that the governing body is to hold the medical staff responsible for establishing quality assurance mechanisms, (2) medical staff standards requiring regular review, evaluation, and monitoring of the quality and appropriateness of services provided by the medical staff, and (3) a standard calling for the establishment of coordinated hospital-wide quality assurance activities.

In addition to the Joint Commission, accreditation for ambulatory facilities can also be sought on a voluntary basis from the Accreditation Association for Ambulatory Health Care (AAAHC), for HMOs from the National Committee for Quality Assurance (NCQA), and for home health agencies from the National League for Nursing (NLN) through its Community Health Accreditation Program. To date, nonhospital providers have sought accreditation infrequently. These accrediting organizations, however, have become increasingly active, and some states, such as Pennsylvania and Kansas, have determined that these accrediting groups are acceptable to provide external review for HMOs.

Strengths

Standards for accreditation are publicly available. If the standards are unambiguous, and if reviewers are consistent in applying them, then information on accreditation status provides comparable information on health facilities. If accreditation standards were more widely accepted by external regulators (e.g., eligibility for third-party payers or state licensure boards), this might reduce overlapping requirements.

Because accreditation is conferred voluntarily by a body representing the kind of facilities being reviewed, it represents a quasi-internal process that is, at least in theory, responsive to member organizations, yet accountable to the industry as a whole and to the public. Depending on the perceived value of accreditation and the stringency of the review process, the organization may make substantial efforts to comply with standards. A variable rating system in accreditation could recognize outstanding performance.

Limitations

Accreditation is evidence that certain quality assurance efforts such as requiring specific credentials, staffing policies, or grievance procedures are being pursued. However, unless the accreditation process is itself evaluated and found to be based on reliable and valid methods, it cannot be relied on as a method of ensuring quality, and it may divert resources from more effective approaches. Accreditation can be very expensive and cumbersome, and this may discourage its voluntary use.

Credentials, Licensure, and Specialty Certification

The examination of credentials is regularly used as a method of assuring high quality. The process is used (1) by state boards in granting licenses to practice, (2) by specialty and subspecialty boards in granting certification, (3) by hospital committees in reviewing applications to the medical staff, and (4) by payers in determining eligibility to be paid for services (Chassin et al., 1989a). The decisions of these groups may themselves constitute credentials. Licensure and board certification are particularly important.

Physician Licensure

Each state has statutes regulating the practice of medicine through physician licensure. Most of these laws define the practice of medicine and prohibit those who are unlicensed from engaging in it.

State medical practice acts are administered by state boards of medical examiners. Those who apply for licensure are judged on the basis of their education, postgraduate training, experience, results on licensing examinations, and moral character. Applicants for licensure must be graduates of schools of medicine or osteopathy that are accredited by the Liaison Committee on Medical Education, with special provisions being made for graduates of foreign medical schools. A postgraduate internship of one year is required by approximately three-quarters of the states, and applicants must successfully pass a licensing examination. All states currently use the Federation Licensing Examination (FLEX), prepared by the National Board of Medical Examiners (NBME) for the Federation of State Medical Boards. Most states will also accept the so-called National Boards, prepared by NBME or by the National Board of Examiners for Osteopathic Physicians and Surgeons. These examinations are administered in three stages as the student progresses through his or her education (Havighurst, 1988).

Some states have reciprocity agreements, whereby licenses granted by one state are recognized in another state. Some states require that applicants go through the procedures specified in their medical practice acts regardless of whether they are already licensed elsewhere (Havighurst, 1988).

Strengths. Licensure provides a minimum standard of quality for the individual health care practitioner. It does not meet any of the other goals of quality assurance listed in Chapter 2.

Limitations. The authority to practice medicine, once licensure has been obtained, is legally constrained only by criminal and medical malpractice law. Physician licensure is generally for life, and where licenses must be renewed, no new demonstration of competence is required. Many states have instituted certain continuing medical education (CME) requirements as a condition of license renewal. Attendance at approved CME is sufficient to meet the statutory mandate; those attending need not take and pass any examinations or show any other sign of accomplishment (Davis et al., 1984; Havighurst, 1988).

The physician’s license is also unlimited in scope, permitting the physician to engage in areas of practice for which he or she may have little training (Havighurst, 1988). This lack of limits stands in sharp contrast to the strict limitations placed on other health professionals subject to licensure. Licensure in no way guarantees competence across the wide range of medical practice or over time.

Specialty Certification and Recertification

The American Board of Medical Specialties (ABMS) recognizes 23 specialty boards that certify physicians as medical specialists in carefully delineated areas of practice. Several other entities also certify physicians, but because the ABMS system is so dominant, “board certification” is generally understood to mean certification in a medical specialty by a board recognized by ABMS (Havighurst and King, 1983).

For a board to achieve recognized status, it must be sponsored both by a professional group, such as a specialty society, and by the appropriate scientific section of the American Medical Association (AMA). All the boards are evaluated for recognition according to the ABMS “Essentials for Approval of Examining Boards in Medical Specialties.” Each board thus requires similar levels of training and experience.

The residency program must be approved by the Accreditation Council for Graduate Medical Education (ACGME), an organization composed of members of the ABMS, the AMA, and other concerned organizations. Together with appropriate specialty boards, the ACGME develops accreditation standards for each specialty residency program. These are regularly modified in conjunction with changing specialty board requirements and must be approved by the AMA’s Council on Medical Education (Havighurst and King, 1983). Ultimately, candidates must also pass comprehensive examinations administered by the specialty board.

Candidates for certification must receive and complete specialty training in an approved graduate medical program, the length and extent of which vary somewhat among the specialties. A majority of physicians in the United States identify themselves as specialists, but only about one-half are actually certified by an ABMS board. The number seeking certification has grown and continues to grow rapidly. Almost all physicians newly entering practice now seek some sort of certification. Of those who designate themselves as specialists, an increasing number are actually board certified.

Strengths. Certification in a medical specialty is widely accepted as an indication that certified physicians possess a superior level of training and skill in their area of specialization. Information on certification is readily available from such sources as county medical societies, the ABMS, AMA, American Medical Directory, and the AMA Physician Masterfile. Certification has been endorsed by the Joint Commission as an “excellent benchmark for the delineation of clinical privileges” (Joint Commission, 1989, p. 106).

Limitations. Ramsey and his co-workers (1989) compared the performance of board-certified and noncertified practitioners in internal medicine using measures of knowledge, judgment, communications skills, and humanistic qualities. Scores of board-certified internists on a written examination were significantly higher than those of noncertified internists, but ratings by professional associates, patient satisfaction scores, and performance in the care of common illnesses (as measured by medical record review) showed few differences. There were modest differences in preventive care and patient outcomes that favored the certified physicians.

The Office of Technology Assessment (OTA, 1988) reviewed 13 studies on the adequacy of physician specialization as a measure of quality and found little evidence that board certification accurately predicts high-quality care. Studies that use process criteria tend to show that specialists trained in their area of practice (the modal specialist) provide higher quality than those who have not been so trained, but this higher quality of process has not been linked to superior patient outcomes. Nor has a relationship been established between specialist care and patient satisfaction (Chassin et al., 1989a). Even if superior performance is associated with specialty training or board certification in one area, such evidence would not necessarily be generalizable to other specialties, diagnoses, or procedures (OTA, 1988).

In the past, boards granted certification for unlimited periods. There has been a move over the past 10 years toward recertification requirements, so that 15 of the 23 specialty boards have now adopted or decided to adopt time-limited certification with intervals between reevaluations ranging from six to 10 years. One board offers voluntary recertification, and seven specialty boards have no recertification procedures (Havighurst and King, 1983).

Some experts have recommended that certification and recertification shift from knowledge-based assessment toward more “performance-based” assessment that reflects actual practice, such as review of a sample of records or direct observation. This, it is believed, would reflect the physician’s practice more accurately and thereby increase the validity of board certification (Havighurst and King, 1983).

Appropriateness and Patient Management Guidelines

In medicine, and particularly in organized ambulatory care practices, guidelines serve many purposes, but they are intended primarily for education. They may specify appropriate and inappropriate uses of medical interventions, act as reminders for relatively simple tasks (e.g., provision of vaccinations), or serve as shorthand reminders for complex clinical decision making. For this last use they are sometimes called patient care algorithms. In all these applications, practice guidelines can help to forestall the occurrence of problems in patient care. In modified formats, they can also be used for retrospective quality review. Numerous groups, including medical specialty groups, have formulated such appropriateness and patient management guidelines. They are also frequently developed by interested clinicians within health care facilities and by health services researchers. They take on a variety of formats, depending on their highly individualized purpose.

Strengths

Patient management guidelines can be viewed as the translation of a medical text into a focused, often graphic, and sometimes computerized format. The use of branched reasoning and flow diagrams allows for great complexity and logically complete presentations. Well-constructed guidelines can allow patient preferences to be elicited and taken into account.

Limitations

Few (perhaps no) algorithms in use today are based entirely on scientific evidence of effectiveness. Generally, some or all of the available evidence is augmented by the clinical experience of the formulators. Many guidelines are nothing more than lists of ambiguous or vague statements about appropriate care that lack any guidance on their implementation.

Guidelines are frequently put into practice with no or only haphazard pretesting or evaluation. Often they lack provisions for updating or modification based on new knowledge, on their usefulness to clinicians, or on their impact on care. Guidelines may be of limited use for patients with multiple chronic conditions because the formats rapidly become too complex for easy reference.

Clinical Reminder Systems

Clinical reminder systems are computerized methods used in some managed care plans, clinics, and office practices to remind clinicians of preventive tests that should be performed, of laboratory monitoring that is due for patients with chronic disease, and of potential drug interactions (McDonald, 1976; Barnett et al., 1978, 1983; McDonald et al., 1984; Tierney et al., 1986). For instance, when printing out a list of scheduled patients, a reminder system may use an age-, sex-, and risk-adjusted algorithm to specify screening tests or laboratory monitoring for individual patients. Other reminders may be used interactively to warn of possible drug interactions or to query the physician and advise on appropriate antibiotic prescriptions.
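As a minimal sketch of the rule-based logic such systems employ, the example below generates reminders for a hypothetical daily schedule. The patient attributes, screening rules, and intervals are illustrative assumptions only and are not taken from the systems cited above.

```python
# Illustrative sketch of a rule-based clinical reminder system.
# Patient attributes, rules, and intervals are hypothetical examples only.
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Patient:
    name: str
    sex: str                                          # "F" or "M"
    birth_date: date
    diagnoses: set = field(default_factory=set)
    last_tests: dict = field(default_factory=dict)    # test name -> date last done

    def age(self, today: date) -> int:
        return (today - self.birth_date).days // 365

def due(patient, test, interval_years, today):
    """A test is due if it was never done or the interval has elapsed."""
    last = patient.last_tests.get(test)
    return last is None or (today - last).days >= interval_years * 365

def reminders(patient, today):
    """Return a list of reminder strings for one patient (illustrative rules)."""
    notes = []
    if patient.sex == "F" and 50 <= patient.age(today) <= 74 and due(patient, "mammogram", 2, today):
        notes.append("Mammogram due (biennial screening).")
    if "diabetes" in patient.diagnoses and due(patient, "hba1c", 1, today):
        notes.append("Hemoglobin A1c monitoring due.")
    if patient.age(today) >= 65 and due(patient, "influenza vaccine", 1, today):
        notes.append("Annual influenza vaccination due.")
    return notes

# Example: print reminders alongside a daily schedule of patients.
schedule = [
    Patient("A. Jones", "F", date(1930, 5, 1), {"diabetes"}, {"hba1c": date(1988, 1, 15)}),
    Patient("B. Smith", "M", date(1922, 9, 9), set(), {"influenza vaccine": date(1989, 10, 2)}),
]
for p in schedule:
    for note in reminders(p, today=date(1990, 3, 1)):
        print(f"{p.name}: {note}")
```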

Strengths

A computerized reminder system can alert a practitioner to patient needs and potential problems at the time patient care is provided, making it a truly concurrent quality assurance system. Such systems can be tailored to individual risk factors and previous medical history. Clinical reminder systems can incorporate probabilities of various outcomes and references to journal citations for further information and can be updated frequently. Their value in improving clinical process has been well demonstrated.

Limitations

These systems require readily available computer equipment, a rapid response time, and enough practitioner familiarity with the software to be feasible for use during practice hours. Their relationship to improved patient outcomes remains unevaluated.

DETECTION OF PROBLEMS

Use of Large Administrative Data Sets

Large data sets refer to claims-based administrative data bases such as those for Medicare Part A and Part B claims. Roos et al. (1989) distinguish three types of data bases and the kinds of studies that are feasible with each. A Level 1 data base contains only hospital discharge abstracts and will permit aggregate studies of, for instance, in-hospital mortality rates and lengths of stay, by geographic region or over time. A Level 2 data base contains, in addition, unique patient-identifying numbers. It can be used to study, for instance, short-term readmissions and volume and outcome relationships at a hospital-specific level. A Level 3 data base (the most comprehensive) will also have information from health program enrollment files, including when eligibility begins and ends. This data base permits the highest quality longitudinal studies, short- and long-term outcomes studies, and population-based (system-wide coverage) studies. Studies can include outcomes for intervention-free individuals and can capture poor outcomes or other complications that are not recorded as part of the hospital stay.

Weiner et al. (1989) have provided examples of quality-of-care indicators that might be developed from ambulatory care data bases. These include system measures such as rates of hospitalization, readmission, and “avoidable disease” (disease first diagnosed at an advanced stage). Other examples include (1) preventive-care indicators, such as the percentage of eligible persons receiving a recommended number of periodic screening tests or exams within a given time period and the documented incidence of newly diagnosed disease versus the expected incidence; (2) diagnostic indicators, such as the number or proportion of patients who receive unnecessary diagnostic tests or procedures; and (3) treatment indicators, such as the percentage of patients with a given diagnosis who receive the appropriate medication, the percentage of patients undergoing ambulatory surgical procedures who experience complications (including hospitalization), and the percentage of all visits that are made to the patient’s primary provider.
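As a hypothetical illustration of how such indicators might be tabulated from claims files, the sketch below computes a 30-day readmission rate from a Level 2-style hospital file (using patient identifiers to link stays) and a simple preventive-screening rate from an ambulatory file. The record layouts, field names, and data are invented for the example.

```python
# Illustrative tabulation of two claims-based quality indicators.
# Record layouts, field names, and data are hypothetical.
from datetime import date

# Level 2-style hospital discharge records: patient identifiers allow linkage of stays.
hospital_stays = [
    {"patient_id": "001", "admit": date(1989, 1, 3),  "discharge": date(1989, 1, 10)},
    {"patient_id": "001", "admit": date(1989, 1, 25), "discharge": date(1989, 2, 1)},   # readmitted within 30 days
    {"patient_id": "002", "admit": date(1989, 3, 5),  "discharge": date(1989, 3, 9)},
    {"patient_id": "003", "admit": date(1989, 6, 1),  "discharge": date(1989, 6, 4)},
]

def readmission_rate(stays, window_days=30):
    """Proportion of stays followed by a readmission of the same patient within the window."""
    by_patient = {}
    for s in sorted(stays, key=lambda s: s["admit"]):
        by_patient.setdefault(s["patient_id"], []).append(s)
    readmits = 0
    for patient_stays in by_patient.values():
        for earlier, later in zip(patient_stays, patient_stays[1:]):
            if (later["admit"] - earlier["discharge"]).days <= window_days:
                readmits += 1
    return readmits / len(stays)

# Ambulatory file: eligible women and whether a Pap smear claim appears in the period.
eligible_women = {"001": True, "004": False, "005": True, "006": False}  # id -> screened?
pap_rate = sum(eligible_women.values()) / len(eligible_women)

print(f"30-day readmission rate: {readmission_rate(hospital_stays):.2f}")
print(f"Pap screening rate:      {pap_rate:.2f}")
```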

Strengths

All large administrative data bases have several theoretical advantages for quality assessment. First, the accuracy of various types of data (e.g., medications, previous hospitalizations, and numbers of physician visits and medical conditions treated) is unaffected by errors in patient or practitioner recall. Second, the use of these data bases is unobtrusive; patient consent for individual studies is not required, and no bias is introduced from individuals’ knowing that they are being studied. Unobtrusiveness may also contribute to their acceptability to practitioners and health care facilities. Third, assessors can create and test different statistical models or approaches to risk adjustment. They can also alter study designs or use several different study designs to test findings; for instance, they can use both cohort and case-control designs to examine the effect of different intervention periods.

Fourth, the same files can be applied in different ways, for instance, tracking outcomes of surgery, computer modeling of readmission, examining changes in complication rates over time, or studying outcomes of care for patients in different geographic areas. Fifth, investigators can accurately assess risks as well as benefits associated with treatment, especially for areas of medical uncertainty. The data bases can provide inputs for clinical decision making by allowing calculation of the probability of complications of treatment or of mortality at varying lengths of time after treatment. Sixth, the use of administrative data bases is relatively inexpensive in comparison to methods that require large-scale primary data collection.

An important strength of Level 3 data bases is that they contain population data, and thus they permit some assessment of population access and outcomes. Comparative studies should be able to identify possible areas of underuse.

Limitations

Administrative data bases have considerable drawbacks for quality assessment. First, data bases may exclude important information such as certain events, information on location of service and provider, or costs, and may assemble the elements in ways that complicate linkage to other files.

Second, the precision of the coding schemes (primarily the ICD-9-CM and CPT systems) is of great concern, particularly for medical conditions that encompass a broad range of clinical severity and contain important clinical subgroups, such as congestive heart failure and diabetes mellitus. The ICD-9-CM coding system does not distinguish procedures performed on the right side of the body from those performed on the left. For this reason, a data base with ICD-9-CM codes will not allow a reviewer to determine whether a second hip replacement, for instance, is a reoperation or a new operation. Of equal concern is the poor ability of data bases to distinguish the order of events during a single episode of care (e.g., a pulmonary embolus that was present at the time of admission versus one that developed after surgery). Although administrative data bases record the occurrence of events such as x-rays and diagnostic tests, the results of these tests (whether positive or negative, or specific findings) are typically not recorded. New technologies or established technologies used in totally new ways may not be given codes for several years (PPRC, 1989).

Third, errors in recording and coding events can threaten the validity of the data. Some errors in recording are random, but some are systematic, especially if there are financial incentives for “upcoding” (systematically coding services as more intense or extensive, and thus better reimbursed, than those actually provided) and “unbundling” (billing every component of a procedure separately) (PPRC, 1989). Depending on who is completing a form, and his or her incentives and training, the data will vary in accuracy. Diagnoses on hospital records are more likely to be accurate than diagnoses on outpatient-visit claims, although sometimes outpatient diagnoses can be grouped around a type of problem (e.g., gynecologic problems) to minimize this weakness.

Fourth, the data bases include only contacts with the health care system and of these, only contacts that generate a claim. A person who is ill but has no encounter with the health care system produces no record. Copayments and other barriers to access may accentuate this bias and lead to underestimates of poor outcomes.

Fifth, measuring the benefits of treatment is very difficult because positive outcome measures are not part of administrative data bases. Approximations may sometimes be attempted based on a decreased frequency of hospitalization or the length of intervention-free periods.

Sixth, analysis of large administrative data bases is a slow process. Even with major improvements in electronic data transfer and processing that are envisioned, it is not well-suited to rapid feedback of practice patterns.

Small Area Variations Analysis (SAVA)

SAVA is a way of using administrative data bases that has become a major area of research in its own right (Paul-Shaheen et al., 1987). Small area variations analysis can identify areas of high, average, and low rates of hospital services usage, but the methodology cannot discriminate appropriate from inappropriate care. As a problem-detection method, SAVA should be regarded as a screening methodology for alerting analysts about areas where quality problems may be occurring and for which more focused review may be needed. A strength of SAVA is that it can direct attention to potential areas of underuse as well as overuse.
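The basic SAVA tabulation can be sketched as follows, with invented counts: population-based utilization rates are computed for each small area, and the extremal quotient (the ratio of the highest to the lowest area rate) is one conventional summary of variation.

```python
# Illustrative small area variations tabulation with hypothetical counts.
# For each area: procedures performed on residents and the resident population,
# giving a population-based rate per 1,000; the extremal quotient (highest rate
# divided by lowest rate) is one conventional summary of variation.

areas = {  # area -> (procedures in year, resident population)
    "Area A": (120, 24000),
    "Area B": (310, 31000),
    "Area C": (95,  27500),
    "Area D": (240, 22000),
}

rates = {name: 1000 * n / pop for name, (n, pop) in areas.items()}
for name, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {rate:.1f} per 1,000 residents")

extremal_quotient = max(rates.values()) / min(rates.values())
print(f"Extremal quotient (high/low): {extremal_quotient:.1f}")
```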

Volume of Services (Individual or Organization)

After reviewing the literature on the possible relationship between volume of procedures done by institutions and the outcomes of those procedures, OTA (1988) concluded that good evidence exists that higher volume is associated with higher rates of good outcomes for a number of diagnoses and procedures. They cautioned, however, that the causal relationship is by no means clear, with controversy about whether higher volume permits the development of proficiency (e.g., in the surgeon or surgical team) or whether better practitioners attract a higher volume of patients. It is also not yet clear over what range of volume and under what circumstances the volume-outcome relationship holds.

Future Steps

Research using aggregate data has demonstrated their value for studying small area variations, length of stay, and variations in practice patterns and complications over time. Although work is underway to develop methods of risk adjustment, to improve linkages among data bases, and to validate and improve the accuracy of diagnosis and procedure codes, administrative data bases lack specificity in identifying quality problems for a given patient or for a particular episode of care. As a near-term strategy, these data bases are best suited to directing quality assessment efforts toward topics, populations, or providers requiring further study. Currently Medicare data bases do not include clinical data, measures of patient need, or outcome assessments. Efforts to devise a Uniform Needs Assessment instrument, to develop a Uniform Clinical Data Set, and to include patient functional status could greatly augment the value of administrative data bases for internal and external quality assurance programs (see also Chapter 6).

Retrospective Evaluation of Process of Care

Process studies review the provision of preventive, acute, and chronic care. Retrospective review of records using explicit criteria is the classic approach to assessing quality. Criteria and standards may be developed by a consensus of experts using their knowledge of the scientific literature and their clinical experience as guidance. Chapter 10 discusses issues in the development, validation, and evaluation of criteria for evaluating patient care.

Using an abstracting form developed for review, quality assessors cull information from the medical record and judge the quality of the care provided, usually against explicit process-of-care criteria. Sometimes the level of compliance with criteria is given a score; in other formats, care is simply rated as acceptable or unacceptable. Although some criteria sets are poorly constructed, others, such as patient management guidelines, may use branched criteria and an inclusive range of options in an attempt to closely approximate the clinical decision-making process.
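A minimal sketch of this kind of explicit-criteria scoring appears below; the criteria, applicability conditions, and chart abstract are hypothetical and greatly simplified relative to real criteria sets.

```python
# Illustrative scoring of abstracted chart data against explicit process criteria.
# Criteria, applicability conditions, and the chart abstract are hypothetical.

criteria = [
    # (description, applies_if, met_if) -- each function takes the abstracted chart as input
    ("Blood pressure recorded at visit",
     lambda c: True,
     lambda c: c.get("bp_recorded", False)),
    ("HbA1c checked within 12 months (diabetic patients only)",
     lambda c: "diabetes" in c["diagnoses"],
     lambda c: c.get("months_since_hba1c", 99) <= 12),
    ("Antibiotic prescribed for documented streptococcal pharyngitis",
     lambda c: "strep pharyngitis" in c["diagnoses"],
     lambda c: c.get("antibiotic_prescribed", False)),
]

def score_chart(chart):
    """Return (criteria met, criteria applicable) for one abstracted chart."""
    applicable = [desc for desc, applies, _ in criteria if applies(chart)]
    met = [desc for desc, applies, met_fn in criteria if applies(chart) and met_fn(chart)]
    return len(met), len(applicable)

chart = {
    "diagnoses": {"diabetes"},
    "bp_recorded": True,
    "months_since_hba1c": 18,
}
met, applicable = score_chart(chart)
print(f"Compliance score: {met}/{applicable} applicable criteria met")
```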

Palmer et al. (1984) and Greenfield (1989) have described the development of what are generally considered to be well-constructed algorithms for ambulatory patient care evaluation. They have been used to evaluate a range of medical situations such as compliance with preventive and well-child care, relatively simple interventions such as management following an abnormal Papanicolaou (Pap) smear, treatment of streptococcal sore throat or middle ear infection, and complex evaluation of patients presenting to the emergency room with chest pain (Greenfield et al., 1981).

The Committee on Practice Assessment of the Ontario Chapter, College of Family Physicians of Canada (CFPC) (Borgiel et al., 1985; Borgiel, 1988) conducted a pilot research effort during 1987 to develop a practical, economical, and acceptable method of practice assessment appropriate for use in office practice of family physicians. Its conceptual base was the notion of tracers (Kessner et al., 1973), in which general conclusions about care provided by the practitioner or facility are drawn on the basis of tracer (indicator, or representative) conditions and problems that are intensively studied. The CFPC computerized process evaluation focused on chart review for a set of tracer conditions to evaluate routine care for common ailments.

Although the study is still in a pilot phase, it provides a promising method of ambulatory office-based assessment. It also has potential for selecting doctors for participation in managed care organizations and for physician recertification (Chassin et al., 1989a). Moreover, the computerized algorithms developed for this study have continued to be adapted and extended. Some 280 screens cover about 85 percent of all primary care diagnoses, including condition-specific history and physical examinations, laboratory tests, therapies, and patient education (Michael McCoy, personal communication).

Strengths

Retrospective review of care using criteria developed by practitioners is likely to have face validity for professionals and for the public. Process criteria can address poor technical quality, overuse, and underuse because of undertreatment. Individual criteria can be evaluated for validity and reliability. Retrospective review can be used to identify outliers and to evaluate and provide specific information to improve the practices of outlier practitioners and raise the average level of performance. Information can document improvement in quality and can be used for comparisons over time and across sites of care.

Limitations

The development of criteria and standards requires evidence of efficacy or at least effectiveness. This evidence is often unavailable or contradictory. Even if available for certain patient populations, it is not available for every combination of patient risk factors (e.g., age, family history, or health habits) and other coexisting patient conditions, nor for all possible interventions and their combinations.

Retrospective review, which is commonly based on medical record review for reasons of cost, feasibility, and unobtrusiveness, is subject to well-known limitations of medical record review. The validity of recorded information such as patient history and physical exam findings must be assumed, but this may not be a legitimate assumption. Care provided may not be recorded; Gerbert et al. (1988) agreed with earlier researchers that the concordance among methods such as record review, videotaped observation, physician interview, and patient interview was not high. Interpersonal aspects of care, such as patient inability to follow a given medical regimen or refusal of care, may also not be recorded. When care involves multiple practitioners with multiple medical charts, only a portion of that care may be retrieved for review. Even when data are recorded by practitioners or others, they must be accurately retrieved from the medical record and accurately coded. This is a special challenge in ambulatory care, given the lack of uniformity in describing many ill-defined conditions. Chart review using explicit criteria followed by implicit review raises additional problems in reliability.

Outcome Data

General Points

Outcome data are attractive for quality assessment because they address the primary goals of health care. These include cure, repair of injured or dysfunctional organs, relief of pain or anxiety, rehabilitation of function, and prevention of or delay in the progression of chronic disease. Sources of outcome data include administrative data bases (e.g., deaths, complications of treatment, and readmissions), medical records (e.g., infections and return to the operating room), questionnaires and interviews about health status, and surveys of patient satisfaction.

To be valid as methods of quality assessment, approaches based on outcomes must direct users to areas of likely deficiency so that further study and appropriate interventions may occur at the institutional or subinstitutional level. At an institutional level, medical staff and administrators must be able to identify problem practitioners and decide what actions are needed to change a pattern of unsatisfactory outcomes.

Linking process and outcome depends to some degree on the timing of measurement. The closer an outcome measurement is to the time of medical intervention, the more likely it is that the outcome may be at least partly attributable to medical care rather than to some intervening event. For instance, reduction in blood pressure for patients with hypertension is a short-term outcome that might be helpful in assessing care of an individual patient or practitioner. Morbidity or mortality 10 or 20 years after diagnosis is a long-term outcome that would be more useful for comparing different populations or modes of therapy.

Validity is also affected by the quality of data recorded and retrieved and by the accuracy of patient (and “proxy”) reports on functional outcome status or satisfaction. Data must be adequately adjusted for factors other than medical or nursing care, for example other chronic conditions, severity of illness, and patient age. To assess the care provided by an individual practitioner, there must be a sufficient number of cases of any one diagnosis to provide statistically reliable data—a condition not often met except over a long time period, in specialty care, and for some conditions frequently treated by primary care physicians, such as hypertension and diabetes.
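The arithmetic behind this last point can be illustrated with approximate 95 percent confidence intervals for an observed adverse-outcome rate at different caseloads; the sketch below uses a normal approximation and invented numbers purely for illustration.

```python
# Illustration of why small caseloads yield unreliable practitioner-level rates:
# approximate 95% confidence intervals for an observed 10 percent adverse-outcome
# rate narrow only as the number of cases grows (normal approximation; hypothetical numbers).
import math

observed_rate = 0.10
for n_cases in (10, 50, 200, 1000):
    half_width = 1.96 * math.sqrt(observed_rate * (1 - observed_rate) / n_cases)
    low, high = max(0.0, observed_rate - half_width), observed_rate + half_width
    print(f"{n_cases:>5} cases: 95% CI roughly {low:.2f} to {high:.2f}")
```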

Hospital Mortality Rates

Strengths. Hospital-specific mortality rates are potentially useful nonintrusive screens for poor quality care. Death is obviously an important outcome of health care, and a substantial portion of hospital deaths is believed to be avoidable (OTA, 1988). Data are relatively easy to obtain; most hospital discharge abstracts and many claims systems have information on death.

The first public release by the Health Care Financing Administration (HCFA) of hospital-specific mortality elicited bitter accusations of inaccuracy and the potential for misunderstanding of data that were not adjusted for severity. Since then, considerable work has gone into the development of methods of adjustment; the model now includes such variables as hospital admission during the previous year and comorbid conditions.

Limitations. For hospital-specific mortality rates to be a valid screen for poor quality, hospitals with high mortality would have to be shown to provide poorer quality of care than hospitals with low death rates, as measured by an analysis of the process of care. Such a determination is limited by (1) unreliable diagnosis and procedure coding and (2) a lack of sufficient clinical detail to adjust adequately for the patients’ severity of illness at the time of admission (Chassin et al., 1989a, 1989b).

Hospital-specific mortality rates are further limited in their usefulness unless analyses include large numbers of hospitals, large numbers of patients from each hospital, comparisons over time to minimize the effect of chance variation, and adjustments for key hospital characteristics (Dubois, 1989). Only if all such information is complete and accurate can mortality data be adequately adjusted for severity and used as a screen for further review.
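The logic of such adjustment can be sketched with indirect standardization, a simplified stand-in for the actual HCFA model, using invented reference rates and case mixes: each hospital's expected deaths are obtained by applying stratum-specific reference death rates to its own case mix, and the observed-to-expected ratio then serves as a crude severity-adjusted screen.

```python
# Illustrative indirect standardization of hospital mortality (hypothetical data).
# Each hospital's expected deaths are computed by applying reference death rates
# for patient risk strata (e.g., age/comorbidity groups) to its own case mix; the
# observed-to-expected ratio is then a crude severity-adjusted screen.

reference_death_rates = {"low risk": 0.02, "medium risk": 0.08, "high risk": 0.25}

hospitals = {
    # hospital -> (cases per risk stratum, observed deaths)
    "Hospital A": ({"low risk": 400, "medium risk": 150, "high risk": 50}, 36),
    "Hospital B": ({"low risk": 100, "medium risk": 200, "high risk": 300}, 95),
}

for name, (case_mix, observed) in hospitals.items():
    expected = sum(n * reference_death_rates[stratum] for stratum, n in case_mix.items())
    ratio = observed / expected
    print(f"{name}: observed {observed}, expected {expected:.1f}, O/E ratio {ratio:.2f}")
```

In this invented example the hospital with the much higher crude death rate also has the sicker case mix, so its observed-to-expected ratio is unremarkable, which is the point of the adjustment.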

Medical Complications

Strengths. As with mortality rates, the use of complication rates as a measure of quality is attractive as a nonintrusive measure because it is believed that at least a portion of complications is preventable.

Limitations. The use of administrative data bases to identify complications that are the consequence of poor-quality care is hampered by the lack of accuracy of diagnostic and procedure coding, by the need for accurate and complete data, and by further variability and inconsistency in the recording of major complications (usually recorded as a secondary diagnosis). Methods have not yet been developed to distinguish complications ensuing from poor care from those occurring because of the degree of illness. For instance, cardiac arrest (a serious complication of heart attack) may occur because the heart is already severely damaged or because irregularities in the heart rhythm are not monitored and recognized (Chassin et al., 1989a). Another limitation, as with the use of mortality data, is the lack of sufficient clinical detail to adjust adequately for the patients’ severity of illness at the time of admission.

Generic Screening

Rutstein et al. (1976) first used the term “sentinel event” to describe adverse outcomes that can be closely linked with poor process of care. Each sentinel event is chosen because it is thought to have a high probability of indicating poor quality and therefore to warrant further review and possible intervention.

Generic screening is a method of identifying adverse, or sentinel, events by medical record review. Screens are “generic” in the sense that they apply broadly to the institution rather than to specific departments or diagnoses. Examples of generic screens are “unplanned repair or removal of organ,” “severe adverse drug reaction,” and “inpatient admission after outpatient surgery.” Events subject to screening include those in which patient harm occurs (such as ocular injury during anesthesia care) and events with the potential for harm (such as equipment malfunctions or patient falls).

Generic screening, now widespread in hospitals, is a two-stage system of medical chart screening by nurse reviewers, followed by implicit physician review. Data may be recorded on worksheets that are also used for admission, continued stay, and discharge review. Data are collected within a designated period after admission (e.g., 48 hours), at periodic intervals (e.g., every three days), and after discharge when all services provided have become part of the medical record. Individual events that meet certain explicit criteria (sometimes called screen failures or variations) are further reviewed by a physician advisor. Direct action is taken if a quality problem is confirmed and action directed toward an individual practitioner is appropriate. Data are later aggregated (e.g., by time, service, shift) to determine trends. PROs use generic screening as their primary method of chart review.
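The two-stage flow can be sketched schematically as follows; the screens, chart data, and the stand-in for physician judgment are all hypothetical and are meant only to show the structure of the process.

```python
# Illustrative two-stage generic screening workflow (hypothetical screens and charts).
# Stage 1: explicit screens applied to each chart flag potential adverse events.
# Stage 2: flagged charts are referred for implicit physician review; confirmed
# problems are then aggregated by service to look for patterns.
from collections import Counter

generic_screens = {
    "unplanned return to operating room":            lambda c: c.get("unplanned_return_to_or", False),
    "nosocomial infection":                          lambda c: c.get("nosocomial_infection", False),
    "inpatient admission after outpatient surgery":  lambda c: c.get("admit_after_outpt_surgery", False),
}

charts = [
    {"id": 1, "service": "surgery",  "unplanned_return_to_or": True},
    {"id": 2, "service": "surgery",  "nosocomial_infection": True},
    {"id": 3, "service": "medicine"},
    {"id": 4, "service": "medicine", "nosocomial_infection": True},
]

# Stage 1: explicit screening (in practice, by trained nurse reviewers).
flagged = [(c, name) for c in charts for name, screen in generic_screens.items() if screen(c)]

# Stage 2: implicit physician review of flagged charts (stubbed here as a simple rule).
def physician_confirms(chart, screen_name):
    # Placeholder for clinical judgment: assume only infections are confirmed as quality problems.
    return screen_name == "nosocomial infection"

confirmed = [(c, name) for c, name in flagged if physician_confirms(c, name)]

print(f"Charts screened: {len(charts)}, flagged: {len(flagged)}, confirmed problems: {len(confirmed)}")
print("Confirmed problems by service:", Counter(c["service"] for c, _ in confirmed))
```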

Strengths

Many adverse events (for instance, many nosocomial infections, especially surgical wound infection) are preventable (OTA, 1988). Characteristics of patients at high risk of such events have been identified (Larson et al., 1988). By focusing on an adverse event rather than a disease-specific process, generic screening can help to focus attention on interdisciplinary problems. The generic screening process is an appealing method for directing quality resources to serious problems of poor technical quality, overuse, and underuse (although underuse can only be detected for those already receiving care). All these features lend credibility to the general approach, although not necessarily to individual screening criteria.

Screening for adverse events is easy to implement. If it is done at frequent intervals and data are reviewed and collated promptly, screening for adverse events can result in immediate action. When potentially dangerous conditions exist, response can be timely enough to prevent further harm to an individual patient and to other patients exposed to similar risks. If data are retrieved by well-trained reviewers and combined with other tasks such as utilization review and discharge planning, screening supports coordination of care and efficient use of resources. Well-developed screening criteria sets could be generalizable to many sites and could provide benchmark data for comparison across sites and over time.

Limitations

Generic screening is inefficient in identifying quality problems. Reports of the percentage of cases that fail initial screens and must be further reviewed range from 14 to 30 percent (Craddick and Bader, 1983; Meyer et al., 1988; Hiatt et al., 1989). Only a fraction of those cases will be shown to have true quality problems.

Some screen items are much less efficient than others. Inefficiency occurs because the criteria for determining a screen failure are often ambiguous. For instance, one criterion for the PRO generic screen that assesses medical stability at discharge is “abnormal results of diagnostic services which are not addressed or explained in the medical record.” Yet what is properly considered abnormal and what is adequate attention to the abnormality vary with a patient’s condition, other medical problems, and severity of illness; these are difficult judgments for a reviewer to make without more specific guidance. Such inefficiency is costly in terms of resources and reviewer patience. Ambiguity of screening criteria is also likely to make them unreliable, which is another limitation.

Generic screens have not been well evaluated. The value of screening for adverse events depends on how well adverse events are recognized by medical practitioners, documented in the medical record, and then identified by reviewers. Screens may miss as many problems as they find (see Chapter 6). It is not known how often generic screens miss serious problems in quality that are also missed by risk-management programs because the screening instrument is insensitive to them, because they are not recognized by the reviewer, or because they do not become evident until after discharge. For instance, Hiatt et al. (1989) found a 7.9 percent false-negative rate attributable to reviewers not recognizing events that were recorded; only 3.4 percent of very severe adverse events identified by a risk management program were missed because they were not recorded at all in the medical chart. On the other hand, a GAO study of occurrence screening in the Department of Defense found that about 65 percent of occurrences were missed by hospital reviewers (GAO, 1989). The findings were attributed to (1) lack of sufficient guidance for reviewers, especially when more than one event was found in a patient’s medical record, (2) insufficient medical expertise by corpsmen reviewers, and (3) physicians screening their own records.

Because documentation is more extensive and adverse events more easily observed in the hospital during the longer period of observation, generic screening seems more suited to the hospital setting than to most ambulatory care where visits are brief and outcomes are unseen and unrecorded. The method might, however, also be suitable for long-term-care screening.

Generic screen data collected by internal quality assurance programs are most frequently reviewed long after the patient has been discharged; generic screening by PROs occurs six months or more after discharge. Thus, as most commonly used, generic screens are not helpful for concurrent intervention. Instead, their value for patient care depends on dissemination of data on patterns of problems. The study committee was unable to assemble evidence, however, that this dissemination occurs routinely in hospitals.

Clinical Indicators

Clinical indicators of care can refer to several quite different things. They can refer to adverse events or to measures of process recorded routinely by clinical care and ancillary departments. They can also be written screens of acceptable practice that are objective, measurable, and applied consistently to the review of care by nonphysician reviewers (see O’Leary, 1988; Lehmann, 1989). Finally, they can be appropriateness protocols (based on adherence to condition- or procedure-specific standards) or be positive or negative health status outcomes.

The Joint Commission distinguishes sentinel events and comparative indicators. Sentinel events are serious complications or outcomes that should always trigger a more intensified review, such as maternal death or craniotomy more than 24 hours after emergency room admission. Comparative indicators establish rates over time or rates in comparison to other institutions. A particularly high or low rate may trigger further review, for example, the rate of death after coronary artery bypass graft surgery, the rate of wound infections, and the rate of vaginal births after cesarean delivery.

Strengths and Limitations

By and large, the same advantages and drawbacks to generic screening and retrospective review of the process of care apply to clinical indicators. A possible advantage of clinical indicators over standard generic screens is a presumed higher face validity for physicians and other practitioners. A possible drawback is their relative newness.

Patient Reports and Ratings

Patient reports refer broadly to interviews and surveys of patients that are conducted either at the time care is provided or later, by telephone or by mail. Surveys can include potential patients, for example Medicare beneficiaries or HMO members who have not used care. Interviews and surveys may ask patients to report on the process of care (both technical and interpersonal) and its outcome and to rate the quality of the care they received and their satisfaction with it.

Strengths

Surveys can investigate such aspects of patient experience as access to care, amenities of care, interpersonal and technical aspects of care, health status, understanding of instructions, experience in comparison to expectations (including a judgment of outstanding as well as poor care), and unmet needs. Detailed satisfaction surveys are fielded by many HMOs and, increasingly, by hospitals. In addition to compiling assessments of care received in primary care facilities, some surveys also include questions about care provided by specialists and affiliated hospitals.

Patient assessments are commonly sought internally by organizations (although they are not necessarily fielded by or used by the quality assurance program), and only rarely by external groups. Patient reports can provide information about (1) underuse (such as perceived lack of access, underdiagnosis, or undertreatment), (2) interpersonal aspects of care, and (3) expectations and preferences. Most problem detection methods do not tap these aspects of quality. Patient surveys that provide for free responses (for example, asking if the respondent has comments to make) can identify unexpected problems and elicit useful suggestions. Satisfaction questionnaires that are sensitive to specific elements of care and to change over time can be a valuable way of documenting improvement and excellence. Survey results can be used to compare sites if data are properly adjusted for differences in populations and expectations.

Recently a great deal of work has gone into the development of valid and reliable patient assessment instruments (Davies and Ware, 1988). The increasing availability of such instruments may bring a degree of standardization of methods and instruments to the health care field for use by the Medicare program as well as by internal quality assurance programs.

Limitations

Patient reports are prone to the usual sources of error in survey methodology, such as bias due to nonresponse by certain population subgroups and errors in recall. General questions inquiring about satisfaction typically result in overstated patient assessments in comparison to specific questions. The ability or inability of patients to judge technical quality of care is also a problem, as is accounting for the effects of illness and of the health care environment on patients’ assessments. Surveys have the potential, as well, to disrupt the doctor-patient relationship. As with other assessment methods, patient assessments must be adjusted for differences in access, patient characteristics, and expectations if they are to be used for comparisons.

Survey data have not in the past generally been accorded high priority as sources of information about quality or as forces for change within health care organizations. Often they were poorly constructed and implemented without pretesting. In many settings, patient surveys are fielded by marketing departments and have little or no linkage to quality assurance efforts; typically there is no feedback of data on patient satisfaction to practitioners, administrators, policymaking groups, or governing boards. More recently, however, proponents of continuous improvement have placed increased emphasis on knowing the needs of those served, whether patients or other “customers.”

If such information were to be collected and used by purchasers of health care, incentives to change processes in areas of dissatisfaction would probably be reinforced. The effect on quality of care and on health, however, would depend on whether areas that are changed are “amenities” of care or important elements of effective health care.

Health Status Assessment

Outcomes of care are the ultimate criteria for judging the quality of care and thus have great face validity for both patients and caregivers. Outcome measures include disease-specific clinical endpoints of care such as physiologic outcomes, a broad set of generic measures of functional and emotional status, and measures of well-being (see Chapter 2).

Strengths

Outcomes, such as patient health status measured at a transition point in care (for example, at admission to the hospital or to home health care), can help to evaluate the preceding care provided in another setting. Similarly, periodic health status measurement can provide information about changes in status compared with expected status.

Limitations

Comparing observed health with expected health requires empirical data on the natural history of illness and on the effects of treatment. Such information is still lacking for many conditions. More important, linking lower-than-expected health status to a deficient and identifiable element of care is often difficult and limits the value of outcome-based methods for quality assessment. This variant of the “process-outcomes” link problem lies at the heart of difficulties with outcome measurement as a quality assurance tool.

Health outcome measures appropriate for office practice, such as measures of physical and emotional functioning, are available, although not yet in wide supply (Nelson and Berwick, 1989). The Medical Outcomes Study (MOS) (Tarlov et al., 1989; Stewart et al., 1989) has shown promising interim results using the MOS Short Form (Stewart et al., 1988), a generic measure of functional status. The MOS Short Form has been used to demonstrate distinct functional profiles for patients with nine different chronic diseases (e.g., hypertension, coronary heart disease, diabetes, and depression); these profiles might prove useful as benchmarks for evaluating care. An innovative set of visual charts, called the COOP charts, taps areas of physical, mental, role, and social functioning and is also being tested for use in ambulatory practice (Nelson et al., 1987).

Individual-Case Methods

Several methods of case-by-case problem detection, such as autopsy and case conferences, have been developed and implemented in health care settings. Other approaches have administrative or even legal purposes, such as patient complaint and incident-reporting systems. Still others might be considered monitoring devices to identify poor practitioners after lengthy external processes. These include PRO sanctions, disciplinary actions by state medical boards, and malpractice settlements.

Two general problems limit the value of case-by-case systems as problem-detection methods. First, their findings have generally not been aggregated or classified consistently, so patterns cannot readily be identified. Second, these systems are usually not linked to quality assurance efforts, either directly or indirectly, through a common reporting pathway (for example, a hospital governing board). As a result, they contribute neither to the analysis of patient problems nor to integrated feedback of information to practitioners (see Nelson, 1976). The specific strengths and weaknesses of three such methods (autopsy, case conferences, and patient complaints) are discussed below.

Autopsy

Strengths. Unexpected findings at autopsy are considered to be an excellent way to refine clinical judgment and identify possible misdiagnosis. Landefeld and Goldman (1989) summarized the value of autopsies. In 5 to 10 percent of cases “treatable, major unexpected findings have been discovered that, if known premortem, would probably have improved the patient’s chance of survival. Other major unexpected findings were revealed in another 10 percent of cases” (Landefeld and Goldman, 1989, p. 42). Autopsies can provide information on the rates of and reasons for discrepancies between clinical diagnoses and postmortem findings.

Limitations. Several aspects of autopsy have limited its usefulness for quality assessment. There are no standard methods for classifying unexpected autopsy findings and no formal system of feedback from pathologists to quality assurance programs. Autopsy reports may not be completed until one to three months after death. Moreover, the proportion of hospital deaths followed by autopsy has declined greatly in recent years, from 50 percent in the 1940s to 14 percent in 1985 (Geller, 1983; MMWR, 1988). The decline has been attributed to lack of insurance reimbursement and to practitioner reluctance to request permission for autopsy from the patient’s family. Landefeld and Goldman (1989) have suggested several strategies for increasing the rate of autopsies, including requesting permission for a possible autopsy at the time of admission and seeking reimbursement from HCFA and other third-party payers; they believe reimbursement is justified by the value of autopsy findings for quality assurance.

Case Conferences

Case conferences are primarily educational meetings in which physicians review the care of difficult cases. The case may be presented because it was unusual or complex, required difficult clinical management choices, or had an adverse outcome. The discussion may cover a great many topics such as the value of new technologies, approaches to care that might have been more conservative, clinical findings that were overlooked, or an ethical dilemma presented by the case.

The Morbidity and Mortality (M&M) conference is a department-based conference that occurs after an adverse event such as a death or complication, typically after a surgical procedure. The course of illness and diagnostic, autopsy, and pathology findings are presented and discussed by the attending physician and pathologist.

Strengths. Case conferences are highly valued by clinicians as an effective method of learning. They are conducted in a nonjudgmental atmosphere and are considered clinically pertinent. They accord with medical training in that they focus on individual cases.

Limitations. Case conferences are believed to be very effective in monitoring and assuring high quality of care in hospitals. They do not, however, result from or lead to systematic information about practice patterns and outcomes that might advance the institutions’ understanding of patterns of care in unanticipated ways.

Patient Complaints

Reviewing complaints can be a method of detecting problems in care. Responding to complaints may have two valuable functions. It indicates to patients that the organization takes problems seriously, and it may prompt intraorganizational reforms that would never be suggested by formal quality assurance mechanisms. Complaint reporting programs are also used by external regulators, such as state and local departments of health and insurance commissioners. At least one PRO visited in this study believed that patient (or other) complaints were useful in identifying problem practitioners and that PRO review of patient complaints helped foster better relations with the patient community.

Strengths. Review of complaints, like patient assessments, includes patients in the quality review process and permits the identification of unexpected problems. A systematic classification and review of complaints has the potential to identify underservice, including lack of access to services. It can also identify interpersonal issues; like malpractice allegations, complaints about health practitioners are probably less likely when the interpersonal process has been good.

Limitations. The value of using rates of complaints for detecting quality problems depends on the relationship between the rate of complaints and the rate of quality problems, about which virtually nothing is known. Nor is it known whether serious problems in care are more likely than trivial problems to result in complaints. According to the New York State Department of Health (personal communication), only a small percentage of complaints received are confirmed as quality problems. In short, patient complaints may be highly idiosyncratic, so that patterns of complaints may be very difficult to detect or interpret.

The degree of patient vulnerability probably affects the likelihood of lodging a complaint. For instance, healthy HMO patients may not hesitate to register complaints about overlong waits. By contrast, elderly, frail, and isolated patients receiving home health care may be more reluctant to complain about home health aides on whom they are dependent, yet the danger to the patient’s health in the latter situation may be far more grave.

FACTORS THAT IMPEDE OR ENHANCE THE EFFECTIVENESS OF QUALITY INTERVENTIONS TO CORRECT PROBLEMS

It is difficult to overstate, although not difficult to understand, our lack of knowledge about useful strategies for changing professional and organizational behavior. If the science of quality assessment is considered to be in its infancy, then we must regard our knowledge of strategies for quality assurance as embryonic. Arguably, we know more about how to change patient behavior (e.g., see Haynes et al., 1979) than about how to change the behavior of health professionals and organizations.

The study committee asked Avedis Donabedian, an eminent observer and writer in the field of quality assurance, to reflect on barriers to effective quality assurance. In preparing this discussion, we have drawn heavily on the paper he prepared in response to that request (Donabedian, 1989). The remainder of this section considers the key attributes of the medical profession that should be taken into account in designing a quality assurance effort. It also considers important aspects of changing practice behaviors of both individuals and organizations in response to quality assurance findings.

Special Characteristics of the Medical Profession

Donabedian (1989) argues that to appreciate the factors that promote or hinder quality assurance and to evaluate methods commonly used to change individual behavior, one must understand several special characteristics of the medical profession. These include professional autonomy and accountability, training and socialization of physicians, traditions of informal peer review, and unfamiliarity with quality assurance as a formal process.

Autonomy and Accountability

The tension between autonomy in the practice of medicine and accountability for its quality is a hallmark of this profession. In granting the medical profession primary responsibility for quality, society has recognized the special expertise required to determine what constitutes goodness in technical care and has insulated physicians from interference by outside interests that might subvert clinical judgment. At the same time, it has expected a reasonable degree of public accountability. These are principles that the medical profession has espoused and to which society has largely adhered.

The desires of the medical profession to define quality and to control the means for assuring it have been recognized by delegating the monitoring function outside hospitals to organizations controlled by or responsive to physicians (such as the Medicare PROs). Within hospitals, the organized medical staff is entrusted with that responsibility. The medical profession also controls the criteria and standards by which quality of care is to be judged.

Opinions differ as to which societal requirements constitute interference with professional prerogatives and which are legitimate demands for accountability. In this tension between accountability and professional autonomy, one finds the origin of much that troubles quality assurance efforts today.

Socialization and Peer Relations

Two related characteristics of medicine are (1) the emphasis placed on recruitment, training, and socialization and (2) the significance accorded to informal rather than formal quality assurance interventions carried out by fellow clinicians. Both mechanisms are intended to produce professionals who are both technically competent and morally equipped to be self-critical and self-correcting. In professional training and in later practice, monitoring individual performance has been informal rather than formal. Individual conduct has been regulated indirectly through inclusion in or exclusion from the network of professional referrals and by other, more or less subtle indicators of professional approval.

Medical professionals depend economically and to some extent emotionally on one another. The careers of physicians depend on the approval of colleagues who vouch for their competence by sending them patients. Colleagues also offer encouragement and support for what may be regarded as an intrinsically uncertain and hazardous profession, not only because of the substantial likelihood of mistakes, but also because of the perceived litigiousness of the public and the unsympathetic scrutiny of external monitors. Quality monitoring, when it progresses to the point of identifying individual practitioners directly or by implication, requires that some physicians sit in judgment over others. Even though the participants in this monitoring may be legally protected from reprisal, they are subject to other powerful modifying motivations.

Distrust of Externally Imposed Efforts

The medical profession would like to see the traditions of professionalism and informal peer review preserved and incorporated into quality assurance efforts. This stance is very different from recent developments often feared and opposed by physicians. Although quality monitoring in hospitals and PRO review are under physician control, physicians are warned that if they will not do the job, others can be found who will. Moreover, what physician-directed review agencies do is more and more externally prescribed, often in painful detail. What physicians do and accomplish is subject to external verification.

To many physicians, the objectives of the monitoring enterprise are often suspect. They have reason to believe that much of what is done in the name of quality is, in fact, cost control.

Also alien (and alienating) is the insistence by external controllers that monitoring extend to the identification of individual malfeasance, leading to disciplinary action. Some physicians claim that the “body count” (meaning the number of physicians censured) has become the measure of success in federal performance monitoring. In some ways, federal agencies have bypassed the medical profession altogether, for instance, by releasing to the public information such as hospital-specific mortality rates that could create mistrust and discord between physicians and patients. In such circumstances, one can expect monitoring to be resisted and, when possible, weakened, perhaps to the point of nullification.

Unfamiliarity with Quality Assurance Purposes and Methods

Physicians are not generally familiar with methods of quality assessment and have only recently moved into clinical management in any numbers. They are trained to be concerned with the care of individual patients, not with patterns of care. Epidemiological and statistical skills are recent additions to medical training. Further, few trainees have participated in quality assessment.

This lack of familiarity makes leadership in quality assurance efforts both scarce and crucial, and it points to the need for educational reform in building the capability of professionals to act with confidence. We return to this point in Chapter 11.

Changing Practitioner Behavior

Changing professional behavior in the long run requires persuading the professional of the need to change. The most persuasive data are those that are credible, complete, timely, and pertinent to an individual’s practice.

Educational Approaches

Good education and training are regarded by the health care professions as the foundation of good practice. Beyond a lengthy and rigorous initial training, a lifetime of continuous learning is the ideal. Professionals hold educational interventions to be the preferred method for achieving behavior change, especially when the method does not single out individuals.

Although education may be the most useful first approach to changing professionals’ behavior, the kinds of problems or practitioners that are most amenable to educational interventions are not clear. Respected clinicians providing feedback in relatively informal settings may be the most effective agents for change. Lohr et al. (1981, p. vii), in referring to technology diffusion, state that “in general, professional colleagues are considered more potent legitimizing agents than any other single influence, and the most effective force for physicians’ adoption of medical innovations is professional, face-to-face contact with recognized peers.” The same can also be said for adoption of new practice behaviors (Eisenberg, 1986; Schroeder, 1987; Davidoff et al., 1989).

After reviewing the literature, Eisenberg (1986) reported that some studies show continuing education to be effective, others show it to be ineffective, and many other studies are inconclusive because of deficiencies in their methods or ambiguities in their findings. Guided by this picture and principles of adult education, he surmised that successful approaches to modifying physician behavior should have the following features. First, a practitioner should have accepted (presumably on the basis of valid evidence) that he or she has a need to learn. Second, the educational content should be specific to the need already identified. Third, education should be conducted face-to-face. Fourth, if possible, it should be conducted one-to-one. Fifth, it should be conducted by an “influential” person—a person the practitioner trusts and respects. Presumably, many educational efforts fail because one or more of these conditions are not met.

Education is called for when insufficient knowledge or skill is at least in part the reason for deficient care. How often ignorance and ineptitude are a cause of poor care, and to what degree, is not known, but their contribution can be expected to vary considerably from setting to setting. McDonald and his co-workers (McDonald, 1976; McDonald et al., 1984; Tierney et al., 1986) concluded after a controlled study that errors in ambulatory care occurred more often because of overload of tasks and information than because of lack of knowledge. To the extent that mistakes are caused by lack of access to current knowledge, online computer-aided management, warning, and reminder systems may hold promise for affecting physician behavior. We do not, however, know a great deal about the circumstances in which these systems are used or are useful, and we recognize that available tools and attitudes may be changing dramatically as the professionally trained population becomes increasingly computer literate.

The conditions detailed above underscore the importance of establishing a clearly defined link between quality monitoring and continuing education, as typified by the “bi-cycle” model proposed many years ago by Brown and Uhl (1970) and repeatedly advanced since.7 Donabedian (1989) believes that the relative effectiveness of alternative ways of linking monitoring to education, and of conducting the educational effort itself, should be high on the agenda of research on the effectiveness of quality assurance through monitoring.

Incentives and Disincentives

Professional behavior can also be changed by directive. The net effect of such an approach on the health of patients and the morale of professionals has not been explored. Likewise, little is known about the effects of positive versus negative incentives or ways to link informal professional incentives to quality assurance activities.

Feedback and education are meant to appeal to internalized values and to mobilize the personal resources of practitioners. Various factors in the environment may well enhance or diminish these efforts. Much depends on the implicit expectations and informal understandings of medical colleagues, but a great deal may also depend on the structure of a more formalized system of rewards and penalties. The relative impotence of quality assurance efforts in directly modifying practitioner behavior may be attributable to the absence of a clearly defined, consistently operative link between the results of monitoring and the career prospects of practitioners. Thus, the formal system of incentives and disincentives deserves particular attention in any analysis of effectiveness.

Incentives are commonly regarded as rewards and disincentives as penalties, but not receiving a reward when a system of rewards has been instituted can be a disincentive, and not being penalized when a system of penalties has been established can be an incentive. A system based on recognizing and rewarding good performance, rather than on ferreting out and punishing error, would very likely differ in its acceptability to professionals and perhaps also in its effectiveness. One might also hypothesize that a system having features of both approaches could be the most effective.

Rewards and penalties might also be distinguishable by whether they are professional, financial, or both, and by whether they are generalized or particularized. For example, a promotion connotes both professional and financial rewards that are not necessarily related to any particular meritorious action; rather, a general pattern of laudable behavior is being recognized. By contrast, withholding payment for an unapproved procedure is a particularized penalty. A proposal that physicians be awarded part of the savings that accrue from maintaining a lower-than-average length of hospital stay is a positive financial reward related to a pattern of behavior. The traditional incentives of the professional culture, such as career advancement, salary, risk-sharing and bonus arrangements, and the esteem of colleagues, are more individualized.

The magnitude of rewards or penalties might affect their impact; this may be especially true if penalties are matched to the seriousness of the offense (Vladeck, 1988). Their impact is also likely to depend on the credibility and legitimacy of the judgments that lead to them, on the presence of procedural and legal safeguards against arbitrary action, and on evidence that penalties are applied fairly and consistently.

Eisenberg (1986) reached two conclusions about the use of penalties. First, penalties do modify physician behaviors. Second, they are deeply resented and may have unexpected or unwanted consequences—a “backlash,” as he calls it. Such backlash undoubtedly would create political and administrative problems, but its effect on the quality of care is not clear.

Changing Systems and Organizations

Berwick (1989) has stated that “flaws come more often from impaired systems than from impaired people.” Many quality assurance professionals concurred with this view during our site visits. This viewpoint has led many to focus on average practice rather than solely on outliers, and to attend to organizational factors that affect quality and to ways of changing them.

Despite a voluminous literature on planned organizational change (Johnson, 1989), its principles have not been extensively applied to health care organizations. In particular, the implementation of continuous improvement models (discussed in Chapter 2) has not yet progressed far enough to demonstrate the models’ effectiveness in modifying clinical and organizational behavior.

In a discussion of barriers limiting implementation of quality assurance programs, Luke and Boss (1981, p. 148) stated that the ineffectiveness of educational strategies results from a “failure to conceptualize quality assurance primarily as a problem of organizational and behavioral change.” They further asserted that “…the real barriers to quality assurance are not the impediments to data acquisition and analysis but the points of resistance to change within health institutions.” They identified 10 barriers to change that must be recognized and addressed if interventions are to be effective. These barriers are: (1) autonomy expectations of health professionals; (2) collective benefits of stability; (3) calculated opposition to change; (4) programmed behavior; (5) tunnel vision; (6) resource limitations; (7) sunk costs; (8) accumulations of official constraints on behavior; (9) unofficial and unplanned constraints on behavior; and (10) interorganizational agreements.

Many organizational factors may influence the effectiveness of quality assurance efforts. Particularly important may be the collaborative nature of medical practice and its dependence on institutional (mainly hospital) support (Knaus et al., 1986). Physicians, as a group, may be able to control only a part of what is done for patients; any one physician can control even less. Palmer et al. (1985) found that physicians in ambulatory practices were more likely to improve their care in response to failures revealed by monitoring when the change to be made was more directly under their own control.

Organizational Factors Influencing the Form and Effectiveness of Quality Assurance

Quality monitoring is most likely to occur when care is provided in or through institutions or organized programs. Thus, the forms it takes as well as its effectiveness can be expected to reflect the characteristics of these organizations.

Variations in quality among institutions are probably at least partly attributable to the fact that some institutions have better-developed and better-functioning quality monitoring systems than others. When care is made more “visible” to colleagues (for example, through sharing responsibility for care, consultation, teaching rounds, clinical conferences, and the like), the quality of that care is likely to be higher (Neuhauser, 1971; Shortell et al., 1976). Other advantages in quality have been attributed to controls over recruitment and staffing, to equipment and material resources, to direct supervision of professional work, and to more subtle attributes such as coordination, communication, and tightness of organizational control (Georgopoulos and Mann, 1962; Scott et al., 1976; Flood and Scott, 1978). The role of formal monitoring mechanisms, as a separable organizational feature, in influencing the quality of care provided is as yet unexplored.

Several features of the organizational environment might influence the implementation and effectiveness of quality monitoring. These include ideology, leadership, and baseline performance.

Ideology

The importance accorded to quality, both in absolute terms and relative to competing objectives (particularly cost containment), may be an important determinant of the effectiveness of quality monitoring. The sources of concern for quality may derive from the perception of a social responsibility, a professional imperative, a prudent yielding to coercion, or the prospect of a profitable response to market forces. All these factors may also bear on the effectiveness of quality assurance, particularly if all motivations impel in the same direction (Donabedian, 1989).

The relative importance given to technical care as compared to the interpersonal process may also influence the form and success of quality assurance. To some extent this choice is ideological; it reflects or is influenced by the views of the organization’s leadership, the values and traditions of the major constituencies to which the organization is answerable, and the functions an organization serves. For instance, the quality of technical care is likely to be the dominant concern of a major teaching center. In contrast, a long-term care facility under religious auspices is impelled, in part, by the values of its sponsors to emphasize the amenities and the interpersonal aspects of care. In the first instance, the interpersonal process may be at risk, whereas in the second situation technical care may be in jeopardy.

Leadership

Of these features of the organizational environment, leadership is the least amenable to control and often depends on serendipity. Donabedian (1989) points out that whether leadership is provided by a member of the governing board or a senior administrator, he or she must be a trusted and respected colleague who is directly involved in the program. Although the evidence is weak, it seems to suggest that quality monitoring is more effective in altering physician behavior when clinical leaders participate in it and alter their own behavior in response to its findings (Palmer et al., 1988).

Baseline Performance

The baseline level of clinical performance that characterizes an organization may be an important determinant of the perceived need for quality monitoring and may affect both the design of the monitoring enterprise and its effectiveness. In this regard, the shared perceptions of the level of performance may be as important as the actual level, more objectively assessed.

When the actual or perceived level of performance is exceedingly high, formal monitoring may seem redundant. When such review is externally imposed, it may be resented and at best perfunctorily performed. Major teaching institutions may be particularly prone to these behaviors.

When actual performance is at an uncommonly low level, quality monitoring may be regarded as a threat. Where monitoring is introduced, it may be ineffective because poor practice is usually the consequence of deep-seated organizational pathology. Disbelief, defensiveness, and low expectations may lead to weak internal criteria and standards that fail to challenge. Even when external criteria and standards are held out as an example, they are likely to be countered by a host of arguments seeking to show why the criteria do not apply to the peculiarities of the local situation.

SUMMARY

The variety of techniques for quality review and assurance used in the United States is enormous and rich. Some activities are intended to prevent quality problems; some are designed to detect them; and still others are efforts to correct problems once they are identified.

This chapter has provided a highly selective sampler of methods for quality review and assurance. It illustrates the considerable range of efforts beyond those of the federal PRO program and shows there is much to learn from the professional and provider communities’ own efforts. All these methods have strengths and limitations, which we have cited here. We have, however, taken no position on the quality of those efforts. The techniques and approaches described are not necessarily the best, although some may well be state-of-the-art.

Our review here of quality assessment methods currently in use (and the descriptions of methods in various settings in Volume II) reveals inadequacies, in particular, the weak focus on the continuum of care across multiple providers for patients, especially those with chronic illness, who move from one setting of care to another. Review tends to focus on single events and single settings rather than episodes of care.

By and large, review techniques look at the technical quality of care and specifically at the physician component of decision making. For those receiving care, undertreatment and, to a lesser degree, overuse may be identified. The quality of interpersonal care and the use of patient outcomes in evaluating care are only now beginning to be incorporated into quality review efforts.

This chapter highlights the need for a much better understanding of how to bring about change in provider performance and practice patterns effectively. Available intervention methods include feedback of information on performance; financial incentives, disincentives, and penalties; and organizational development and change techniques. The emphasis is often on overcoming deficiencies in knowledge or skills, indifference, or impairment. The limited repertoire of practitioner-oriented interventions (e.g., education, exhortation, surveillance, and sanctions) is insufficient to address what may be far more complex reasons for inappropriate or poor technical care. These include the nature of medical work, its collaborative character, the lack of physician control over many aspects of health services, and the organizational environment of practice.

Most quality assurance professionals have at one time or another become discouraged by the results of providing information about practice in the hope that a change in behavior would follow. The likelihood of change may be linked to the seriousness of the deficiency, the relevance of the practitioner’s behavior to that deficiency, and the ease of changing the behavior. The most promising strategies for changing individual behavior are likely to be those that work with the training and practice characteristics of physicians and the norms of the medical culture, that help already good practitioners as well as outliers, and that recognize the limited ability of any individual practitioner to change the delivery system.

Leadership and commitment in concert with other organizational goals may be the most important factors in organizational change. Organization and funding of quality assurance programs likely influence their effectiveness. The baseline level of performance and the external regulatory environment are also likely to influence an organization’s response to purported deficiencies.

1. Some of the discussion of licensure and board certification has been drawn from a paper, “Medicare Quality Assurance Mechanisms and the Law,” by A. H. Smith and M. J. Mehlman, prepared for this study and referred to hereafter as Smith and Mehlman, 1989.

2. See Chapter 10 for a more extended discussion of appropriateness (practice) guidelines, patient management criteria sets, and algorithms.

3. Much of this discussion has been drawn from a paper by L. L. Roos, N. P. Roos, E. S. Fisher, and T. A. Bubolz prepared for this study and hereafter referred to as Roos et al., 1989. Some of the material will appear in Roos et al. (forthcoming) and Roos, 1989.

4. International Classification of Diseases, ninth revision, clinical modification. The Medicare hospital (Part A) files, for instance, use ICD-9-CM codes, and diagnosis-related groups (DRGs) are based on them as well.

5. The development and testing of coding dictionaries that manage multiple medical terms such as those in use with the Computer Stored Ambulatory Record (COSTAR) have demonstrated promise in improving coding accuracy in ambulatory care.

6. Calculations based on data supplied by HCFA for the Second Scope of Work, covering review of more than 6.1 million records through February 1989, show the following: Of all records reviewed, nearly 24 percent failed at least one screen. Of those that failed at least one screen, physician advisors confirmed quality problems in 30 percent, or about 7 percent of all records reviewed (HCFA, 1989).
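For illustration only, these percentages fit together roughly as follows (a sketch using the rounded figures quoted in this note, not HCFA’s exact counts):

\[
6.1 \text{ million records} \times 0.24 \approx 1.5 \text{ million records failing at least one screen}
\]
\[
1.5 \text{ million} \times 0.30 \approx 0.44 \text{ million confirmed quality problems} \approx 7 \text{ percent of all records reviewed}
\]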

7. The bi-cycle model is one in which problems are identified, analyzed, attacked, and then reevaluated. In the case of problems caused by lack of practitioner knowledge or skills, the methods of correction should involve highly targeted, pertinent continuing education.