Assessing Quality of Life: Measures and Utility

6

J. Ivan Williams and Sharon Wood-Dauphinee

Quality-of-life research has included the study of levels of economic, political, social, and psychological well-being resulting from varying governmental and economic systems, as well as policies and public programs related to health. Schuessler and Fisher (1985) wrote that quality-of-life research began in the 1960s with the Report of the President’s Commission on National Goals in the United States. Most specialists agree that the term “quality” has the same meaning as “grade” or “rank,” which can range from high to low or best to worst.

What elements of life are to be so graded? The units of analysis can be as large as a nation. Countries can be ranked on their economic systems and on the types and amounts of government spending on social programs relative to expenditures on industry and the military. At the level of the individual, the elements can be objective (for example, job, income, shelter, and food) or subjective (happiness, sense of well-being, self-realization, perceptions of the worth and value of life, and the like).

The best-known studies of the quality of life of individuals are those of Andrews and Withey (1976) and Campbell and colleagues (1976, 1980) at the Institute for Social Research at the University of Michigan. Both teams of investigators asked questions about the domains of life satisfaction, including work, marriage, leisure activities, family, housing, and neighborhood. They developed a global measure of satisfaction by combining the domain scores into a general measure.

Quality-of-life studies in the health sector are more limited in scope. In the health sciences, the task at hand is to assess the impact of disease and its management, including interventions, on the well-being of the patient. The health states of individuals may influence their quality of life without determining it. As Ware (1987) noted, “jobs, housing, schools, and the neighborhood are not attributes of an individual’s health, and they are well outside the purview of the health care system.”

Health care researchers have developed numerous measures of quality of life over the past two decades, and several review articles have commented on those so far available. Their use in assessing the outcome of health care interventions has become popular. As we have seen in Chapter 2, recent studies have reported on the quality of life of men with mild to moderate hypertension undergoing antihypertensive therapy, of women with advanced breast cancer undergoing chemotherapy, and of cancer patients in hospice programs.

Although a variety of studies purport to assess quality of life, there is remarkably little agreement about the underlying concepts or theoretical framework that the measures represent. These measures may include clinical symptoms (for example, pain, nausea, vomiting), functional disability (Katz Activities of Daily Living), health status measures (RAND health status measures, Sickness Impact Profile), and measures of life satisfaction and psychological well-being.

The World Health Organization (WHO) has defined health as a “state of complete physical, mental, and social well-being and not merely the absence of disease or infirmity.” Ware (1987) argues that five health concepts are inherent in this definition: physical health, mental health, social functioning, role functioning, and general well-being. He takes a conservative approach to the study of quality of life in the health sciences. Because the goal of health care is to maximize the health component of the quality of life, he suggests that the measures be restricted to assessing health status.

Spitzer (1987) includes the burden of symptoms in his operational definition of health. He would restrict the assessment of the attributes of health to those who are definitely sick. He sees little point in extending the studies of quality of life in health care to the ostensibly healthy, but few writers in the field agree with this point of view.

Wenger et al. (1984), McDowell and Newell (1987), and Kane and Kane (1981) offer systematic reviews of a number of measures used in quality-of-life studies, including functional disability indices, health status scales, and measures of life satisfaction. In their reviews, these authors discuss the reliability and validity of a number of the measures and their uses in health care studies. We list the instruments they treat in the section entitled “Three Sources of Descriptive Information for Quality-of-Life Measures.” This chapter focuses on measures developed specifically to assess quality of life.

Issues in Selecting Quality-of-Life Measures

To choose measures for assessing quality of life, researchers need to address seven issues, briefly reviewed below.

Disease-Specific Versus Global Assessments

Measures may focus on the symptoms, complaints, disabilities, and disruptions in life that are specific to the clinical condition under study. Indeed, the disease-specific approach has been advocated in the study of arthritis and heart disease and in the evaluation of chemotherapy.

Alternatively, one can assess the quality of life resulting from the overall consequences of disease and management on patients’ functional capacities and perceptions of well-being. The more global measures cover a number of dimensions within a summary score. For example, the Quality of Life Index developed by Spitzer et al. (1981) includes one item for each of the following dimensions: activities of daily living, principal activities, health, outlook, and support. Similarly, measures of life satisfaction and general well-being are global in perspective.

Other measures, such as the linear analogue self-assessment scales developed by Priestman and Baum (1976) or the Breast Cancer Questionnaire (Levine et al. 1988), are designed so that patients may repeatedly assess their symptoms and report their physical and emotional responses to adjuvant chemotherapy. The resulting scores show the patients’ immediate and specific responses to disease and treatment.

Clinical Endpoints Versus Long-Term Outcomes

Fletcher et al. (1988) state that the clinical endpoints commonly used for assessing prognoses include evidence of improvement following intervention, remission of disease, and recurrence. Clinical endpoints traditionally focus on sets of outcomes that are assessed near the time of diagnosis and treatment. Long-range outcomes can be viewed as those that are important to patients as they live with their resulting states of health.

Patient Ratings Versus Proxy Assessments

Investigators generally prefer that patients rate their own quality of life. Proxy assessments are important when patients are unable to respond. In these circumstances, researchers may use quality-of-life measures completed by other persons such as a responsible clinician, spouse, close friend, or relative of the patient.

Objective Versus Subjective Measures

Objective measures are based on variables that can be observed and recorded by various testing procedures and assessors. Measures of disease activity, remission of symptoms, presence of side effects, changes in functional capacity, ability to carry out usual activities, and family and social activities are phenomena that can be observed and recorded. These variables are important determinants of quality of life, and agreement can be reached about changes in status that have occurred.

Subjective measures provide opportunities for individuals to express their thoughts, knowledge, attitudes, moods, and feelings. Subjective phenomena may be related to particular diseases or types of therapy, or they may be more global.

Although researchers and policymakers tend to make much of the distinction between objective and subjective measures, both are probably necessary when assessing quality of life, and both require investigations into their reliability and validity. It is perhaps surprising that the objective measures often are not as well standardized as the subjective measures; objectivity does not automatically mean that measures are reliable and valid.

Cognitive Functioning

Researchers commonly exclude cognitive functioning from consideration in studies of quality of life. Except for diseases and therapies that obviously diminish mental capacity, investigators usually assume that the cognitive abilities of individuals are unaffected by episodes of illness and care. One may test this assumption by including tests of cognitive functioning, as did Croog et al. (1986) in their study of antihypertensive medications.

Ratings and Utilities

As Schuessler and Fisher (1985) indicate, quality-of-life measures provide ratings or rankings of health and life. Some assessments attempt to move from states of health to judgments of the worth or value of life with a given state of health. Investigators, working with concepts and methods developed in economics, are designing measures of the utilities of health states, with the typical scores ranging from 0 for “Death” to 1 for “Normal Health.” By multiplying the utility values by the number of years individuals live with a given health state, survival time can be expressed in Quality Adjusted Life Years (QALY). Health economists have used this approach to compare technologies in terms of costs per QALY gained. Not everyone agrees with such an approach, because it tends to diminish the value of a good, but troubled, life.

Utility measures move the measurement of quality of life from rankings to judgments of worth and value. This extension of the field of study is controversial; most particularly, the role of utility analysis in quality-of-life research is hotly contested.

Timing of the Assessments

Measures such as the linear analogue self-assessment scales, the Functional Living Index — Cancer, and the Breast Cancer Questionnaire are designed for repeated use before, during, and immediately after treatment. The purpose of the repeated measures is to assess patients’ short-term responses during the course of therapy.

Global assessment measures, such as the Spitzer Quality of Life Index, are designed to reflect the quality of life following the impact of disease and management or to reflect global changes in assessments over a long period of time. Investigators have used the Spitzer Quality of Life Index for repeated assessments during the course of therapy (Coates et al. 1987, Levine et al. 1988), but the scores tend to be less responsive to short-term clinical changes than the disease-specific measures.

The basic issue is whether quality-of-life measures are used to assess short-term or long-term responses to therapy. For example, Levine et al. (1988) stopped taking assessments when patients withdrew from treatment or relapsed. Conversely, Chubon (1987) used the Life Situation Survey to compare the quality of life of patients in chronic care and rehabilitation programs with that of healthy subjects.

There is a problem with repeated self-assessment during the course of therapy. Investigators have found it difficult to maintain high self-assessment completion rates over several weeks (Finkelstein et al. 1988, Raghavan et al. 1988) and were not able to use the assessments because of missing values. Levine et al. (1988) minimized the problem by having nurses interview the patients during clinic visits; this procedure, however, added considerably to the time and costs of the study. If these measures are to be used repeatedly, the time and costs of maintaining high response rates over multiple assessments must be considered.

Summary

Some quality-of-life studies maintain one perspective or point of view. Yet it is becoming increasingly common for researchers to employ a mix of perspectives and methods in assessing quality of life. We have reviewed what is known about the conceptual framework, reliability, validity, and uses of specific measures. In any study, several tools may be combined to provide information on various perspectives: subjective and objective, disease-specific and global, clinical endpoints and long-term outcomes, and so on. No attempt will be made to sort out the combinations of approaches researchers have employed. Examples of multiple approaches to assessing quality of life are given in Chapter 2.

Strategies Used to Assess Instruments

A bewildering array of terms labels the properties of measures, and researchers in the health sciences frequently employ strategies for developing and testing measures that differ from those used in the social sciences. To standardize our work, we developed the Review Form for Quality-of-Life Measures. We used the Review Form to gather bibliographic information, the stated purpose of the measure, its underlying conceptual framework, and a description of its content and format. As part of this review, we have tried to use terms that are consistent with those compiled in the Dictionary of Epidemiology (Last 1988) by the International Epidemiological Association and that are used by writers in epidemiology (Feinstein 1987, McDowell and Newell 1987) and the social sciences (Bohrnstedt 1981, Kerlinger 1986, Nunnally 1978). This section briefly reviews some statistical and other expressions.

Reliability

Two basic strategies can be used to establish the reliability of a measure. For those based on subjective ratings of attitudes, perceptions, and sense of well-being, investigators may assess the reliability by examining the consistency of patterns of response across the items. The coefficient alpha (Cronbach 1951) measures the internal consistency of the response, based on the average correlation among the items and the number of items in the instrument. The coefficient assumes that the correlations in the matrix are all positive, because they represent the same dimension. Values of Cronbach’s alpha range from 0 to 1.

If Cronbach’s alpha is high (for example, 0.80 or higher), the responses are consistent, and the sum of the item responses yields a score for the underlying dimension that the items represent. Stated another way, if the items are adequately sampled from the domain of quality of life, the sum of the responses should give a better indication of the quality of life of the individual than the response to any one item. A low coefficient alpha would indicate that the items did not come from the same conceptual domain or that the noise in the items was substantial.
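As an illustration of the coefficient described above, the following is a minimal sketch of one common way to compute Cronbach's alpha. It uses the variance form of the coefficient (item variances compared with the variance of the summed score) rather than the average-correlation form mentioned in the text; the two agree when items are standardized. The function name and the response data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Coefficient alpha for a list of respondents, each a list of k item scores."""
    k = len(item_scores[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in item_scores]) for j in range(k)]
    # Sample variance of the summed scale score across respondents.
    total_var = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical data: 5 respondents answering 4 items on a 1-to-5 scale.
responses = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [3, 3, 4, 3],
    [5, 5, 5, 4],
    [1, 2, 1, 2],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 3))
```

With these made-up responses the items move together closely, so the coefficient falls well above the 0.80 benchmark mentioned in the text.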

The items can be divided and placed on alternate forms of the measure; the equivalence of the alternate forms can be tested by comparing the alphas. Alternatively, the items on one form can be split into two groups, and coefficients can be computed for each half and compared. Comparable coefficients confirm the consistency of the responses.

The scores for the split forms can also be correlated to see how they correspond. The Spearman-Brown formula uses this correlation to estimate the reliability of a scale containing all items after adjusting for the presence of twice as many items on the composite scale as in each of the two groups (Zeller and Carmines 1980).
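The Spearman-Brown adjustment can be sketched in a few lines. The function name and the default doubling factor are illustrative choices; the example correlation of 0.60 between the two halves is hypothetical.

```python
def spearman_brown(r_half, length_factor=2.0):
    """Estimate full-scale reliability from the correlation between two halves.

    length_factor is the ratio of the full scale's length to the length of
    each part; it is 2 for a split-half design.
    """
    return length_factor * r_half / (1 + (length_factor - 1) * r_half)


# Hypothetical split-half correlation of 0.60.
full_scale_reliability = spearman_brown(0.60)
print(round(full_scale_reliability, 2))  # 0.75
```

The estimate exceeds the split-half correlation because the composite scale has twice as many items as either half.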

Researchers may decide to create a multidimensional measure of quality of life and then select items that represent the dimensions of interest. For example, quality-of-life measures may have items related to conditions specific to disease and management (for example, nausea and vomiting in response to chemotherapy for cancer), and there may be additional items related to physical functioning, and social and psychological well-being.

Factor analysis statistically defines a small number of factors or underlying dimensions that account for a high proportion of the common variance of the items. Exploratory factor analysis is used to identify and discard items that are not correlated with the factors of interest. Alternatively, an investigator may use factor analysis to confirm that items selected to represent a single dimension of quality of life (for example, physical functioning) principally load onto that factor and correlate weakly with other factors. The factor represents a single dominant dimension or variable when the factor loadings for the items are relatively high (0.60 or higher) and the common variance and the factor loadings cannot be increased by subdividing the items among additional factors. Factors are not considered stable unless the results can be replicated in a number of samples and study settings. Once a factor is defined as representing a single variable or dimension, the responses for the items on each factor are summed to create the factor score.

For a measure with a fairly large number of items and a high coefficient alpha, one can use factor analysis to define two or more factors underlying the responses. A measure that is internally consistent may still not represent a single dimension. Factor analysis is used to define the underlying dimensions, and the coefficient alpha may then be used to assess the strength of the consistency of the items on the separate factors.

The stability of a scale or factor score is assessed by correlating the scores of subjects with the scores obtained in testing at another time. As Bohrnstedt (1981) has noted, the test-retest coefficient can be influenced by true changes in scores. The interpretation of the coefficient of stability is not always straightforward.

If the variables being considered are sufficiently objective to be evaluated by persons other than the patients, it is possible to compare raters’ scores. For example, the Quality of Life Index is designed to be completed by the health professional responsible for the care of the patient and significant others as well as by patients themselves. Interrater agreement indicates the reliability of the scores by different raters on a single occasion, and intrarater agreement is the reliability of the scores by the same rater over repeated testings.

If the measure is categorical, Cohen’s kappa (Fleiss 1981) is most frequently used to assess the level of agreement beyond that expected by chance. For rankings of ordinal measures, Spearman’s rho and Kendall’s tau may be used as measures of agreement in addition to kappa. Pearson’s product moment correlation is commonly used for comparing quantitative scores of raters.
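A minimal sketch of Cohen's kappa for two raters follows. It computes observed agreement, the agreement expected by chance from each rater's marginal category frequencies, and the chance-corrected ratio. The function name, category labels, and ratings are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters on categorical ratings."""
    n = len(rater_a)
    # Proportion of subjects on whom the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of 10 patients by two raters.
rater_a = ["good"] * 6 + ["poor"] * 4
rater_b = ["good"] * 5 + ["poor"] * 4 + ["good"]
kappa = cohens_kappa(rater_a, rater_b)
print(round(kappa, 3))  # 0.583
```

Here the raters agree on 8 of 10 patients, but because 52 percent agreement would be expected by chance, kappa is noticeably lower than the raw agreement of 0.80.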

The preferred measure of agreement is the intraclass correlation coefficient. It is particularly useful when there are three or more ratings. It compares the variance between subjects, the variance between raters, and the variance between times with the error variance. The intraclass correlation is high if most of the variance in the model is accounted for by the variance between subjects and if the variances by raters and by time are minimal (Fleiss 1986). The measure rests on the analysis of variance and can be used with ordinal as well as interval data. An intraclass correlation coefficient of, for example, 0.80 or higher indicates that the measure is highly reliable.
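The analysis-of-variance logic behind the intraclass correlation can be sketched as follows. This is one of several forms of the coefficient, chosen here as an assumption: the two-way random-effects, single-rating version (often labeled ICC(2,1)), covering subjects and raters only and omitting the time component mentioned above. The function name and the ratings are illustrative.

```python
def icc_two_way(ratings):
    """ICC(2,1) for a table of n subjects (rows) by k raters (columns)."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares for subjects, raters, and residual error.
    ss_subjects = k * sum((m - grand) ** 2 for m in row_means)
    ss_raters = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_subjects - ss_raters

    # Mean squares from the two-way analysis of variance.
    ms_subjects = ss_subjects / (n - 1)
    ms_raters = ss_raters / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_subjects - ms_error) / (
        ms_subjects + (k - 1) * ms_error + k * (ms_raters - ms_error) / n
    )


# Illustrative ratings: 6 subjects scored by 4 raters.
ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc = icc_two_way(ratings)
print(round(icc, 2))  # 0.29
```

In this example the raters differ systematically in how high they score, so the between-rater variance is large and the coefficient is low even though the raters rank the subjects in a broadly similar order.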

Scaling refers to the rules for assigning numbers to responses. The scaling determines whether the measure is a nominal, ordinal, interval, or ratio variable.

Validity

A first step in assessing the validity of a measure is to determine if the content of the items represents the domain or dimension of interest. Face validity is sometimes used to refer to the intuitive appeal of the items; content validity is reserved for the judgments of experts or specialists.

When there exists a variable external to the measure against which the scores can be checked, that variable can be used as a criterion to judge the measures. For example, the quality-of-life scores should differentiate patients dying of cancer, patients in intensive care, outpatients with chronic diseases, and healthy individuals, even though there may be substantial overlaps in the distributions of scores.

Concurrent criterion validity refers to the ability of a measure to differentiate between groups at the time the measure is applied. Predictive criterion validity refers to the ability to use these scores to predict future health-related events and states.

Quality-of-life measures can be compared with other measures as well. Concepts derived from theory and operationalized into reliable and valid measures are referred to as constructs. The measures under study can be tested against the constructs to determine if the observed relationships are as hypothesized. For example, quality of life should be negatively related to measures of pain, anxiety, and depression. Similarly, a measure of quality of life should be positively related to life satisfaction and general well-being.

To judge the sensitivity or responsiveness of a measure, the investigator should have a sense of how much change in a patient’s clinical or functional status would produce a change in their quality-of-life score. Significant clinical changes in the individual may not parallel changes in quality-of-life scores. Alternatively, a relatively small change in clinical levels may result in marked changes in a patient’s sense of psychological well-being.

Finally, the practicality of a measure refers to the ease and convenience of administration and interpretation. Practicality is particularly important if a measure is to be used repeatedly.

A Review of Selected Measures for Assessing Quality of Life

We reviewed 10 measures for rating quality of life using the Review Form for Quality-of-Life Measures. The section entitled “Ten Review Forms for Quality-of-Life Measures” presents the completed forms, and Table 6-1 (see page 76, this volume) provides a summary.

TABLE 6-1. A Summary of Health-Related Quality-of-Life Measures.

The Quality of Life Index (QLI), developed by Spitzer et al. (1981), has been tested in a variety of settings. It is used to assess the physical, psychological, and social functioning of patients. The QLI yields a score that ranges from a high of 10 to a low of 0. Alternative forms for completion by the patient, the physician or other health professional, relative, or significant other were developed to determine whether comparable ratings could be obtained from several sources. The reliability and validity of the QLI have been demonstrated in a series of studies in Australia, Canada, and the United States with a variety of patients.

Chubon (1987), Padilla et al. (1983), and Ferrans and Powers (1985) developed global measures of quality of life to be completed by patients. Chubon’s Life Situation Survey assesses quality of life beyond disease-specific conditions and functional limitations, comparing the responses of patients in chronic care and rehabilitation programs with those of healthy subjects. Chubon tested his instrument with prison inmates, hospital patients, mentally retarded adults, spinal injury patients, and university students. Although the samples have been relatively small, the instrument appeared to work well with all groups, and the differences in mean scores were as predicted. Chubon also found positive changes in the mean scores of patients who completed a program for chronic back pain.

Padilla’s Quality of Life Index focused on physical conditions, activities, and attitudes of the patients. We found no reports of the measures other than the articles published by the developers of the instruments. Padilla originally developed her measure while working with cancer patients. She adapted the measure for use with colostomy patients, adding a number of disease-specific items. Although the measure was designed to be global, we found no use of the adapted measure across conditions.

Ferrans’ Quality of Life Index focused on the satisfaction of needs; this measure is broader in scope. It taps life satisfaction in areas outside the immediate reach of health care (for example, marriage, education, occupation, future retirement), in addition to items related directly to health. By 1988, results had been reported for healthy graduate students and dialysis patients.

Karnofsky and Burchenal (1949) were among the first to develop a measure to assess the ability of cancer patients to perform daily activities. Their measure has been studied extensively and is widely used, although it has been criticized both conceptually and for its measurement properties. The consensus seems to be that it continues to be a useful tool for physicians to use in rating the impact of cancer and cancer treatment on patients’ ability to lead normal lives.

The Functional Living Index — Cancer (FLIC) is one of the newer instruments. The FLIC contains 22 items pertaining to symptoms and complaints related to cancer treatment, as well as the impact of disease and management on physical, psychological, and social functioning. The items were tested on 837 patients in Winnipeg and Edmonton, Canada. When the data were factor analyzed, Schipper et al. (1984) found that the mean factor scores for four patient groups decreased with the extent of disease. The investigators have completed some construct validation exercises. The FLIC is designed to be completed daily by patients. The responsiveness of the scores to changes over time has yet to be established.

Selby et al. (1984) have taken another approach to the development of an instrument for cancer patients. They took 18 items from the Sickness Impact Profile and added 12 items based on clinical experience, along with 2 statements for a global rating of quality of life and life satisfaction. The resulting questionnaire is designed to be completed by either physicians or patients. Factor analysis has been used to define the dimensions the items represent. The changes in scores reflect response to chemotherapy. We found no reports of uses of the instrument by investigators other than Selby and his colleagues.

We found considerable discussion of linear analogue self-assessment (LASA) or visual analogue scales (VAS) for rating quality of life. These scales are typically 10 centimeters long with the low or poor end of the scale anchored at 0 and the upper end anchored at 100. In response to a cue word or phrase, patients mark their self-assessments on the line. The point marked to the nearest millimeter produces the score. Priestman and Baum (1976) were among the first to use this technique for quality-of-life assessments of cancer patients. In a number of studies these and other investigators have used items related to symptoms and side effects, anxiety and depression, personal relations, and functioning, but the actual cues have varied from study to study.

The scores from repeated testing over the course of treatment for advanced cancer have been reported for individual items, but we found no reports of the formal psychometric properties of the measure. Only a minority of eligible subjects participated in the repeated use of the form, and the loss to follow-up is not explained. The use of the LASA needs to be standardized so that measurement properties of the resulting scales can be formally tested.

Three trends can be observed in the development of quality-of-life measures. First, although investigators have focused on the clinical relevance of the measures, minimal attention has been paid to the conceptual underpinnings of quality of life or the theoretical bases for the particular measures. Second, most researchers develop and modify the measures without formally testing the reliability, validity, and responsiveness of the resulting scores. Third, the various measures have been developed in isolation from each other, and attempts to compare and contrast the various measures of quality of life are rare.

A Review of Utility Assessments in Quality of Life

The utility assessment of health states and quality of life has arisen from a theoretical perspective and methodology that are distinct from those employed by behavioral and clinical scientists. Utility assessment has two components, the judgment of the value or worth of life at a given point in time and the quantity or years of life spent in various health states.

The utility value assigned to a health state generally ranges from 0, the value ascribed to death, to 1, the value ascribed to the reference state of a healthy life. Multiplying the utility value for a health state by the number of years the state is expected to last yields the Quality Adjusted Life Years (QALY). Health economists posit that health care programs should be evaluated by comparing the relative costs of the programs with the QALYs produced.
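The arithmetic can be sketched as follows. The function names, utilities, durations, and costs are all hypothetical, and the sketch ignores refinements such as discounting future years.

```python
def qalys(health_states):
    """Sum utility x years over a sequence of (utility, years) pairs."""
    return sum(utility * years for utility, years in health_states)


def cost_per_qaly_gained(cost_new, cost_old, qalys_new, qalys_old):
    """Extra cost of a program divided by the extra QALYs it produces."""
    return (cost_new - cost_old) / (qalys_new - qalys_old)


# Hypothetical patient: 4 years at utility 0.9, then 6 years at utility 0.5.
with_program = qalys([(0.9, 4), (0.5, 6)])   # 3.6 + 3.0 = 6.6 QALYs
without_program = qalys([(0.5, 10)])         # 5.0 QALYs

# Hypothetical costs of 30,000 with the program and 14,000 without it.
ratio = cost_per_qaly_gained(30_000, 14_000, with_program, without_program)
print(round(with_program, 1), round(without_program, 1), round(ratio))
```

In this made-up comparison both courses last 10 years, so the entire gain of 1.6 QALYs comes from time spent in a better health state, at an added cost of about 10,000 per QALY.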

The general approach for assessing utility values is based on modern utility theory, advanced by von Neumann and Morgenstern (1953). The theory describes a method for decisionmaking under conditions of uncertainty based on a set of axioms of rational behavior. Holloway (1979) has summarized the wide uses of this model for decisionmaking. Drummond et al. (1986), Torrance (1986, 1987), and Weinstein (1983) have written reviews and summaries of the utility analysis of health care programs. Smith (1988) has presented a number of papers with applications of utility analysis. The reader may wish to refer to these sources for detailed discussions of the theory and methods of utility analysis.

The major groups of researchers responsible for applying utility theory to the health field include the late James Bush, Robert Kaplan, and their colleagues at the University of California at San Diego; Rachel Rosser and her colleagues at Charing Cross Hospital in London; George Torrance and his colleagues at McMaster University in Hamilton, Ontario; and Milton Weinstein and his colleagues at Harvard University. Torrance (1986, 1987) and Kaplan and his associates (Kaplan et al. 1984, Anderson et al. 1988) have published information on the reliability and validity of their methods, and we review their works briefly.

The description of the health state is the first step in deriving utility values. Torrance et al. (1982) have identified six attributes that should be included in a description of health state: physical function, emotional function, sensory function, cognitive function, self-care, and pain. The description would indicate the level of functioning on each of the attributes associated with a particular health state. The descriptions can be presented in narrative paragraphs, videotapes of patients, or in other forms.

The descriptions are presented to patients with the given health states, their close relatives or friends, or health care professionals for judgments of the utility values to be assigned to the states. The utility values may be rated on a visual analogue scale ranging from 0 to 100, with 0 indicating the worst possible health state (death) and 100 the best possible health state. This method is referred to as a rating scale.

The standard gamble technique was the original method for deriving utility values. It rests directly on the axioms of utility theory. In the standard gamble, the subject chooses between two alternative interventions. The first intervention (for example, a new procedure) is a gamble: it may produce a good outcome with a given probability (for example, an 80 percent chance of restoration to normal health) or a worse outcome with the complementary probability (such as a 20 percent chance of permanent disability or death). The second intervention (for example, another treatment or no treatment at all) is presented with a certain (100 percent sure) outcome of intermediate desirability relative to the good and bad outcomes associated with the first intervention. The probabilities associated with the first intervention (p for a good outcome, 1 – p for a bad outcome) are varied until the subject perceives no real difference between the interventions, and the utility value of the health state resulting from the second intervention is then calculated from the value of p at this point of indifference. Torrance (1986) reported that the standard gamble method can be used to measure utilities for chronic health states preferred to death, chronic states considered worse than death, and temporary health states.
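The calculation at the point of indifference can be sketched as an expected-utility identity: the utility of the certain outcome equals the probability-weighted utilities of the two gamble outcomes. The function name and default values are assumptions for illustration; the defaults correspond to the simplest case, in which the good outcome is normal health (utility 1) and the bad outcome is death (utility 0), so the utility equals p.

```python
def standard_gamble_utility(p_indifference, u_good=1.0, u_bad=0.0):
    """Utility of the certain health state, given the indifference probability.

    At indifference, the certain outcome is worth the same as the gamble:
    u(certain) = p * u(good) + (1 - p) * u(bad).
    """
    if not 0.0 <= p_indifference <= 1.0:
        raise ValueError("p_indifference must lie between 0 and 1")
    return p_indifference * u_good + (1 - p_indifference) * u_bad


# Hypothetical subject who is indifferent when the chance of normal health
# is 80 percent and the chance of death is 20 percent.
print(standard_gamble_utility(0.80))  # 0.8
```

The optional arguments allow a bad outcome other than death, such as a disability state whose utility has already been estimated.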

Torrance et al. (1972) developed the time trade-off method for use in health care evaluations, and they claim it is simpler to use than the standard gamble approach. The subject considers a compromised health state that is to last for a fixed period of time and compares it with a shorter period of healthy life. The subject is asked to “trade off” the time in the compromised health state for a shorter time in a healthier state. The time in the healthy state is varied until the point of indifference is found, and the utility value is calculated accordingly.
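The calculation at the point of indifference can be sketched as follows. This assumes the simplest case, a chronic state preferred to death, in which the utility is commonly taken as the ratio of the shorter healthy time to the longer time in the compromised state. The function name and the sample durations are hypothetical.

```python
def time_trade_off_utility(years_healthy, years_in_state):
    """Utility of a compromised state from a time trade-off response.

    years_in_state: fixed duration offered in the compromised state.
    years_healthy:  shorter healthy duration at which the subject is
                    indifferent between the two prospects.
    Assumes a chronic state preferred to death, so the ratio lies in [0, 1].
    """
    if years_in_state <= 0:
        raise ValueError("years_in_state must be positive")
    if not 0 <= years_healthy <= years_in_state:
        raise ValueError("years_healthy must lie between 0 and years_in_state")
    return years_healthy / years_in_state


# Hypothetical subject: 10 years in the compromised state is judged
# equivalent to 7 years of healthy life.
print(time_trade_off_utility(7, 10))
```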

With six key attributes and multiple levels on each attribute, a large number of unique health states would have to be defined to describe all possible combinations of attributes. Torrance et al. (1982) have used multiple attribute theory to reduce the number of measurements required to obtain the utility values for all combinations of attribute levels.
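The scale of the problem can be made concrete with a rough count. The sketch below assumes, purely for illustration, four levels on each of the six attributes; the actual number of levels is not given here. It also simplifies multiple attribute theory to roughly one measurement per attribute level, ignoring the additional measurements needed to scale the attributes against one another.

```python
from math import prod

# Hypothetical number of levels for each of the six attributes.
levels_per_attribute = {
    "physical function": 4,
    "emotional function": 4,
    "sensory function": 4,
    "cognitive function": 4,
    "self-care": 4,
    "pain": 4,
}


def count_health_states(levels):
    """Number of unique health states: the product of the level counts."""
    return prod(levels.values())


def count_attribute_levels(levels):
    """Number of single-attribute levels: the sum of the level counts."""
    return sum(levels.values())


print(count_health_states(levels_per_attribute))
print(count_attribute_levels(levels_per_attribute))
```

Under these assumed level counts, valuing every state directly would require thousands of judgments, whereas valuing each attribute level separately would require only a few dozen.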

Torrance (1987) presented a summary of the reliability ratings and tests of validity of the utility values from the rating, standard gamble, and time trade-off methods. The interrater and test-retest reliabilities range from 0.63 to 0.88. The results of the rating scales and time trade-off methods have been validated through comparisons with the standard gamble approach. (Torrance refers to this as criterion validity because the standard gamble method is derived directly from the axioms of utility theory. We refer to this as construct validity because the standard gamble method is a scientific construct for inferring preferences in decision making.) Churchill and his associates (1987) compared time trade-off utilities of end-stage renal disease patients with the ratings of physicians on the Quality of Life Index and found them to be congruent. That is, they demonstrated construct validity.

The methods are time consuming, demanding of the subjects, and costly to apply. The McMaster group has refined the methods and simplified the tasks. They have achieved participation and task completion rates of at least 85 percent.

The San Diego group has taken a different approach to assessing utility values (Kaplan et al. 1984, Kaplan and Bush 1982). Their first step was to categorize individuals in given health states with respect to levels of mobility, physical activity, and social activity. The second step was to classify the same individuals by the symptoms and health problems that they have on a given day. Four hundred case descriptions were written to encompass the combinations of functional levels and symptoms or problems.

Random samples of individuals in a community gave preference ratings to the descriptions on a continuum ranging from 0 for death to 1 for completely well. A model for preference structure assigned weights to each level of functioning and to each symptom/problem complex. Quality of Well-Being (QWB) scores are derived by applying the weights for functional levels and symptoms/problems to health states of interest, and the QWB scores are the utility values for those states.
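The way weights are applied to a health state can be sketched as below. This assumes a simple additive form in which negative weights for each functional level and for the symptom/problem complex are added to a starting value of 1.0 (completely well). All level names and weights are invented for illustration; they are not the published QWB weights.

```python
# Hypothetical preference weights (0.0 means no decrement from "completely well").
HYPOTHETICAL_WEIGHTS = {
    "mobility": {"no limitation": 0.0, "limited travel": -0.06},
    "physical_activity": {"no limitation": 0.0, "walks with difficulty": -0.08},
    "social_activity": {"no limitation": 0.0, "limited in major role": -0.06},
    "symptom_problem": {"none": 0.0, "cough or wheeze": -0.25},
}


def well_being_score(state, weights=HYPOTHETICAL_WEIGHTS):
    """Additive score: 1.0 plus the (negative) weight for each dimension.

    `state` maps each dimension name to the level observed on a given day.
    The result is floored at 0.0, the value assigned to death.
    """
    score = 1.0
    for dimension, level in state.items():
        score += weights[dimension][level]
    return max(0.0, score)


example_state = {
    "mobility": "limited travel",
    "physical_activity": "walks with difficulty",
    "social_activity": "limited in major role",
    "symptom_problem": "cough or wheeze",
}
print(round(well_being_score(example_state), 2))
```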

Anderson et al. (1988) compared the reliability of the QWB scores in general household samples and a clinical outcome study of burn patients. In initial interviews, the subjects completed self-administered forms and personal interviews. In a follow-up survey they repeated the process. They used internal consistency analysis to detect discrepancies in responses and reported that 50 percent of the discrepancies were the result of correctable errors. They concluded that personal interviews are required for the reliable use of the QWB.

We found no published reports that compare the utility values derived by the standard gamble, time trade-off, and rating scale methods outlined by Torrance with the QWB utility values developed by the San Diego group.

Several questions and criticisms have been directed toward the use of utility values and QALYs in quality-of-life assessments. Some experts debate whether the utility values should be obtained from the public at large, the providers, or the patients themselves. Others argue that the utility assessments are incomplete unless they include the perspectives of the family members whose lives are directly affected by the health status and quality of life of the patients. If the patient is unable to form a judgment, should the next of kin or some close friend be asked to make a decision about the perceived utility of the patient’s health status and prognosis?

Patients’ assessments of the utility of health states change as their health does. For this reason, utility values may not be stable over long periods of time. Furthermore, projections about morbidity, disability, and mortality frequently depend on expert opinion in the absence of sound epidemiological data on the natural history of disease and the impact of interventions. Consequently, assumptions about life expectancy may be only crude estimates of actual experience.

Experts do not agree on the key attributes to be included. Torrance advocates the inclusion of physical, emotional, sensory, cognitive, and self-care functioning, in addition to pain, but he excludes social functioning. In actual use the descriptions used in the standard gamble and time trade-off methods vary according to the disease or technology being evaluated. The QWB is narrow in focus because it encompasses only mobility, physical activity, social activity, and symptoms.

Lastly, although individuals may understand and agree with the ratings for the levels of functioning for a set of attributes, they agree less when the issue is whether a derived utility value accurately reflects the worth of human life. The public is even more skeptical about multiplying life expectancy by the utility values to obtain a “quality-adjusted life year.” In summary, utility assessments of quality of life can at best be described as a technology with promise and potential, but not as one accepted by the public.
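For reference, the multiplication at issue is arithmetically simple. The sketch below assumes a single constant utility value over the whole period and no discounting; the function name and the sample figures are hypothetical.

```python
def quality_adjusted_life_years(life_expectancy_years, utility):
    """Quality-adjusted life years: life expectancy multiplied by utility.

    Assumes one constant utility (0 = death, 1 = full health) for the
    entire period and applies no discounting of future years.
    """
    if life_expectancy_years < 0:
        raise ValueError("life_expectancy_years must not be negative")
    if not 0.0 <= utility <= 1.0:
        raise ValueError("utility must lie between 0 and 1")
    return life_expectancy_years * utility


# Hypothetical case: 10 years of expected life in a state valued at 0.7.
print(quality_adjusted_life_years(10, 0.7))
```

The simplicity of the arithmetic is part of what the criticisms above address: the result is only as sound as the utility value and the life expectancy estimate that go into it.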

Three Sources of Descriptive Information for Quality-of-Life Measures

The editors and the authors of this chapter refer readers to three books for reviews of more extensively studied and firmly established quality-of-life measures. The first, Assessment of Quality of Life in Clinical Trials of Cardiovascular Therapies, reviews six quality-of-life instruments and provides information on their content, administration, development, validity, reliability, generalizability, applications, and major strengths and limitations. The book lists references for these instruments, contains reproductions of many of them, and compares and contrasts them. The citation for the book and the names of instruments included are:

Wenger, N.K., Mattson, M.E., Furberg, C.D., and Elinson, J., eds. Assessment of Quality of Life in Clinical Trials of Cardiovascular Therapies. New York, Le Jacq Publishing, Inc., 1984.

  • Sickness Impact Profile (SIP)

  • Quality of Well-Being (QWB) Scale

  • Psychological General Well-Being (PGWB) Index

  • McMaster Health Index Questionnaire (MHIQ)

  • Nottingham Health Profile (NHP)

  • General Health Rating Index (GHRI)

The second book is entitled Measuring Health: A Guide to Rating Scales and Questionnaires. It reviews measures by name, developers, purpose, conceptual basis, and description. It offers information on reliability and validity, alternative forms of each instrument (if any), references, commentaries on strengths and limitations, the addresses of the original test developers, and complete or partial reproductions of the instruments. Each review has been checked for accuracy and completeness by the instrument developers.

This book features a “consumer’s guide” to the various instruments, which provides information on numerical characteristics of the scale, length, applications, method of administration, a rating of how widely each instrument is used, and a rating of reliability and validity. The citation for the book and the names of instruments listed are:

McDowell, I., and Newell, C. Measuring Health: A Guide to Rating Scales and Questionnaires. New York, Oxford University Press, Inc., 1987.

Activities of Daily Living (ADL) Scales

  • The PULSES Profile (Physical condition, Upper limb functions, Lower limb functions, Sensory components, Excretory functions, mental and emotional Status)

  • The Barthel Index

  • The Index of Independence in Activities of Daily Living, or Index of ADL

  • The Kenny Self-Care Evaluation

  • The Physical Self-Maintenance Scale

  • The Functional Status Rating System

Instrumental Activities of Daily Living (IADL) Scales

  • A Rapid Disability Rating Scale

  • The Functional Status Index

  • The Patient Evaluation Conference System

  • The Functional Activities Questionnaire

  • The Lambeth Disability Screening Questionnaire

  • The Disability and Impairment Interview Schedule

Psychological Indices

  • The Health Opinion Survey

  • The 22 Item Screening Score of Psychiatric Symptoms

  • The Affect Balance Scale

  • The General Well-Being Schedule

  • The Mental Health Inventory

  • The General Health Questionnaire

Social Health Indices

  • The Social Relationship Scale

  • The Social Support Questionnaire

  • The Social Maladjustment Schedule

  • The Katz Adjustment Scales

  • The Social Health Battery

  • The Social Dysfunction Rating Scale

  • The Social Functioning Schedule

  • The Interview Schedule for Social Interaction

  • The Structured and Scaled Interview to Assess Maladjustment

  • The Social Adjustment Scale

Quality-of-Life and Life Satisfaction Indices

  • The Quality of Life Index

  • Four Single-Item Indicators of Well-Being

  • The Life Satisfaction Index

  • The Philadelphia Geriatric Center Morale Scale

Pain Measurements

  • Visual Analogue Pain Rating Scales

  • The Oswestry Low Back Pain Disability Questionnaire

  • The McGill Pain Questionnaire

  • The Self-Rating Pain and Distress Scale

  • The Illness Behavior Questionnaire

  • The Pain Perception Profile

General Health Measurements

  • The Arthritis Impact Measurement Scale

  • The Physical and Mental Impairment-of-Function Evaluation

  • The Functional Assessment Inventory

  • The Nottingham Health Profile

  • The Sickness Impact Profile

  • The Multilevel Assessment Instrument

  • The Older Americans Resources and Services (OARS) Multidimensional Functional Assessment Questionnaire

  • The Comprehensive Assessment and Referral Evaluation

  • The Quality of Well-Being Scale

The third book, Assessing the Elderly: A Practical Guide to Measurement, contains reviews of instruments in four major areas of measurement important to long-term care (LTC) providers: physical functioning, mental functioning, social functioning, and multidimensional or composite measures. It outlines methods of administration, reliability, and validity; types of scales used; the strengths and limitations of the measures; and their similarities and differences. It also lists their items and characteristics according to function and purpose and offers practical suggestions for their use. The authors also cite unpublished instruments (“perhaps circulated at professional meetings”) that may be of interest to researchers developing or modifying instruments. The book citation and a partial list of instruments are as follows:

Kane, R.A., and Kane, R.L. Assessing the Elderly: A Practical Guide to Measurement. Lexington, Massachusetts, D.C. Heath and Company, 1981.

Measures of Physical Functioning

Measures of Physical Health

  • Cornell Medical Index

  • Cumulative Illness Rating Scale

  • Health Index

  • Patient Appraisal and Care Evaluation (PACE) II: Medical Data

  • Patient Classification for Long-Term Care (LTC): Impairments and Medical Status

  • Older Americans Resources and Services (OARS): Physical Health

Measures of Ability to Perform Activities of Daily Living (ADL) or Physical Functioning

  • PULSES Profile

  • Index of ADL

  • Kenny Self-Care Evaluation

  • Barthel Index

  • Rapid Disability Rating Scale (RDRS)

  • Barthel Self-Care Ratings

  • Granger Range of Motion Scale

  • Kenny Self-Care Evaluation

  • PACE II: Physical Function

  • OARS: Physical ADL

  • Functional Health Status of the Institutionalized Elderly ADL-A

Measures of Ability to Perform Instrumental Activities of Daily Living (IADL)

  • Functional Health Status

  • PGC Instrumental Activities of Daily Living

  • Instrumental Role Maintenance Scale

  • PACE II: IADLs

  • OARS: Instrumental ADL

  • Functioning for Independent Living

  • Performance Activities of Daily Living (PADL)

  • Pilot Geriatric Arthritis Project Functional Status Measure (PGAP)

Measures of Mental Functioning

Measures of Cognitive Functioning

  • Vigor, Intactness, Relationships, and Orientation (VIRO) Orientation Scale

  • Mental Status Questionnaire (MSQ)

  • Short Portable Mental Status Questionnaire (SPMSQ) from OARS

  • Philadelphia Geriatric Center (PGC) Mental Status Questionnaire

  • PGC Extended Mental Status Questionnaire

  • Memory and Information Test (MIT)

  • Dementia Rating Scale (DRS)

  • Extended Scale for Dementia

  • Face-Hands Test

  • Visual Counting Test

  • Set Test

  • Misplaced Objects Test

  • Wechsler Adult Intelligence Scale (WAIS) Short Form

  • Wechsler Memory Test

  • Quick Test (QT)

  • Mini-Mental State Examination

  • Geriatric Interpersonal Evaluation Scale (GIES)

Measures of Affective Functioning

  • Zung Self-Rating Depression Scale (SDS)

  • Beck Depression Inventory

  • Hopkins Symptom Checklist

  • Affect-Balance Scale

Measures of General Mental Health

  • OARS Mental Health Screening

  • Screening Score

  • Emotional Problems Questionnaire

  • Savage-Britten Index

  • Sandoz Clinical Assessment — Geriatrics

  • London (Ontario) Psychogeriatric Rating Scale (LPRS)

  • Gerontological Apperception Test (GAT)

  • Senior Apperception Test (SAT)

  • Geriatric Mental State Examination

  • Psychological Well-Being Interview

  • Nurses Observation Scale for Inpatient Evaluation (NOSIE)

Measures of Social Functioning

Measures of Social Interactions and Resources

  • Network Analysis Profile

  • Social Networks Assessment Questionnaire

  • Role Activity Scales

  • Mutual Support Index

  • Family Structure and Contact Battery (1968)

  • Exchanges Between the Generations Index

  • Family Structure and Contact Battery (1972)

  • Exchanges of Support and Assistance Index

  • Hebrew Rehabilitation Center for the Aged (HRCA) Social Interaction Inventory

  • Bennett Social Isolation Scales

  • Family Adaptation, Partnership, Growth, Affection, Resolve (APGAR)

  • OARS Social Resources Scale

  • Social Dysfunction Rating Scale

  • Social Behavior Assessment

  • HRCA Reduced Activities Inventory

  • Activity Scale

  • Unusual Day

  • Future Activity Scores

Measures of Subjective Well-Being and Coping

  • Cavan Attitude Inventory

  • Kutner Morale Scale

  • Life Satisfaction Index

  • Oberleder Attitude Scale

  • Contentment Scale

  • Tri-Scales

  • PGC Morale Scale

  • Geriatric Coping Schedule

  • Mode of Adaptations Patterns Scale

  • Geriatric Scale of Recent Life Events

Measures of Person-Environment Fit

  • Importance, Locus, and Range of Activities Checklist

  • Locus of Desired Control

  • Perceived Environmental Constraint Index

  • Satisfaction with Nursing Home Scale

  • Home for the Aged Description Questionnaire

  • Ward Atmosphere Scale

  • Community-Oriented Programs Environment Scale (COPES)

  • Sheltered Care Environment

  • Person-Environment Fit

  • Person-Environment Fit Scale

Multidimensional Measures

  • Sickness Impact Profile (SIP)

  • Older Americans Research and Service (OARS) Center Instrument

  • Comprehensive Assessment and Referral Evaluation (CARE)

  • Patient Appraisal and Care Evaluation (PACE)

  • Stockton Geriatric Rating Scale

  • Plutchik Geriatric Rating Scale

  • Parachek Geriatric Rating Scale

  • Physical and Mental Impairment-of-Function Evaluation Scale (PAMIE)

We also refer readers to the Clearinghouse on Health Indexes of the National Center for Health Statistics of the U.S. Department of Health and Human Services. The Clearinghouse publishes a quarterly Bibliography on Health Indexes (editor, P. Erickson) that provides information on the reliability, validity, and sensitivity of various measures of health status.

Editors’ Note: The authors have supplied information about sources of descriptions of measures and their validity and reliability. Those especially concerned about such matters may wish to go directly to the section entitled “Strategies Used to Assess Instruments,” and then to the section entitled “Three Sources of Descriptive Information for Quality-of-Life Measures,” which lists three key reference works that provide names, descriptions, and properties of a number of standard instruments. Readers may then skip to “Ten Review Forms for Quality-of-Life Measures,” where sources are listed and review forms supplied for some instruments not described in standard works. This chapter contains a special segment that describes utility analysis, a special econometric approach to measures of quality of life.