Capturing patient experience: does quality-of-life appraisal entail a new class of measurement? – Journal of Patient-Reported Outcomes

We have found that the appraisal items for Standards of Comparison (SOC) and Sampling of Experience (SOE) are differentially associated across disease groups. The SOC items’ mean distances tended to be larger than those for the SOE items, suggesting that the SOC items are better able than the SOE items to systematically distinguish the patient-group pairings. When considered across demographic groups, the differences were less stark, but educational and racial differences remained notable. These findings support the idea that patient groups will differ in the patterns of relationships among the cognitive processes underlying QOL item response. The group differences underscore the importance of circumstances in appraisal response. These circumstantial facets may be useful for simpler applications such as understanding patient satisfaction, as well as for more complex purposes such as detecting response-shift effects [7] via a ‘contingent true score’ [2, 5].

In 2004, we first conceptualized that the measurement of appraisal requires an alternative approach to thinking about psychometrics [5]. At that writing, our focus was on the implications of appraisal for understanding psychometric properties of standard QOL measures. For example, test-retest reliability has to be understood in light of the stability of individual appraisal across measurement occasions. Since then, our experience studying appraisal has led to more clarity regarding the measurement properties of appraisal instruments themselves. We propose that appraisal tools represent a different kind of instrument than is commonly used in QOL research: that appraisal tools are idiometric QOL tools, in contrast to psychometric and clinimetric tools.

We choose this new term because we believe that appraisal tools are distinguished from both psychometric and clinimetric tools in three broad and important ways: theoretically; in their implications for statistical analysis; and in their applications in clinical practice and research. Below, we briefly discuss each of these dimensions and its empirical support (summarized in Table 4; overlap shown in light grey shading).

Table 4 Theoretical, statistical and clinical distinctions among psychometric, clinimetric, and idiometric QOL tools

Theoretical distinctions among tools

Psychometric tools aim to measure a construct composed of one or more latent variables. Using items that are somewhat redundant within subscales that assess a latent variable, the intention is to achieve a level of internal consistency and unidimensionality that will provide robust results [41]. The items selected are effect indicators (i.e., a reflective measurement model [42]), meaning they reflect the latent variable [43, 44] (Supplemental Fig. 1). Relationships among psychometric constructs such as fatigue and depression are essentially understood as being an intrinsic property of those constructs. Such relationships are considered in establishing the psychometric construct validity of measures. Ideally, item response covers the full range of response options, and rarely endorsed items are generally dropped early in a tool’s development. The general understanding is that psychometric characteristics such as internal consistency, scale composition, and construct validity are properties of the measure itself and represent the quality of information provided by the measure.
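As a rough illustration of the internal-consistency criterion described above, the following Python sketch computes Cronbach's alpha for simulated item sets. Cronbach's alpha is used here only as one common index of internal consistency; the text above does not name a specific coefficient. The function name, variable names, and simulated data are hypothetical and are not drawn from the study.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, n_items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                          # number of items
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of the summed score
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)

# Simulated data (an assumption for illustration only).
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 1))

# Effect indicators: four somewhat redundant items reflecting one latent variable.
reflective_items = latent + 0.5 * rng.normal(size=(500, 4))

# Four items with no shared source of variance.
unrelated_items = rng.normal(size=(500, 4))

alpha_reflective = cronbach_alpha(reflective_items)
alpha_unrelated = cronbach_alpha(unrelated_items)
print(f"alpha, reflective items: {alpha_reflective:.2f}")
print(f"alpha, unrelated items:  {alpha_unrelated:.2f}")
```

In this sketch the redundant items yield a high alpha, whereas the unrelated items yield an alpha near zero, which is why such a coefficient is informative for a reflective model and much less so for item sets that are not expected to share a latent source.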

In contrast, clinimetric tools (e.g., a measure of symptoms or social/physical environment) aim to identify a (clinical) phenomenon using items that span a broad range of symptoms, so internal consistency and unidimensionality are not priorities [45, 46]. The items selected are sometimes understood as causal indicators (i.e., a formative measurement model [42]), meaning that they cause changes in the latent variable of QOL [43, 44] (Supplemental Fig. 1). Rarely endorsed items are as valuable as commonly endorsed items because they may help to differentiate clinical syndromes that have overlapping characteristics [45, 46]. Consistent inter-correlations among items across samples (that is, a similar principal component structure) are not a requirement for a clinimetric instrument and, indeed, might not even be considered [44]. Rather, the quality of clinimetric assessment is more strongly associated with face validity and construct validity, such as differences among known groups. However, the meaning of clinimetric indicators is expected to be consistent across samples and measurement contexts.

Neither the psychometric nor the clinimetric model quite fits the requirements of appraisal measurement. Appraisal measures are not simply indicators of clinical events or disease-status changes. Appraisal measures are intended to assess the four sets of parameters in the QOL appraisal model. It is reasonable to expect that appraisal processes have both shared (universal) and unique (circumstantial) components that lead to different structures and behaviors across samples and contexts. In contrast to psychometric measures, associations among appraisal constructs may be highly contingent on circumstances. In our experience, appraisal measures do not behave like psychometric measures, but they correlate and explain variance in expected and meaningful ways. Similar to clinimetric tools, such idiometric tools would not emphasize internal consistency or unidimensionality. They would embrace both rarely endorsed and commonly endorsed items.

Statistical implications of tool differences

Strategies used to validate psychometric tools are not appropriate for use with clinimetric or idiometric measures [43, 44]. Psychometric tools should be able to demonstrate construct validity cross-sectionally in terms of both a factor structure that matches hypothesized constructs and correlations in anticipated directions with other measures of similar and disparate concepts. Such tools should also be able to document content, face, ecological and discriminant validity. The latter three would be the focus for both clinimetric and idiometric measures.

Factor structures of psychometric measures are expected to be consistent across populations, reflecting the generalizability of the constructs. We have not observed such consistency in appraisal parameter structure, but have found that the appraisal measures nevertheless consistently mediate the impact of health status changes on QOL ratings.

This pattern of findings led us to consider the need for an alternative approach to internal and external validity of appraisal measures. We focused on the distinction between construct representation and nomothetic span [41, 47]. Construct representation is demonstrated when item content and form represent the intended constructs (i.e., internal construct validity) [41]. In contrast, the nomothetic span refers to a pattern of stronger and weaker relations among measures of the same or different constructs, respectively (i.e., external construct validity) [41, 48].

Internal construct validity can be expressed in terms of the observed range of appraisal parameters elicited by a specific measure, relative to the theoretically-specified or expected range [5]. For example, we would expect that self-reported mood state would be more highly correlated with self-reported side effects among those individuals who place greater emphasis on “recent treatment events” in appraising their QOL.

Once we have established an individual’s criteria for appraising QOL, we would need to address external construct validity (i.e., convergent and discriminant validity) [49]. For appraisal, nomothetic span would mean that the appraisal measure is associated with external constructs in theory-driven ways. Specifically, changes in appraisal might mediate the association between health-status changes (catalysts) and QOL ratings, which is consistent with response-shift theory [2]. A cross-sectional example might be that among individuals who consider “recent treatment events” in appraising their QOL, their ratings would be expected to correlate with a measure of the toxicity of their current treatment regimen [5].
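The mediation idea described above can be sketched with a simple product-of-coefficients decomposition. This is a minimal illustration under stated assumptions, not the analysis used in the studies cited: the data are simulated, the effect sizes are arbitrary, and the helper `ols` and all variable names are hypothetical.

```python
import numpy as np

def ols(y, *predictors):
    """Least-squares coefficients [intercept, b1, b2, ...]."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Simulated data (assumed effect sizes, for illustration only).
rng = np.random.default_rng(1)
n = 1000
health_change = rng.normal(size=n)                               # catalyst
appraisal_change = 0.6 * health_change + rng.normal(size=n)      # mediator
qol_rating = (0.5 * appraisal_change
              + 0.1 * health_change
              + rng.normal(size=n))                              # outcome

# Path a: catalyst -> appraisal.
a = ols(appraisal_change, health_change)[1]
# Direct effect and path b: QOL regressed on catalyst and appraisal together.
_, direct, b = ols(qol_rating, health_change, appraisal_change)
# Total effect: QOL regressed on the catalyst alone.
total = ols(qol_rating, health_change)[1]

indirect = a * b  # portion of the total effect carried through appraisal
print(f"total={total:.3f} direct={direct:.3f} indirect={indirect:.3f}")
```

With ordinary least squares the total effect equals the direct effect plus the indirect effect, so a large indirect share would be consistent with appraisal carrying much of the association between the catalyst and the QOL rating. In practice one would also want a confidence interval for the indirect effect (for example, by bootstrapping), which this sketch omits.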

Shared variance among idiometric items is understood as situational rather than intrinsic to the appraisal parameters, so working with item-level data rather than scale scores may be most enlightening. In idiometric analysis as in clinimetrics, variance unique to a single item in the set may be important for understanding patient experiences in different circumstances. Again, scale properties are not assumed to be inherent characteristics of items and measures as in psychometrics, but are instead substantially dependent on contextual influences that can drive inter-item correlations.

If one would prefer to work with sample-specific scale scores, we have found that principal components analysis (PCA) can be an effective data-reduction strategy with appraisal data (e.g., [6]). We have also seen that item correlation patterns can vary markedly across groups. PCA is selected because we do not expect to identify consistent latent factors that underlie a set of items and pertain in every situation. We note that PCA may not be the method of choice if the sample is very heterogeneous. For example, one would not pool data from multiple countries with distinct cultures, languages, and healthcare environments into a single PCA. Item-level analyses would be more meaningful in such cases.
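To make the sample-specific use of PCA concrete, the sketch below runs a PCA on the item correlation matrix separately for two simulated groups whose items correlate in different ways. It is an illustration under stated assumptions only: the groups, item counts, and correlation patterns are invented, and `pca_correlation` and `simulate_group` are hypothetical helpers rather than part of any published analysis.

```python
import numpy as np

def pca_correlation(items):
    """PCA on the item correlation matrix.

    Returns (explained_variance_ratio, loadings), components sorted
    from largest to smallest eigenvalue.
    """
    corr = np.corrcoef(items, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)       # eigh: symmetric matrices
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    return eigvals / eigvals.sum(), loadings

def simulate_group(rng, n, linked_items, n_items=4):
    """Simulated item responses in which only `linked_items` share variance."""
    data = rng.normal(size=(n, n_items))
    shared = rng.normal(size=n)
    for j in linked_items:
        data[:, j] = shared + 0.5 * data[:, j]
    return data

rng = np.random.default_rng(2)
group_a = simulate_group(rng, 400, linked_items=[0, 1])  # items 0 and 1 co-vary
group_b = simulate_group(rng, 400, linked_items=[2, 3])  # items 2 and 3 co-vary

ratio_a, load_a = pca_correlation(group_a)
ratio_b, load_b = pca_correlation(group_b)

# First-component loadings differ by group, so component scores are group-specific.
print("Group A, PC1 loadings:", np.round(np.abs(load_a[:, 0]), 2))
print("Group B, PC1 loadings:", np.round(np.abs(load_b[:, 0]), 2))
```

Because the first component is defined by different items in each simulated group, component scores computed within one group would not carry the same meaning in the other, which is the reason for treating any PCA-derived scale scores as sample-specific.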

Of note, the only overlap in statistical implications for all three types of measures relates to longitudinal validity. Stability and responsiveness are important for all three types of measures. Stability is demonstrated by the lack of change in the absence of a catalyst (e.g., clinically significant change in QOL), and responsiveness is evidenced by the tool’s scores changing when there is a catalytic event.

Clinical applications of tools

All three types of tools can be used in clinical practice for screening QOL and providing feedback that can facilitate provider-patient communication [33, 34, 50, 51, 52, 53], and all may be good predictors of outcomes. While psychometric tools may be used to document QOL, clinimetric and idiometric tools could identify people with a clinical or cognitive characteristic of interest, both of which provide meaningful background to QOL ratings. All tool types may also be used as the basis for an intervention, to identify an individual’s patterns and stimulate helpful discussions with providers.