Measuring Health-Related Quality of Life by Experiences: The Experience Sampling Method

We can conclude that the use of the ESM to measure accounts of the momentary experience of health in different populations is feasible. Retrospective measures may provide a biased account of the impact of health problems in the daily lives of people who are affected. Moreover, the bias may be different in different conditions.

The overall participation rate was low, but there were no dropouts and the number of completed beeps was comparable to that in other studies. Multilevel analysis showed that feelings and symptoms were significant predictors of momentary HRQOL. The strength of these relations differed among three patient groups and a population-based sample. The EuroQol visual analogue scale was not predicted by momentary feelings and symptoms.

Momentary HRQOL was examined with the experience sampling method (ESM) in 139 participants from four different samples. The ESM consists of a so-called beep questionnaire that was administered 10 times a day by an electronic device. Feasibility was determined by assessing willingness to participate in the study and by analyzing the percentage of dropouts and the number of completed beep questionnaires. Multilevel analysis was used to investigate the relation between momentary HRQOL and momentary feelings and symptoms. The relation between momentary outcomes and the EuroQol visual analogue scale was investigated with a multiple regression model.

In this study, we first assessed the feasibility of using the ESM to obtain accounts of the momentary valuation of HRQOL in different patient populations. Next, we expected that if the momentary valuation of HRQOL varied over time, this would indicate that the momentary valuation of a global concept such as HRQOL is influenced by the momentary experience of more specific feelings and symptoms. Therefore, we assessed whether the momentary valuation of HRQOL varies from moment to moment within persons. Furthermore, we examined the relation between momentary accounts of specific feelings and symptoms and the momentary valuation of HRQOL. Finally, we examined the relation between the global retrospective valuation of HRQOL (as obtained by the EQ-VAS) and the momentary accounts of feelings and symptoms and valuation of HRQOL.

In the present study, we used the experience sampling method (ESM) [] to obtain momentary accounts of feelings, physical symptoms (PS), and HRQOL. The ESM is characterized by the collection of multiple self-reports of an individual’s (near) real-time feelings, thoughts, and activities in real-world environments. ESM studies are conducted using paper diaries or (increasingly) electronic devices []. These devices beep at random moments, when participants are asked to complete a questionnaire. A potential limitation of the ESM is that it can be time consuming and intrusive, and as a result burdensome to participants [].

Semantic memory is a more structured record of facts and knowledge about the external world and relies more on generalized beliefs than on experiences. In this regard, there is a distinction between retrospective self-reports of global concepts and retrospective self-reports of specific feelings and symptoms. Global reports of past health will rely more on semantic memory, whereas reports on specific feelings and symptoms may more easily be recovered by detailed episodic recall []. As a result, the retrospective global valuation of health may be more prone to bias than the retrospective description of detailed aspects of health such as specific feelings or symptoms, a problem that increases with temporal delay []. More fundamentally, there is an increasing awareness that experiences are dynamic, situated, and highly context driven (see the contributions in Mesquita et al. []), thereby providing a powerful rationale for investigating experiences in the context in which they occur []. Moreover, bias in retrospective self-report might be different in different patient populations. For instance, depression has been shown to have an effect on memory performance []. As a result, a higher discrepancy between retrospective self-report and actual experiences may occur in persons suffering from psychological complaints. Furthermore, people do not adapt well to noise []. As a result, patients with a complaint such as tinnitus, which is the experience of a sound without an acoustic source, might disproportionately focus on this aspect when evaluating their HRQOL retrospectively.

Robinson and Clore [] reviewed several studies describing discrepancies between momentary and retrospective self-reports. Retrospective self-reports are less than perfect reflections of experience because feelings are not always accurately represented in memory. If not measured directly, affective experience needs to be reconstructed on the basis of episodic or semantic memory. Episodic memory is the recollection of past personal experiences that occurred at a particular time and space. With regard to episodic memory, Kahneman [] and Kahneman et al. [] found that more memorable details of an emotional event disproportionately affect retrospective estimates of emotion. Also, there is a gradual decline in episodic memory over time [], which leads to a reliance on semantic memory to fill in the memory gap of hedonic experience.

The quantification of the subjective experience of health-related quality of life (HRQOL) is crucial to the evaluation of health care technologies. HRQOL has been defined as an individual’s perception of his or her physical health, psychological state, level of independence, social relationships, and relationship to the environment []. To assign meaningful numbers to HRQOL outcomes, the experience needs to be described in terms of severity and assigned a value. Instruments to obtain patient descriptions and valuations of their own health, such as the EuroQol 5D (EuroQol five-dimensional questionnaire) health description and the EuroQol visual analogue scale (EQ-VAS), rely on retrospective self-report. One problem with retrospective self-report is that it is likely to give a biased account of real-world experiences due to imperfect recollection of past experiences []. In other words, it only partially captures the impact of health problems in the daily lives of people who are affected. An alternative to retrospective self-report is to study outcomes from moment to moment in the context of daily life. The objective of the present study was to explore the potential value of obtaining momentary, instead of retrospective, accounts of the description and valuation of a person’s own HRQOL. In this study, we focus on the physical and psychological dimensions of HRQOL.

Aggregated means and SDs of momentary HRQOL, PA, NA, and PS were calculated and compared with the EQ-VAS at briefing and debriefing for the total group and all four subgroups. To examine how much of the variance in the EQ-VAS at debriefing was explained by momentary PA, NA, PS, and momentary HRQOL, a multiple regression model was fitted to the aggregated data using standardized variables. Age, sex, EQ-VAS at briefing, and sample (by including three dummy variables with the population-based sample as reference group) were included as covariates.
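The covariate coding and standardization described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the sample labels, the function names `dummy_code` and `standardize`, and the use of the population SD for z-scores are all assumptions made for the example.

```python
from statistics import mean, pstdev

# Hypothetical labels for the four samples; the population-based
# sample serves as the reference group, as in the text.
SAMPLES = ["population", "psychological", "tinnitus", "somatic"]
REFERENCE = "population"

def dummy_code(sample):
    """Return three 0/1 indicators; the reference group is all zeros."""
    return [int(sample == s) for s in SAMPLES if s != REFERENCE]

def standardize(values):
    """Convert values to z-scores (mean 0, SD 1); population SD is an assumption."""
    m, sd = mean(values), pstdev(values)
    return [(v - m) / sd for v in values]

print(dummy_code("population"))   # reference group
print(dummy_code("tinnitus"))
z_scores = standardize([60, 70, 80])
```

With this coding, each dummy coefficient in a regression is interpreted as a difference from the population-based sample.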

Bivariate correlations between momentary HRQOL and PA, NA, and PS for each participant were computed. Correlations were interpreted according to the following benchmarks: 0.1 to 0.3 was interpreted as small, 0.3 to 0.5 as medium, and more than 0.5 as large []. To examine whether momentary feelings and symptoms predict momentary HRQOL, a multilevel random regression model was estimated with momentary HRQOL as the dependent variable and momentary PA, NA, and PS as independent variables. These analyses were computed with the XTMIXED modules of STATA (version 11.0). Because different scales were used, all variables were standardized. The analyses were corrected for age, sex, and group. Group was entered in the mixed regression as a categorical variable using dummy coding, with the population-based sample as a reference category (see Table 3 for details). To determine the explained variance of PA, NA, and PS separately, these variables were first added separately to the basic model (which included momentary HRQOL and the covariates). A final model was fitted with momentary HRQOL as dependent variable and momentary PA, NA, and PS and their interaction with the dummy variables as independent variables. We expected a positive relation between PA and momentary HRQOL and a negative relation between NA and PS and momentary HRQOL.
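The per-participant correlations and their interpretation could be computed along the following lines. This is a minimal sketch with invented ratings for a single participant. The helper names, the "negligible" label for values below 0.1, and the assignment of boundary values (exactly 0.3 or 0.5) are assumptions, since the benchmarks in the text do not specify them.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def effect_size_label(r):
    """Benchmarks from the text: 0.1-0.3 small, 0.3-0.5 medium, >0.5 large."""
    size = abs(r)
    if size > 0.5:
        return "large"
    if size >= 0.3:
        return "medium"
    if size >= 0.1:
        return "small"
    return "negligible"  # label assumed; the text gives no name for r < 0.1

# Hypothetical beep-level data for one participant.
hrqol = [70, 75, 60, 80, 65]   # momentary HRQOL (0-100 VAS)
pa = [4, 5, 3, 6, 4]           # mean positive affect per beep (1-7)

r = pearson(hrqol, pa)
print(effect_size_label(r))
```

Under these benchmarks, a correlation of 0.35 would be read as medium and correlations of −0.22 or −0.26 as small.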

To determine whether there was variability in momentary HRQOL within persons, for each respondent an SD was determined for the responses to all beep questionnaires. In addition to a descriptive analysis, a repeated-measures analysis of variance with a Greenhouse-Geisser correction was used to explore whether the variability in valuations during the ESM week differed over the days. A linear regression was used to examine the relation between the mean HRQOL and the mean SD of HRQOL.
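The within-person variability measure and the regression of SD on mean can be illustrated with a small sketch. The participant identifiers and ratings below are invented, the sample SD (n − 1 denominator) is an assumption about how the SD was computed, and `ols_slope` is a hypothetical helper rather than the procedure used in the study.

```python
from statistics import mean, stdev

# Hypothetical momentary HRQOL ratings (0-100 VAS), keyed by participant.
ratings = {
    "p01": [80, 82, 79, 81, 80],
    "p02": [60, 70, 55, 65, 50],
    "p03": [90, 91, 90, 89, 90],
}

# One mean and one SD per participant across all completed beeps.
person_mean = {p: mean(v) for p, v in ratings.items()}
person_sd = {p: stdev(v) for p, v in ratings.items()}  # sample SD (assumed)

def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

ids = sorted(ratings)
slope = ols_slope([person_mean[p] for p in ids], [person_sd[p] for p in ids])
print(slope)
```

In this toy data set the slope is negative, meaning that participants with a higher mean rating show less moment-to-moment variability.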

A principal-components exploratory factor analysis on PA and NA items and PS was used to examine the underlying factor structure []. Results confirmed a three-factor solution. We therefore created a PA scale, an NA scale, and a PS scale by calculating the means of the respective items. Details can be found in Appendix A.
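Scale construction by item averaging can be sketched as follows. The responses are invented, and the dictionary keys and the function name `scale_scores` are hypothetical. The item counts (six PA, five NA, four PS) follow the description of the beep questionnaire.

```python
from statistics import mean

# Hypothetical responses to one beep questionnaire (seven-point Likert items).
beep = {
    "pa": [5, 6, 4, 5, 6, 5],   # six positive affect items
    "na": [2, 1, 2, 1, 1],      # five negative affect items
    "ps": [1, 2, 1, 3],         # four physical symptom items
}

def scale_scores(response):
    """Average the items of each scale for a single beep questionnaire."""
    return {scale: mean(items) for scale, items in response.items()}

scores = scale_scores(beep)
print(scores)
```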

To determine the willingness to participate in this study, the number of participants who were approached for participation was compared with the number of participants who actually participated in the study. The percentage of dropouts was recorded and analyzed. Feasibility was further assessed by analyzing the number of completed beep questionnaires.

On the eighth day, the participants returned for a debriefing session. The ESM period was reviewed by means of a questionnaire. Participants had to answer whether the PsyMate had influenced their mood, activities, thoughts, or contacts with other people and whether they had been annoyed by the beeps. Furthermore, participants were asked whether the ESM week had been a typical week, whether any unusual incidents had occurred, whether items were unclear, and whether the questions allowed them to give a good representation of their experiences during the day. The EQ-VAS and the HADS were readministered.

The ESM period comprised 6 days, starting the day after the briefing. During this week, the participants were asked to continue their normal life while carrying the PsyMate with them.

During the briefing (approximately 3 hours) on the first day, the rationale of the study was explained and instructions on the use of the PsyMate were given. A try-out sampling moment was simulated in which the participants were coached in answering the questions on the PsyMate. After the try-out, baseline global data were collected (the EQ-VAS, the HADS, and personal characteristics).

Anxiety and depression were measured with the Hospital Anxiety and Depression Scale (HADS), which contains 14 items and has good reliability and validity []. Each item on the questionnaire is scored on a scale of 0 to 3, with higher scores indicating higher symptom frequencies. In addition, data on personal characteristics were collected.

A global retrospective valuation of health, or HRQOL, was obtained using the EQ-VAS. The EQ-VAS is part of the EuroQol instrument, and it ranges from 0 (worst imaginable health state) to 100 (best imaginable health state). The EQ-VAS has good reliability [].

The ESM consists of a beep questionnaire that participants are required to fill out at several unpredictable moments during the day, in addition to questions in the morning on waking and in the evening when going to sleep. The validity and reliability of the Maastricht routine have been documented elsewhere []. In this study, we used the PsyMate, a small user-friendly device programmed to generate beeps (and vibrations) 10 times a day between 07.30 h and 22.30 h randomly in 1½-hour intervals. At every beep, the PsyMate presents the questions and records the responses using a touchscreen keyboard. The beep questionnaire (see Appendix B in Supplemental Materials found at doi:10.1016/j.jval.2014.10.003) consists of items on feelings, physical symptoms, context (location, interaction, activities), and overall HRQOL. A seven-point Likert scale was used for the items on feelings (six for positive affect [PA] and five for negative affect [NA]) and for the four PS items. The contextual items had predetermined answering categories. To obtain a valuation of momentary HRQOL, a VAS anchored in the same way as the EQ-VAS (0 being the worst imaginable health state and 100 being the best imaginable health state) was included []. A detailed description can be found in Appendix A.
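The sampling scheme can be sketched in a few lines. This is an illustration under stated assumptions, not the PsyMate's actual programming: it assumes that "randomly in 1½-hour intervals" means one beep at a uniformly random minute within each of ten consecutive 90-minute blocks between 07.30 h and 22.30 h. The function name `beep_schedule` and its parameters are hypothetical.

```python
import random
from datetime import datetime, timedelta

def beep_schedule(seed=None, start="07:30", n_blocks=10, block_minutes=90):
    """Return one random beep time (HH:MM) per consecutive block."""
    rng = random.Random(seed)
    t0 = datetime.strptime(start, "%H:%M")
    beeps = []
    for block in range(n_blocks):
        # Draw a uniformly random minute offset inside this block.
        offset = block * block_minutes + rng.randrange(block_minutes)
        beeps.append((t0 + timedelta(minutes=offset)).strftime("%H:%M"))
    return beeps

schedule = beep_schedule(seed=1)
print(schedule)
```

Ten blocks of 90 minutes cover exactly the 15 hours between 07.30 h and 22.30 h, so a block design of this kind keeps beeps unpredictable while spreading them across the whole day.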

The study population consisted of 139 participants. To ensure a variety of experienced health states in the study population, participants were recruited from three patient groups (those experiencing somatic complaints with a known cause [atherosclerosis or venous insufficiency], somatic complaints without a known cause [tinnitus], and psychological complaints [anxious or depressed]) and a population-based sample. All participants were 18 years or older. Exclusion criteria were not being able to read and write in Dutch or not being able to handle the electronic ESM device because of impaired motor skills (for more details, see Appendix A in Supplemental Materials found at doi:10.1016/j.jval.2014.10.003).

Mean momentary HRQOL (69.85) differed significantly from mean EQ-VAS both at briefing (72.85; t = −3.111; P = 0.002) and at debriefing (74.37; t = −4.606; P < 0.001) (Table 2). When EQ-VAS at debriefing was predicted by momentary experiences (and corrected for group differences, age, sex, and EQ-VAS at briefing) without taking into account the interaction effects between momentary experiences and sample, EQ-VAS at briefing (P < .05) and momentary HRQOL (P < .05) were significant predictors of EQ-VAS at debriefing (Table 4). Adding the interaction terms to the model did not improve its fit (R² = 0.82).

In Table 3, the results of the multilevel analysis are presented. Model 6 (the final model) showed that all variable estimates were in the expected direction. Both PA and PS were highly significant predictors (P < .001) when controlling for age, sex, and sample (i.e., condition). These main effects, however, were moderated by condition. Specifically, significant interaction terms for PA and the psychological complaints and tinnitus samples suggest that PA is a stronger (positive) predictor of momentary HRQOL in these two conditions than in the population-based sample, an effect not found for the somatic complaints sample. With respect to PS, the interactions suggest that PS is a stronger (negative) predictor of momentary HRQOL in these two conditions than in the population-based sample. Moreover, although there was only a marginally significant main effect of NA (P = 0.07), the significant interaction between NA and psychological complaints suggested that this was the only condition in which NA was more negatively related to HRQOL compared with the population-based sample.

The aggregated means of PA, NA, and PS are presented in Table 2. Six participants showed no variance in PA, NA, or PS, and so data for these participants are not included in the following analyses. The mean correlation between momentary HRQOL and PA, NA, and PS was 0.35 (range −0.28 to 0.91), −0.22 (range −0.86 to 0.30), and −0.26 (range −0.90 to 0.30), respectively. The within-person correlations between the HRQOL and feelings (PA and NA) and PS for the total sample are displayed in Figure 2. For most of the participants, the correlations with HRQOL were positive for PA (86%) and negative for both NA (75%) and PS (81%).

In Table 2, the aggregated means and SDs of momentary HRQOL for the total group and the subgroups are presented. The SDs of momentary HRQOL per participant are displayed in Figure 1. The mean of the within-person SDs was 5.64, with a range from 0.94 to 18.22. The mean SD at day 1 was 5.2 and decreased to 3.9 at day 6. A repeated-measures analysis of variance determined that there was a statistically significant difference between the mean SDs over the 6 days (F = 3.545; df = 4.417; P = 0.005). Post hoc tests using the Bonferroni correction revealed that there was a statistically significant difference only between day 1 and day 6. In participants with a higher mean momentary HRQOL, there was less variance in responses than in participants with a lower momentary HRQOL. This relation was confirmed by a linear regression that showed a significant negative relation between the mean and the SD of the momentary HRQOL per participant (β = −0.388; P < 0.001; R² = 0.150).
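As a quick consistency check on the regression statistics, the reported slope and fit value can be compared directly. This sketch assumes that β is a standardized coefficient from a regression with a single predictor, in which case the explained variance equals β squared. The source does not state this explicitly, so the assumption is ours.

```python
# Reported slope for the regression relating mean HRQOL and its SD
# (assumed here to be a standardized coefficient).
beta = -0.388

# With a single predictor, R squared equals the squared standardized slope.
r_squared = beta ** 2
print(f"{r_squared:.2f}")
```

The squared slope agrees with the reported fit value of 0.150 to within the rounding of β, which is consistent with that value being an explained-variance statistic.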

All participants who finished the briefing completed the ESM week. Most of the participants (76%) thought of their week as being representative of a normal week. Twenty-two percent of the participants found the PsyMate annoying, while 90% reported that it did not influence their mood, social interactions, or activities and 75% said that it did not influence their thoughts (for details on specific subsamples, see Appendix A in Supplemental Materials found at doi:10.1016/j.jval.2014.10.003). Most of the participants (92%) also reported that they were able to give a good representation of their experiences during the day. Fourteen percent of the participants found some of the questions unclear.

Data on the inclusion of participants and demographic characteristics of the total sample and subsamples are presented in Table 1. The study information was sent to 550 participants. The most common reasons for not wanting to participate were lack of interest in the study objective (n = 100), the study being too burdensome (n = 57), not being able to combine it with work (n = 28), and other physical complaints (n = 28) (for further details, see Appendix A in Supplemental Materials found at doi:10.1016/j.jval.2014.10.003). The final sample included 139 participants, with 40 participants in the population-based sample, 27 in the psychological complaints sample, 40 in the tinnitus sample, and 32 in the somatic complaints sample. The mean age of the total sample was 50 years, and 50% were men.
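The participation rate implied by these figures can be worked out directly. The sketch below treats the 550 people who were sent study information as the number approached, which is an assumption. The variable names are illustrative.

```python
approached = 550  # people who were sent the study information

# Subsample sizes as reported for the final sample.
subsamples = {
    "population-based": 40,
    "psychological complaints": 27,
    "tinnitus": 40,
    "somatic complaints": 32,
}

participants = sum(subsamples.values())
rate = participants / approached
print(f"{participants} of {approached} approached = {rate:.1%}")
# prints: 139 of 550 approached = 25.3%
```

The subsample sizes add up to the reported total of 139, so roughly one in four of those approached took part.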


This article reports on what is, to our knowledge, the first study that uses the ESM to obtain accounts of momentary HRQOL and compare these with retrospective HRQOL measures. The results will be discussed in the next paragraphs.


Finally, the relations between a retrospective global measure of HRQOL (EQ-VAS) and momentary HRQOL, feelings, and symptoms were examined. The multiple regression model that was fitted to the data revealed that if the interaction terms were added to the model, none of the momentary feelings and symptoms was a significant predictor of EQ-VAS. This supports earlier findings that global reports of past health will rely more on beliefs (semantic memory) than on specific feelings and symptoms []. Momentary HRQOL was a significant predictor of EQ-VAS, which was expected because the framing of the questions was similar in both methods.

In this article, we focused only on momentary HRQOL and feelings and symptoms. ESM data, however, also hold information on contextual items that could be used to examine in more detail the different dimensions of health in the retrospective questionnaires. For instance, is the mobility dimension as measured by the EuroQol five-dimensional questionnaire reflected by the different locations a person is at during the day as measured by the ESM? In addition, dimensions of HRQOL not included in the present study, such as level of independence, social relationships, and interaction with the environment, can be included in the beep questionnaire. These questions are beyond the scope of this article but need to be considered in future articles.