Assessing the reliability of the short form 12 (SF-12) health survey in adults with mental health conditions: a report from the wellness incentive and navigation (WIN) study – Health and Quality of Li

Study cohort

The WIN project is a three-year, longitudinal, randomized, pragmatic clinical trial funded by the Centers for Medicare & Medicaid Services’ Medicaid Incentives for the Prevention of Chronic Conditions portfolio [4]. It examines the comparative effectiveness of personal navigators, motivational interviewing (MI), and a flexible wellness expense account on health care costs, cardiovascular risk factors, physical health, and HRQOL. The study population is Medicaid enrollees with co-occurring physical and mental health conditions, serious mental illness (SMI), or both. The comparator is usual care provided by STAR+PLUS, a specialized Medicaid managed care program for individuals with disabilities. The design of the WIN study has been described elsewhere [5]. In brief, we recruited a total of 1663 participants. Participants in the Harris (Houston, Texas) service delivery area (SDA) were randomized to a control group (n = 630), which received regular Medicaid managed care, or to an intervention group (n = 629), which received personal navigators and a flexible wellness expense account. We selected the Harris SDA because the STAR+PLUS program began there, so it had the infrastructure, experience, and stability needed to conduct a pragmatic clinical trial.

To evaluate whether a Hawthorne effect was present [11, 12], and to increase generalizability through comparison of the control and comparison groups, we recruited a random sample of 404 STAR+PLUS Medicaid managed care enrollees as a comparison group. These enrollees resided in the Nueces and Bexar service areas rather than the Harris service area. The comparison group met the same eligibility criteria as the control and intervention groups; only the participants’ location differed.

Among the recruited participants, 1587 had complete data on all twelve SF-12 items, which are required to compute the inter-item correlations. Because the intervention may affect SF-12 scores, we included only the control group in the longitudinal test-retest analysis. The cumulative loss-to-follow-up rate was 12% at the end of study year 1, 17% at the end of year 2, and 24% at the end of year 3. For this assessment of the reliability of the SF-12, we pooled baseline data from the intervention, control, and comparison groups. This yielded a larger and more heterogeneous sample and improved the generalizability of the results.

Inclusion and exclusion criteria

Because WIN is a pragmatic trial, our goal is to provide evidence for adopting the intervention in the real-world Medicaid population with mental health conditions or co-occurring physical and behavioral health conditions. The diagnostic criteria, including the ICD-9 codes for all included and excluded comorbidities, have been published previously [4]. In brief, eligibility for the WIN trial required one of the following: an SMI diagnosis (e.g., schizophrenia, bipolar disorder, major depressive disorder); a behavioral health diagnosis (e.g., anxiety, depression, substance use disorder) coupled with a chronic physical health diagnosis (e.g., diabetes, chronic obstructive pulmonary disease [COPD]); or a combination of all three. In every case, the condition had to be severe enough that the individual was disabled and receiving Supplemental Security Income. We identified individuals meeting the eligibility criteria from Medicaid enrollment files linked to health care claims and encounter data, and contacted them by letter and phone. We excluded members with a diagnosis of dementia, Alzheimer’s disease, or intellectual disability because of concerns about impairment or limited ability to understand the program benefits. We did not collect medical treatment information from participants’ electronic health records. All participants provided verbal consent before participation.
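The inclusion and exclusion rules above reduce to a single boolean predicate. The following is a hypothetical Python illustration, not the study's screening code. The `Member` fields and the `is_eligible` function are invented names that stand in for flags one would derive from the published ICD-9 code lists [4] and the Medicaid enrollment files.

```python
from dataclasses import dataclass


@dataclass
class Member:
    # Hypothetical flags; in practice each would be derived from claims and
    # encounter data using the published ICD-9 code lists [4].
    has_smi: bool                  # e.g., schizophrenia, bipolar disorder
    has_behavioral_dx: bool        # e.g., anxiety, depression, SUD
    has_chronic_physical_dx: bool  # e.g., diabetes, COPD
    receives_ssi: bool             # disabled and receiving SSI
    has_excluded_dx: bool          # dementia, Alzheimer's disease, or ID


def is_eligible(m: Member) -> bool:
    """Sketch of the WIN inclusion/exclusion logic as described in the text."""
    # SMI alone qualifies; otherwise a behavioral health diagnosis must be
    # paired with a chronic physical diagnosis. Members with all three
    # diagnoses satisfy both branches.
    clinical = m.has_smi or (m.has_behavioral_dx and m.has_chronic_physical_dx)
    return clinical and m.receives_ssi and not m.has_excluded_dx


print(is_eligible(Member(has_smi=False, has_behavioral_dx=True,
                         has_chronic_physical_dx=True, receives_ssi=True,
                         has_excluded_dx=False)))  # True
```

The "combination of all three" case does not need its own branch. A member with SMI plus both other diagnoses already satisfies the first condition.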

Instrument

The SF-12v2 is a health-related quality-of-life questionnaire consisting of twelve questions that measure eight health domains covering physical and mental health. The physical health domains are General Health (GH), Physical Functioning (PF), Role Physical (RP), and Bodily Pain (BP). The mental health domains are Vitality (VT), Social Functioning (SF), Role Emotional (RE), and Mental Health (MH). The instrument has been validated across a number of chronic diseases and conditions [9, 13,14,15,16]. We administered the SF-12v2® by telephone survey to WIN study participants at baseline and annually over the three study years. For each participant, we then calculated the two SF-12v2® summary scores, physical and mental health, as weighted means of the eight domain scores.
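The summary-score step can be sketched with the general norm-based structure: standardize each domain against population norms, combine the standardized scores with domain weights, and rescale the result to a T-score with mean 50 and SD 10. This minimal Python sketch rests on stated assumptions. The norm means, SDs, and weights below are placeholders invented for illustration, not the licensed constants in the SF-12v2 manual [17], and `composite_t_score` is a hypothetical name.

```python
# Domain abbreviations as used in the text.
DOMAINS = ["PF", "RP", "BP", "GH", "VT", "SF", "RE", "MH"]

# PLACEHOLDER constants for illustration only. These are NOT the SF-12v2
# norms or scoring coefficients; the real values are in the manual [17]
# and differ from these.
NORM_MEAN = {d: 50.0 for d in DOMAINS}
NORM_SD = {d: 10.0 for d in DOMAINS}
PCS_WEIGHTS = {"PF": 0.4, "RP": 0.3, "BP": 0.3, "GH": 0.2,
               "VT": 0.0, "SF": 0.0, "RE": 0.0, "MH": 0.0}
MCS_WEIGHTS = {"PF": 0.0, "RP": 0.0, "BP": 0.0, "GH": 0.0,
               "VT": 0.2, "SF": 0.3, "RE": 0.3, "MH": 0.4}


def composite_t_score(domain_scores, weights):
    """Standardize each domain, combine with weights, rescale to a T-score."""
    z = {d: (domain_scores[d] - NORM_MEAN[d]) / NORM_SD[d] for d in DOMAINS}
    aggregate = sum(weights[d] * z[d] for d in DOMAINS)
    return 50.0 + 10.0 * aggregate


# A respondent exactly at the (placeholder) norm on every domain.
at_norm = dict(NORM_MEAN)
print(composite_t_score(at_norm, PCS_WEIGHTS))  # 50.0
print(composite_t_score(at_norm, MCS_WEIGHTS))  # 50.0
```

Because the output is a T-score, a respondent at the population norm on every domain scores 50 on both summaries whatever weights are used. Each SD of deviation in the weighted aggregate moves the summary score by 10 points.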

Statistical methods

The power and sample size calculation for the WIN study was reported previously [5]. We conducted a post hoc power analysis to confirm that the sample was large enough to assess the test-retest correlation of the instrument over one year. With 417 subjects, we had 94% power to detect a Pearson’s correlation coefficient of 0.70 against a null-hypothesis correlation of 0.60, using a two-sided test at α = 0.05.
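The text does not say which software or approximation produced the 94% figure. A common approach is Fisher’s z transformation, under which atanh(r) is approximately normal with standard error 1/√(n − 3). The Python sketch below assumes that approximation, and `correlation_power` is a hypothetical helper name. With n = 417, ρ₁ = 0.70, and ρ₀ = 0.60, it gives a value consistent with the reported power.

```python
import math
from statistics import NormalDist


def correlation_power(n, rho_alt, rho_null, alpha=0.05):
    """Approximate power of a two-sided test of H0: rho = rho_null.

    Uses Fisher's z transformation, so atanh(r) ~ N(atanh(rho), 1/(n - 3)).
    """
    std_normal = NormalDist()
    # Standardized distance between the alternative and the null on the z scale.
    effect = (math.atanh(rho_alt) - math.atanh(rho_null)) * math.sqrt(n - 3)
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Probability of rejecting in either tail when rho = rho_alt.
    return std_normal.cdf(effect - z_crit) + std_normal.cdf(-effect - z_crit)


power = correlation_power(417, 0.70, 0.60)
print(f"{power:.2f}")  # 0.94
```

On the z scale the two correlations differ by about 0.174, which is roughly 3.5 standard errors at n = 417. The lower-tail term is negligible here, but it is kept so the function stays correct for any pair of correlations.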

We reported baseline demographics as mean ± SD for continuous variables or n (%) for categorical variables. We followed the method described in the SF-12v2® manual to compute the score for each domain and the physical and mental composite scores [17]. Before conducting correlation analyses, we computed residuals for all eight scales using a general linear model adjusting for age, gender (male versus female), race/ethnicity (white, black, and Hispanic), and clinical risk group (CRG). The 3M CRG is a classification system that uses standard claims data to group individuals into one of 9 health status categories, ranging from healthy to catastrophic conditions [18]. Because all participants had one or more chronic conditions, we collapsed the CRG categories into three chronic condition groups: minor (categories 1–4), moderate (category 5), and severe (categories 6–9). We compared CRG status across race/ethnicity categories to assess whether participants’ overall health status differed among racial/ethnic groups. We assessed the internal consistency of the physical and mental composite scores (PCS and MCS) using Mosier’s formula [19], and computed Pearson’s correlations among the eight scales in all participants. For the scales measured by two items, we assessed split-half reliability in all respondents using intraclass correlation coefficients (ICCs) followed by the Spearman-Brown correction [20].
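Three steps in the paragraph above are mechanical enough to sketch in Python: collapsing the CRG categories, computing Mosier’s reliability of a weighted composite, and stepping up a two-item ICC with the Spearman-Brown formula. Several assumptions apply. All function names are hypothetical. The text does not specify which ICC form was used, so the one-way random-effects ICC(1,1) is chosen here for illustration. Mosier’s formula is written in its standard form, one minus the weighted error variance divided by the composite variance.

```python
def collapse_crg(crg: int) -> str:
    """Map a 3M CRG health status category (1-9) to the three groups used here."""
    if 1 <= crg <= 4:
        return "minor"
    if crg == 5:
        return "moderate"
    if 6 <= crg <= 9:
        return "severe"
    raise ValueError(f"CRG category out of range: {crg}")


def mosier_reliability(weights, sds, reliabilities, corr):
    """Mosier's reliability of a weighted composite of k components.

    corr is the k x k correlation matrix of the components (1s on the diagonal).
    """
    k = len(weights)
    # Composite variance: sum_i sum_j w_i w_j sd_i sd_j r_ij
    total_var = sum(weights[i] * weights[j] * sds[i] * sds[j] * corr[i][j]
                    for i in range(k) for j in range(k))
    # Error variance: sum_i w_i^2 sd_i^2 (1 - r_ii)
    error_var = sum(weights[i] ** 2 * sds[i] ** 2 * (1 - reliabilities[i])
                    for i in range(k))
    return 1 - error_var / total_var


def icc_oneway(pairs):
    """One-way random-effects ICC(1,1) for two items per respondent (assumed form)."""
    n, k = len(pairs), 2
    grand = sum(a + b for a, b in pairs) / (n * k)
    means = [(a + b) / 2 for a, b in pairs]
    # Between-respondent and within-respondent mean squares.
    msb = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    msw = sum((a - m) ** 2 + (b - m) ** 2
              for (a, b), m in zip(pairs, means)) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


def spearman_brown(r, factor=2):
    """Step a half-length reliability up to full length (factor=2 for split halves)."""
    return factor * r / (1 + (factor - 1) * r)


print(collapse_crg(5))  # moderate
print(round(mosier_reliability([1, 1], [1, 1], [0.8, 0.8],
                               [[1, 0.5], [0.5, 1]]), 3))  # 0.867
icc = icc_oneway([(1, 2), (2, 3), (3, 5), (4, 4)])
print(round(icc, 3), round(spearman_brown(icc), 3))  # 0.6 0.75
```

In the Mosier example, two equally weighted components with reliability 0.80 that correlate 0.50 yield a composite reliability above either component. This happens because the shared true-score variance grows faster than the independent error variance.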

The WIN study was designed to assess the effectiveness of the intervention, not to measure test-retest reliability. The SF-12 was readministered annually over three study years rather than weekly or monthly, so we could observe long-term decay in its reliability in the WIN population between any two of the four time points. We used correlations between years to assess longitudinal decay in the reliability of the SF-12 in the control group for all scales and both composite scores. For each scale, we also computed ICCs for the four repeated assessments at baseline and years 1, 2, and 3. These ICCs came from a linear mixed model (PROC MIXED) with restricted maximum likelihood (REML) estimation and the Kenward-Roger approximation, adjusting for age, sex (male vs. female), race/ethnicity (Hispanic, non-Hispanic white, non-Hispanic black), and CRG. We conducted all analyses using SAS version 9.4 (SAS Institute, Cary, NC) and considered P ≤ 0.05 statistically significant.