Different methods to assess quality of life from multiple follow-ups in a longitudinal asthma study


Journal of Clinical Epidemiology 57 (2004) 45–54

Carol A. Mancuso*, Margaret G.E. Peterson

Weill Medical College of Cornell University, New York Presbyterian Hospital, Hospital for Special Surgery, 535 East 70th Street, New York, NY 10021, USA

Accepted 3 July 2003

Abstract

Background and Objective: Serial measurements obtained during observational longitudinal studies offer the opportunity to describe the effects of chronic diseases on patient-centered outcomes such as quality of life. The purpose of this study was to assess serial Asthma Quality of Life Questionnaire (AQLQ) and SF-36 scores against a transition item using three methods of data analysis: final minus initial scores, maximum minus minimum scores, and regression line slopes through all scores.

Methods: Using receiver operating characteristic (ROC) curves, each method of analysis was compared against patients’ responses to a global transition question about change in asthma status, with responses dichotomized as “stayed the same or got worse” or “improved.” A total of 185 patients (mean age 41 ± 11 years, 83% women) completed the AQLQ and SF-36 three to seven times at approximately 8-month intervals over a mean of 24.8 ± 3.9 months. For the AQLQ, all three methods of data analysis performed well against the transition item, with ROC areas highest for the symptoms, activities, and summary AQLQ scores (0.74–0.78).

Results: Overall, ROC areas increased as the number of observations increased, ranging from 0.78 to 0.93 for the AQLQ summary score for patients with three to six or more assessments, respectively (P = .02). As part of the AQLQ, patients cited specific activities in which they were limited because of asthma. A total of 66 different activities were cited, including limitations in stair climbing, walking, interacting with others, sleeping, and working. In ROC analysis, serial measurements of these items also performed well against the transition item, with areas ranging from 0.72 to 0.75 for all three methods of analysis. In contrast, ROC areas for the SF-36 Physical and Mental Component Summary scores were significantly lower than the AQLQ areas, ranging from 0.59 to 0.66, indicating that the generic scale was less responsive than the disease-specific scale (P ≤ .01). The three methods of analysis also provided unique information about the cohort. The final minus initial analysis showed that 63% of patients had clinically important improvements, the maximum minus minimum analysis showed that over 90% of patients had clinically important fluctuations in scores, and the slope analysis showed that 79% of patients had an overall trend of improvement.

Conclusions: This study described possible methods to analyze and present serial data. Additional techniques to assess and interpret serial longitudinal data are needed to comprehensively describe the long-term effects of chronic diseases on quality of life.

© 2004 Elsevier Inc. All rights reserved.

Keywords: Longitudinal; Quality of life; Transition item; Asthma; Receiver operating characteristic curves; Serial assessments

* Corresponding author. Tel.: 212-746-1607; fax: 212-746-8965. E-mail address: [email protected] (C.A. Mancuso).

0895-4356/04/$ – see front matter © 2004 Elsevier Inc. All rights reserved. doi: 10.1016/S0895-4356(03)00248-8

1. Introduction

In clinical trials that measure quality of life outcomes, the effectiveness of a treatment is usually determined by comparing a preintervention assessment with a postintervention assessment [1–3]. Results are often interpreted in terms of clinically important differences, which are threshold values that are correlated with other indicators of change, or gold standards [4]. In some cases, the gold standards are transition items, which are longitudinal assessments of change over time [5]. Transition items often are obtained from patients as an overall appraisal of the magnitude and direction of change in clinical condition [6]. For example, patients are asked if they are better, worse, or the same, and then, if applicable, how much better or worse. In this way, transition items parallel how physicians assess change in actual clinical practice [5–7]. However, transition items have some limitations because they can be biased toward the most recent condition [5].

In observational studies there are no interventions, other than the passage of time, that demarcate a pre- and a poststate. Instead, there are serial observations over time that capture current states. These data are often analyzed by comparing the most recent observation with the initial observation [8]. However, if one is interested in a more comprehensive view of the data, other types of analyses are possible. For example, comparing the maximum and minimum values provides information on the spectrum of variation over time. Alternatively, repeated measures analysis and generating slopes through all data points provide a way to incorporate all values into a single analysis. Assessing observational data with these additional techniques is particularly useful for chronic diseases such as asthma, because the pattern of disease activity and the range of disability caused by the disease are important clinical outcomes over long periods of time [9].

The objective of this report was to assess serial Asthma Quality of Life Questionnaire (AQLQ) scores against a single transition item in an observational study of asthma patients who had three or more assessments during a 2- to 3-year period. AQLQ scores were analyzed with three methods: (1) within-patient final minus initial scores; (2) within-patient maximum minus minimum scores; and (3) slopes of within-patient regression lines through all scores. An additional objective was to compare the conclusions about longitudinal change in quality of life generated by these three methods. Similar analyses were also carried out for serial SF-36 general health survey scores. The hypothesis was that each of these analyses would provide unique information about the effects of asthma on health-related quality of life.
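The three within-patient summaries can be sketched for a single patient's series of scores. This is a minimal illustration, not the authors' code. The function and class names are hypothetical. The sketch assumes each score is paired with its assessment time in months and fits an ordinary least-squares slope of score on time. The paper does not say whether time or assessment number was the regressor, so the time axis is an assumption.

```python
from dataclasses import dataclass


@dataclass
class SerialSummary:
    final_minus_initial: float  # last score minus first score
    max_minus_min: float        # spread between best and worst score
    slope: float                # OLS slope of score on time (assumed: points per month)


def summarize_series(months: list[float], scores: list[float]) -> SerialSummary:
    """Summarize one patient's serial scores (e.g., AQLQ summary, 1-7 scale)."""
    if len(months) != len(scores) or len(scores) < 2:
        raise ValueError("need matched times and at least two scores")
    # Sort chronologically so "initial" and "final" are well defined.
    pairs = sorted(zip(months, scores))
    t = [p[0] for p in pairs]
    y = [p[1] for p in pairs]

    # (1) Uses only the first and last observations.
    final_minus_initial = y[-1] - y[0]
    # (2) Ignores chronology, so it can never be negative.
    max_minus_min = max(y) - min(y)
    # (3) Least-squares slope through all observations: Sxy / Sxx.
    t_bar = sum(t) / len(t)
    y_bar = sum(y) / len(y)
    sxx = sum((ti - t_bar) ** 2 for ti in t)
    if sxx == 0:
        raise ValueError("observations must span more than one time point")
    sxy = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
    return SerialSummary(final_minus_initial, max_minus_min, sxy / sxx)


# Hypothetical patient assessed four times about 8 months apart.
print(summarize_series([0, 8, 16, 24], [3.0, 4.5, 3.5, 4.0]))
```

For this made-up series, the final minus initial value is 1.0 and the range is 1.5. The slope is a modest positive 0.025 points per month, even though the score dips at the third visit. This shows how the three summaries can tell different stories about the same patient.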

2. Methods

2.1. Subjects

These analyses were conducted with data from a cohort study of patients with asthma. The study, described in detail previously, attempted to identify predictors of greater resource utilization and decline in health-related quality of life in a group of patients followed for 2 to 3 years in a tertiary care urban primary care practice [10]. This was a longitudinal observational study; no interventions were administered. Established patients who had a known diagnosis of asthma were identified from daily appointment schedules and were enrolled when they came for routine office visits with their physicians. Degree of asthma control at the time of screening was not part of the eligibility criteria. Patients were eligible if they were between 18 and 62 years old, were fluent in either English or Spanish, and had moderate asthma as defined by the National Heart, Lung, and Blood Institute’s Expert Panel on Asthma [11]. According to this definition, patients under treatment have moderate asthma if they require medications daily, but not oral corticosteroids daily. Patients were excluded if they had other pulmonary diagnoses or severe comorbidity. Patients completed a series of questionnaires, including measures of health-related quality of life, asthma characteristics, and psychosocial characteristics, at approximately 6-month intervals over 2 to 3 years. Patients who completed a total of three or more assessments were included in the present analysis.

2.2. Asthma Quality of Life Questionnaire

At each assessment patients completed the AQLQ, a 32-item survey with documented validity, reliability, and responsiveness [1,4,12]. Whereas the transition item provides a global measure of change according to the patient’s own weighting system, the AQLQ provides information about specific areas of function and quality of life. A summary score and scores for four domains (emotion, environment, symptoms, and activities) can be generated; each ranges from 1 (worst condition) to 7 (best condition). In addition, the first five items of the AQLQ ask patients to cite five activities in which they are limited because of asthma and to rate how limited they are in each activity. At every subsequent assessment patients rated their current limitations in these same five activities, and a separate score was calculated for these activities. This “patient-specific” score therefore represented the unique limitations in daily activities caused by asthma for each individual patient. As part of its original testing, the authors of the AQLQ determined clinically important differences in scores between any two assessments [4]. Changes in scores of 0.5 to 1.0, 1.0 to 1.5, and >1.5 are considered mild, moderate, and marked clinically important differences, respectively, for any domain and for the summary score. The same criteria were applied to the patient-specific score in this analysis.

2.3. SF-36 General Health Survey

Patients also completed the SF-36 General Health Survey at each assessment [13]. The SF-36 is a 36-item survey that has been used widely in clinical studies with asthma patients [1,14,15]. Two summary scores can be calculated: a Physical Component Summary score and a Mental Component Summary score [16]. A score of 50 on each component summary corresponds to the mean score for the healthy United States population.
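A small helper can make the AQLQ clinically important difference categories from Section 2.2 explicit. This is an illustrative sketch, and the function names are hypothetical. It uses the half-open category boundaries listed in Table 3, so a change of exactly 1.0 counts as moderate and exactly 1.5 as marked. That resolves the shared endpoints in the prose ranges "0.5 to 1.0, 1.0 to 1.5".

```python
from collections import Counter


def classify_change(delta: float) -> str:
    """Assign an AQLQ change score to a clinically important difference category.

    Boundaries follow Table 3: |delta| < 0.5 is no change; 0.5 <= |delta| < 1.0
    is mild; 1.0 <= |delta| < 1.5 is moderate; |delta| >= 1.5 is marked.
    """
    size = abs(delta)
    if size < 0.5:
        return "no change"
    if size < 1.0:
        level = "mild"
    elif size < 1.5:
        level = "moderate"
    else:
        level = "marked"
    direction = "improvement" if delta > 0 else "deterioration"
    return f"{level} {direction}"


def category_percentages(deltas: list[float]) -> dict[str, float]:
    """Percent of patients whose change score falls in each category."""
    counts = Counter(classify_change(d) for d in deltas)
    return {cat: 100.0 * n / len(deltas) for cat, n in counts.items()}


# Mean AQLQ summary values reported in the Discussion.
print(classify_change(1.03))  # moderate improvement (final minus initial)
print(classify_change(1.87))  # marked improvement (maximum minus minimum)
```

Maximum minus minimum values are never negative, so this function can only label them as "improvement" or "no change". That matches the layout of Table 4, where these values are read as the size of a fluctuation rather than its direction.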

2.4. Patient-reported transition item

At the conclusion of the study, patients were asked to rate the amount and direction of change in their asthma: “Compared to enrollment, how would you rate your asthma now? Is it: much better than it was, better than it was, about the same as it was, worse than it was, or much worse than it was?” Responses to this transition question served as the criterion variable for overall change during the study period. Patients were dichotomized into two groups based on their responses: those who stayed the same or got worse, and those who improved.

2.5. Comparing the transition item to an independent measure

To support the use of the transition item as a measure of change over time, we compared it with cumulative emergency department use, an independent variable that also reflects longitudinal clinical condition. Specifically, patients were contacted by telephone every 3 months throughout the study period and asked how many emergency department visits for asthma they had had since the previous contact. To help patients frame the time interval of interest, we cited specific holidays or other events that had occurred during each 3-month period.

3. Data analysis

Means and standard deviations were calculated for all patients for all AQLQ score categories (patient-specific activities, domain, and summary) and for the SF-36 Component Summary scores. Frequencies of responses to the transition question were calculated. Dichotomized responses to the transition question were then compared with the total number of emergency department visits for asthma during the study period using t-tests.

3.1. Final minus initial AQLQ scores

Within-patient differences were calculated as the final score minus the initial score for each AQLQ domain, as well as for the patient-specific activities and summary scores. Results were assessed according to the established categories of clinically important differences described above. The percentage of patients in each category and the mean within-patient difference for each category were calculated. The difference between final and initial scores was then compared with the transition item using receiver operating characteristic (ROC) curves and areas. ROC analysis relies on measures of sensitivity and specificity and permits comparison of a scale’s performance against a criterion variable that represents the gold standard [17]. For this analysis, the criterion variable was the patient’s response to the transition question about overall change in status. Responses were dichotomized with “much worse,” “worse,” and “about the same” as “cases,” and “better” and “much better” as “noncases.” ROC curves were then generated from pairs of sensitivity and 1 − specificity values for the patient-specific, domain, and summary scores [18]. Areas under the ROC curves were calculated with logistic regression analysis, using “case” status as the dependent variable and each AQLQ score category as the independent variable [19,20]. Standard errors were calculated from the Wilcoxon statistic for ROC areas, as outlined by Hanley and McNeil [19], and the equation of Delong [21], followed by calculation of 95% confidence intervals. In addition, ROC areas were compared using the technique described by Hanley and McNeil [22] for correlated data, in which repeated measurements are made on each subject.
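For a single score, the area under the ROC curve equals the Mann-Whitney (Wilcoxon) probability that a randomly chosen case scores lower than a randomly chosen noncase, with ties counted as one half. A logistic regression with one predictor gives the same area (its c statistic), because the fitted probabilities preserve the ordering of the scores. The sketch below computes the area directly that way rather than through logistic regression. It then traces the sensitivity and 1 − specificity pairs and attaches the Hanley and McNeil standard error. It is a from-scratch illustration with hypothetical function names and made-up data, not the authors' software. Here "case" means stayed the same or got worse, so lower change scores point toward case status.

```python
import math


def auc_mann_whitney(case_scores, noncase_scores):
    """P(random case scores lower than random noncase); ties count one half."""
    total = 0.0
    for c in case_scores:
        for n in noncase_scores:
            if c < n:
                total += 1.0
            elif c == n:
                total += 0.5
    return total / (len(case_scores) * len(noncase_scores))


def roc_points(case_scores, noncase_scores):
    """(1 - specificity, sensitivity) pairs, calling 'case' when score <= threshold."""
    points = [(0.0, 0.0)]
    for thr in sorted(set(case_scores) | set(noncase_scores)):
        sens = sum(s <= thr for s in case_scores) / len(case_scores)
        fpr = sum(s <= thr for s in noncase_scores) / len(noncase_scores)
        points.append((fpr, sens))
    return points


def trapezoid_area(points):
    """Area under a piecewise-linear ROC curve."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))


def hanley_mcneil_se(auc, n_cases, n_noncases):
    """Standard error of an ROC area from the Hanley and McNeil (1982) formula."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    var = (auc * (1 - auc)
           + (n_cases - 1) * (q1 - auc ** 2)
           + (n_noncases - 1) * (q2 - auc ** 2)) / (n_cases * n_noncases)
    return math.sqrt(var)


# Hypothetical final-minus-initial changes for six patients.
cases = [-1.0, 0.0, 0.5]     # stayed the same or got worse
noncases = [0.5, 1.0, 2.0]   # improved
auc = auc_mann_whitney(cases, noncases)
se = hanley_mcneil_se(auc, len(cases), len(noncases))
print(round(auc, 3), round(trapezoid_area(roc_points(cases, noncases)), 3))  # both 17/18, about 0.944
print(f"95% CI: {auc - 1.96 * se:.2f} to {min(1.0, auc + 1.96 * se):.2f}")
```

Computing the trapezoidal area under the empirical curve and the pairwise Mann-Whitney count gives the same number. This is a convenient check that the ROC points were generated in the right orientation.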

3.2. Maximum minus minimum AQLQ scores

We also calculated ranges of scores to measure the spectrum of variation in quality of life over time. These calculations were not dictated by chronology; rather, they reflected the times during the study period when patients’ AQLQ scores were at their “worst” (minimum) and “best” (maximum). Within-patient “range” values were calculated as the maximum minus the minimum value for the patient-specific, domain, and summary scores. These results were also assessed according to established definitions of clinically important differences. The maximum minus minimum values were then assessed with ROC curves and areas, using the transition question as the criterion variable as described above.

3.3. Calculating slopes from all observations

For each patient, regression lines were generated from all assessments for each score category. Beta coefficients, with their positive or negative signs, were recorded as the slopes. For example, for each patient a regression line for the symptoms domain was generated from all symptoms scores obtained from that patient during the entire study period, and the magnitude and sign of the line’s slope were recorded. A slope of 0 indicated no change during the study period, a positive slope indicated an overall pattern of improvement, and a negative slope indicated an overall pattern of deterioration. The slope values were then assessed with ROC curves and areas as described above. In addition, independent areas were generated for patients who had three, four, five, or six or more assessments, and these were compared using the algorithm of Delong [19].

3.4. Themes of activity limitations

Throughout the study, patients were asked about limitations in the same five activities they had cited at enrollment. Responses to these items made up the patient-specific activities score. Standard qualitative techniques involving an iterative process were used to assemble these responses into broad categories, or themes [23,24]. The percentage of patients citing an activity in each category was calculated.

3.5. Analyses of SF-36 observations

Similar analyses were done to generate ROC areas for the SF-36 Physical and Mental Component Summary scores for all three types of analysis (final minus initial, maximum minus minimum, and slope values), using the transition question as the criterion variable.

This study was approved by the Committee on Human Rights in Research at our institution.

4. Results

Of the 230 patients enrolled in the larger study between 1995 and 1997, 195 had three or more AQLQ and SF-36 assessments. Of these, 185 patients also answered the transition question and were included in this analysis. The mean age was 41 ± 11 years and 83% were women. Patients were taking asthma medications daily, particularly inhaled corticosteroids (80%) and inhaled beta agonists (94%). Mean duration of asthma was 20 ± 14 years; 50% had been hospitalized for asthma, and 62% had required oral corticosteroids for asthma before enrolling in this study. Patients were followed for a mean of 24.8 ± 3.9 months (range 17.0 to 36.2 months). The number of observations per patient ranged from three to seven, with a mean interval of 8.9 ± 3.2 months between observations (Table 1). AQLQ patient-specific, domain, and summary scores across all observations for all patients spanned the entire range of possible scores, from 1 (worst condition) to 7 (best condition). Mean SF-36 scores were below the population norm of 50, especially for the Physical Component Summary score.

In response to the transition question, 54% stated their asthma was better or much better, and 46% stated their asthma was the same, worse, or much worse. The transition item was also compared with emergency department use. In total, 44% of patients had visited the emergency department for asthma during the study period. Among these patients, the mean number of emergency department visits was 2.8 ± 2.5 for patients who reported they were better or much better and 6.5 ± 10.7 for patients who reported they stayed the same or were worse or much worse (P = .04). For all patients, the mean number of emergency department visits was 1.2 ± 2.1 for patients who reported improvement and 3.0 ± 8.0 for patients who stayed the same or deteriorated (P = .04). The transition item, therefore, performed well against this independent measure of disease activity.

In total, 66 different activities limited by asthma were cited in response to the first five items of the AQLQ. These items were grouped into seven categories (Table 2). Climbing stairs and daily activities that required mobility were the most frequently cited; however, activities involving others, such as sexual activity and caring for others (especially children), were also frequently cited.

Table 1
Longitudinal study characteristics (N = 185)
Variable | Value
Time in study (months), mean ± SD | 24.8 ± 3.9
Time between observations (months), mean ± SD | 8.9 ± 3.2
Number of observations (% patients): 3 | 36%
Number of observations (% patients): 4 | 31%
Number of observations (% patients): 5 | 25%
Number of observations (% patients): ≥6 | 8%
AQLQ scores, mean ± SD (range)a: Emotion domain | 3.87 ± 1.49 (1.25–7.00)
AQLQ scores, mean ± SD (range)a: Environment domain | 3.87 ± 1.41 (1.00–6.92)
AQLQ scores, mean ± SD (range)a: Symptoms domain | 4.03 ± 1.35 (1.37–6.75)
AQLQ scores, mean ± SD (range)a: Activities domain | 4.13 ± 1.32 (1.69–6.93)
AQLQ scores, mean ± SD (range)a: Patient-specific activities | 4.08 ± 1.33 (1.47–6.75)
AQLQ scores, mean ± SD (range)a: Summary | 4.02 ± 1.29 (1.58–6.79)
SF-36 scores, mean ± SD (range)a: Physical component summary | 37.4 ± 10.4 (17.8–58.8)
SF-36 scores, mean ± SD (range)a: Mental component summary | 42.5 ± 11.9 (17.2–63.7)
Patient-reported asthma status, global transition item (% patients): Much better | 29%
Patient-reported asthma status, global transition item (% patients): Better | 25%
Patient-reported asthma status, global transition item (% patients): About the same | 27%
Patient-reported asthma status, global transition item (% patients): Worse | 15%
Patient-reported asthma status, global transition item (% patients): Much worse | 4%
a Mean scores for all observations during the entire study period.

4.1. Comparing the final minus initial values to the transition item

The mean within-patient final minus initial values for the AQLQ were 1.13 ± 1.76 for patient-specific activities, 1.20 ± 1.80 for emotion, 1.26 ± 1.73 for environment, 1.02 ± 1.68 for symptoms, 1.02 ± 1.45 for activities, and 1.03 ± 1.43 for summary scores. These values were assessed according to the definitions of clinically important differences (Table 3). For the summary score, 63% of patients had a mild, moderate, or marked improvement, 27% had no change, and 10% had a mild, moderate, or marked deterioration. For all domains, 56 to 69% of patients improved, 17 to 31% had no change, and 10 to 18% deteriorated. Final minus initial values were assessed against the transition item with ROC curves, shown in Figs. 1 and 2 for the summary score and the patient-specific score, respectively. The more closely an ROC curve approximates an inverted L-shape occupying the upper left corner, the closer it is to ideal. Judged by these curves, the final minus initial values performed well against the transition item.

4.2. Comparing the maximum minus minimum values to the transition item

The mean within-patient maximum minus minimum values were 2.34 ± 1.28 for patient-specific activities, 2.41 ± 1.40 for emotion, 2.31 ± 1.32 for environment, 2.13 ± 1.30 for symptoms, 1.94 ± 1.09 for activities, and 1.87 ± 1.18 for summary scores. Because these values represented the spectrum between best and worst scores, they were, as expected, greater than the final minus initial values. Over 90% of patients had clinically important fluctuations in scores in each score category (Table 4). Only 3 to 6% of patients had no change and, by definition, no patient deteriorated by this analysis. Maximum minus minimum values were also compared against the transition item with ROC curves, shown in Figs. 1 and 2. These curves also occupy the upper left corner and approximate the inverted L-shape.

Table 2
Categories of patient-specific activities limited by asthma
Category | Percent of patients citing an activity in this category
Climbing stairs | 73%
Housework/home maintenance | 71%
Walking/mobility | 69%
Activities involving others | 58%
Sports/exercise | 56%
Sleeping | 39%
Working/school | 15%

Table 3
Final minus initial AQLQ values according to clinically important difference categories (each cell: % patients; mean ± SD)
Score | Marked deterioration (≤ −1.5) | Moderate deterioration (> −1.5, ≤ −1.0) | Mild deterioration (> −1.0, ≤ −0.5) | No change (> −0.5, < 0.5) | Mild improvement (≥ 0.5, < 1.0) | Moderate improvement (≥ 1.0, < 1.5) | Marked improvement (≥ 1.5)
Emotion domain | 5%; −2.40 ± 0.55 | 4%; −1.13 ± 0.18 | 5%; −0.73 ± 0.14 | 22%; 0.03 ± 0.26 | 13%; 0.71 ± 0.10 | 11%; 1.17 ± 0.15 | 40%; 2.96 ± 1.24
Environment domain | 5%; −2.11 ± 0.36 | 4%; −1.19 ± 0.12 | 5%; −0.63 ± 0.13 | 17%; 0.05 ± 0.21 | 13%; 0.61 ± 0.13 | 13%; 1.09 ± 0.12 | 43%; 2.83 ± 1.18
Symptoms domain | 6%; −2.19 ± 0.63 | 2%; −1.21 ± 0.08 | 5%; −0.68 ± 0.17 | 31%; 0.03 ± 0.27 | 12%; 0.70 ± 0.16 | 10%; 1.21 ± 0.10 | 34%; 2.90 ± 1.07
Activities domain | 3%; −2.07 ± 0.82 | 3%; −1.18 ± 0.13 | 5%; −0.71 ± 0.16 | 28%; −0.03 ± 0.28 | 16%; 0.73 ± 0.12 | 10%; 1.22 ± 0.12 | 35%; 2.64 ± 0.86
Patient-specific activities | 4%; −2.19 ± 0.60 | 7%; −1.24 ± 0.17 | 7%; −0.75 ± 0.11 | 23%; 0.01 ± 0.22 | 10%; 0.67 ± 0.11 | 12%; 1.22 ± 0.16 | 37%; 3.01 ± 1.04
Summary | 3%; −1.97 ± 0.67 | 2%; −1.15 ± 0.13 | 5%; −0.71 ± 0.13 | 27%; 0.07 ± 0.26 | 17%; 0.67 ± 0.12 | 15%; 1.23 ± 0.15 | 31%; 2.73 ± 0.98

4.3. Comparing the slope values to the transition item

Regression lines and their slopes were generated for each patient for the patient-specific, domain, and summary scores (Table 5). Approximately 70% of patients had positive slopes, indicating an overall pattern of improvement; 28% had negative slopes, indicating an overall pattern of deterioration; and the remainder had a slope of 0, indicating no change. Mean slopes were also calculated for each of the five original transition item responses. In general, slopes went from negative to positive and increased in magnitude as the transition item responses went from much worse to much better (P ≤ .0008 for all comparisons). ROC curves for slope values against the transition item are shown in Figs. 1 and 2 for the summary and patient-specific scores, respectively. These curves also occupy the upper left corner and approximate the inverted L-shape.

To determine whether one ROC curve is more closely related to the criterion variable than another, the areas under the curves must be compared. An area of 1.00 is considered perfect, and an area of 0.50 is considered no better than chance. ROC areas for all three methods of analysis are shown in Table 6. In general, the AQLQ performed well against the criterion variable, with most areas approximately 0.70 or greater. For each analysis, the summary values and the activities and symptoms domains had the largest areas, and the emotion and environment domains had the smallest. However, for most domains the 95% confidence intervals overlapped, and the differences in ROC areas were therefore not statistically significant. Areas were also compared with the technique described by Hanley and McNeil [22]. Within a domain, the three methods of analysis differed only slightly in area, and these differences were not statistically significant (P > .05, Table 6). Thus, no one method was more responsive than the others, and all performed well against the criterion variable.

ROC curves were also generated for subgroups of patients according to the total number of observations obtained: separate curves were generated for patients with three, four, five, or six or more observations. Sample curves for the AQLQ summary score slope values are shown in Fig. 3. In general, the more observations patients had, the more closely the curve approximated the ideal inverted L-shape in the upper left corner. The areas for these curves were 0.78, 0.80, 0.71, and 0.93, respectively (P = .02). Pairwise comparisons showed that the area under the ROC curve for patients with five observations was significantly smaller than that for patients with six or more observations (P = .009). A similar pattern was found for all the other domain and summary scores analyzed by the slope, maximum minus minimum, and final minus initial methods.

Fig. 1. Receiver operating characteristic (ROC) curves for the Asthma Quality of Life Questionnaire summary score and the patient-reported transition question. Patients who stayed the same or got worse were considered “cases,” and patients who improved were considered “noncases.” Symbols have been placed on the lines to distinguish among curves and do not represent actual points.

4.4. Comparing the SF-36 to the transition item

Similar analyses were also carried out for serial SF-36 assessments. Patients completed the SF-36 at the same time as the AQLQ; therefore, the number of SF-36 assessments and the time intervals between assessments were identical to those shown above for the AQLQ. The mean within-patient final minus initial values were 0.8 ± 9.8 for the Physical Component Summary (PCS) score and 3.1 ± 13.4 for the Mental Component Summary (MCS) score. The mean within-patient maximum minus minimum values were 12.7 ± 7.7 and 17.9 ± 9.5 for the PCS and MCS, respectively. Fifty-four percent of patients had positive PCS slopes and 46% had negative PCS slopes; 57% had positive MCS slopes and 43% had negative MCS slopes. PCS and MCS values were similarly compared with the transition item using ROC areas (Table 6). In general, PCS and MCS ROC areas were above the 0.50 threshold but smaller than the AQLQ areas. Specifically, compared with the AQLQ summary areas, the PCS areas were lower for the slope values (P < .0001), the maximum minus minimum values (P = .0009), and the final minus initial values (P = .01). Therefore, the general health SF-36 did not perform as well against the criterion variable as the disease-specific AQLQ.
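Two ROC areas can be compared with a critical ratio in the style of Hanley and McNeil [22]: z = (A1 − A2) / sqrt(SE1² + SE2² − 2r·SE1·SE2). When both areas come from the same patients, as with the AQLQ and PCS areas above, r is the correlation between the two area estimates. Hanley and McNeil obtain r from a lookup table built from the correlations between the two scores within cases and within noncases. This sketch does not reproduce that table and simply takes r as an input. With r = 0 it reduces to an ordinary z test for areas from independent groups, such as subgroups defined by the number of observations. The paper reports using DeLong's algorithm for those subgroup comparisons, which this sketch does not implement. The function name and all numbers below are hypothetical.

```python
import math


def compare_roc_areas(a1, se1, a2, se2, r=0.0):
    """Two-sided z test for the difference between two ROC areas.

    r is the correlation between the two area estimates: 0 for independent
    samples, positive when both areas come from the same patients.
    """
    var = se1 ** 2 + se2 ** 2 - 2 * r * se1 * se2
    if var <= 0:
        raise ValueError("non-positive variance; check se1, se2, and r")
    z = (a1 - a2) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard normal tail probability
    return z, p


# Hypothetical areas and standard errors.
print(compare_roc_areas(0.80, 0.05, 0.70, 0.05))         # independent groups
print(compare_roc_areas(0.80, 0.05, 0.70, 0.05, r=0.5))  # same patients, correlated
```

In this made-up example, the same 0.10 difference in area gives z ≈ 1.41 (P ≈ .16) when the areas are treated as independent. It gives z ≈ 2.0 (P ≈ .046) once a positive correlation is allowed for. This is why paired comparisons, such as the AQLQ versus the SF-36 in the same patients, call for the correlated form.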

5. Discussion

In this analysis we used three methods to report change in quality of life from serial observations. We found that each method (final minus initial scores, maximum minus minimum scores, and regression line slopes through all scores) performed well in ROC analysis against a transition item used as the criterion variable. We analyzed the data with these different methods to better understand the variable longitudinal effects of asthma, a chronic condition that can wax and wane over time. Each method provided a clinically relevant view of the data while also contributing unique information. For example, for the AQLQ summary score, the maximum minus minimum value was 1.87, above the 1.5 threshold for a marked clinically important change. In contrast, the final minus initial value was 1.03, corresponding to a moderate clinically important change. In addition, according to the final minus initial analysis, 63% of patients improved and 10% deteriorated during the study period, whereas according to the slope analysis, 79% of patients showed a trend of improvement and 21% showed a trend of deterioration.


Fig. 2. Receiver operating characteristic (ROC) curves for the Asthma Quality of Life Questionnaire patient-specific activities score and the patient-reported transition question. Patients who stayed the same or got worse were considered “cases,” and patients who improved were considered “noncases.” Symbols have been placed on the lines to distinguish among curves and do not represent actual points.

Given that different methods generate somewhat different findings, which method should be used when analyzing observational longitudinal data, and which provides the best summary of change over time? The choice depends on the clinical question. The final minus initial analysis compares clinical status at two points in time chosen by the researcher or clinician. In observational studies, where there is no specific intervention around which to make comparisons, the initial point is when observation starts and the final point is when it stops. For diseases with a predictable course (e.g., gradual worsening), this type of analysis provides a useful measure of clinical course. For diseases with a waxing and waning pattern, however, it omits the variability that occurs between the two points. The slope analysis, on the other hand, includes interval variations. Because it incorporates all data points simultaneously, the slope method provides a more comprehensive view of the longitudinal clinical course. This information is useful when the investigator or clinician wants to know whether there is a long-term trend toward improvement or deterioration. In contrast, the maximum minus minimum analysis shows how wide the variation actually is. This analysis is independent of initial and final points defined by the physician or investigator. Instead, it reflects the times when the adverse effects of disease on quality of life are most and least pronounced. This information is clinically relevant because wide fluctuations between maximum and minimum values indicate that the condition is not well controlled and requires closer evaluation and management.

Table 4. Maximum minus minimum AQLQ values according to clinically important difference categories
Each cell gives % of patients; mean range ± SD.

AQLQ component               No change          Mild improvement   Moderate improvement   Marked improvement
                             (>0, <0.5)         (≥0.5, <1.0)       (≥1.0, <1.5)           (≥1.5)
Emotion domain               4%; 0.18 ± 0.13    12%; 0.74 ± 0.11   14%; 1.22 ± 0.17       70%; 3.06 ± 1.15
Environment domain           3%; 0.21 ± 0.10    8%; 0.65 ± 0.13    15%; 1.17 ± 0.12       74%; 2.82 ± 1.16
Symptoms domain              6%; 0.36 ± 0.08    15%; 0.74 ± 0.16   17%; 1.23 ± 0.12       62%; 2.89 ± 1.08
Activities domain            6%; 0.38 ± 0.11    16%; 0.75 ± 0.14   18%; 1.20 ± 0.14       60%; 2.60 ± 0.84
Patient-specific activities  4%; 0.32 ± 0.11    9%; 0.74 ± 0.15    18%; 1.23 ± 0.15       69%; 2.95 ± 1.04
Summary                      8%; 0.35 ± 0.12    17%; 0.71 ± 0.13   20%; 1.22 ± 0.14       55%; 2.71 ± 0.94

We previously reported on the cross-sectional (discriminative) properties of the AQLQ and the SF-36 in our cohort [25]. Using a similar ROC analysis and a cross-sectional global measure, we found that both scales performed well in discriminating current disease activity, with a slight trend for the AQLQ to have higher ROC areas than the SF-36. In the present analysis, however, the AQLQ had statistically significantly higher ROC areas and was therefore more responsive to change in asthma status than the SF-36. One reason for this difference may be that the SF-36 incorporates the effects of comorbidity and therefore reflects the adverse consequences of more than one disease. These consequences may be more pronounced in longitudinal than in cross-sectional studies, and may explain why, in the slope analysis, 79% of patients improved on the AQLQ but only 56% improved on the SF-36. Another possible reason for the difference between scales is that some SF-36 questions, specifically the pain questions, may not be applicable to asthma. In addition, because patients had had asthma for a mean of 20 years, many of them may already have made successful accommodations in their role functions and therefore did not show wide variations in these SF-36 domains.

At the other end of the spectrum from the general health scale is the patient-specific measure. Although not formally designated as a patient-specific domain, the first five items of the AQLQ ask patients to identify five activities in which they are limited because of asthma. When followed over time, these items provide an ideal means of monitoring how
asthma affects aspects of life that are important to individual patients. In our study we found that, in addition to the anticipated limitations in climbing stairs and participating in sports and work-related activities, many patients also reported limitations in activities that involved interacting with others, such as caring for children. Notably, these five patient-specific items performed as well as the summary and domain scores in ROC analysis against the transition item.

Table 5. Characteristics of regression line slopes

Direction of slope (% of patients)
AQLQ component               "+" slope     "0" slope     "−" slope
Emotion domain               72%           2%            28%
Environment domain           78%           1%            21%
Symptoms domain              71%           1%            28%
Activities domain            75%           0%            25%
Patient-specific activities  72%           2%            26%
Summary                      79%           0%            21%

Mean slope ± SD according to transition item response
AQLQ component               Much worse    Worse         Same          Better        Much better   P(a)
                             (n = 8)       (n = 28)      (n = 49)      (n = 46)      (n = 54)
Emotion domain               0 ± .040      .019 ± .057   .028 ± .064   .062 ± .072   .077 ± .088   .0002
Environment domain           .024 ± .036   .033 ± .073   .029 ± .055   .071 ± .082   .077 ± .065   .0008
Symptoms domain              −.010 ± .026  .003 ± .035   .016 ± .055   .059 ± .076   .080 ± .069   <.0001
Activities domain            −.023 ± .021  .007 ± .041   .026 ± .044   .055 ± .066   .076 ± .057   <.0001
Patient-specific activities  −.012 ± .027  0 ± .053      .033 ± .059   .060 ± .070   .089 ± .071   <.0001
Summary                      −.007 ± .018  .011 ± .033   .023 ± .045   .059 ± .066   .078 ± .060   <.0001

"+" slope indicates general improvement; "0" slope indicates no change; "−" slope indicates general deterioration.
(a) Values are for comparisons across the five possible transition item responses.

Table 6. Areas under receiver operating characteristic (ROC) curves based on the dichotomized transition item and each method of analysis

                                Final minus initial   Regression line slope Maximum minus minimum
                                Area   95% CI         Area   95% CI         Area   95% CI
Asthma Quality of Life Questionnaire
Emotion domain                  0.70   0.62, 0.77     0.69   0.61, 0.76     0.69   0.61, 0.76
Environment domain              0.68   0.60, 0.76     0.68   0.60, 0.76     0.63   0.54, 0.71
Symptoms domain                 0.77   0.70, 0.84     0.77   0.70, 0.84     0.74   0.67, 0.82
Activities domain               0.75   0.68, 0.82     0.76   0.69, 0.83     0.74   0.66, 0.81
Patient-specific activities     0.75   0.67, 0.82     0.74   0.68, 0.82     0.72   0.65, 0.79
Summary*                        0.77   0.70, 0.84     0.78   0.71, 0.84     0.75   0.68, 0.82
SF-36
Physical component summary(a)   0.66   0.58, 0.74     0.59   0.50, 0.67     0.59   0.51, 0.68
Mental component summary(a)     0.60   0.51, 0.68     0.59   0.51, 0.67     0.58   0.58, 0.71

(a) PCS areas differed from AQLQ summary areas for all methods of analysis (P ≤ .01).

We also found that shorter intervals between observations may yield better agreement with the transition item: AQLQ scores were more closely related to the transition item as the number of observations increased. Overall, for all three methods of analysis, patients with more observations had greater ROC areas. This may be because patients with more observations became better attuned to their asthma and to reporting their status in questionnaires, or because they were better able to recall their condition when the interval between assessments was shorter. The number of observations is also important when deciding whether to use the slope method of analysis. For example, slopes may not be meaningful if there are many missing data points or markedly unequal numbers of data points at different time intervals.

Fig. 3. Receiver operating characteristic (ROC) curves for the Asthma Quality of Life Questionnaire summary score slope values according to the number of observations. In general, curves more closely approximated the ideal L-shape occupying the upper left corner as the number of observations increased. Symbols are placed on the lines only to distinguish among curves and do not represent actual data points.

This study has several limitations. First, patients in this study were followed in an urban primary care practice, so these results may not be generalizable to patients in other settings. Second, follow-ups were conducted when patients came for routine office visits with their physicians, and variations in follow-up intervals were not included in the analysis. Third, in the absence of a gold standard for measuring change in quality of life [26], we used a patient-reported transition item. Although such items have also been used in other asthma studies [3,6], transition items rely on patient recall, which has been shown to be subject to recollection error [27]. Similarly, our use of patient-reported emergency department visits was also subject to recollection error. Fourth, we did not have physiologic data, which would have provided independent measures of disease activity over time.

There are currently no widely agreed upon methods for representing multiple quality-of-life follow-ups in observational studies of patients with asthma. In our study, assessing the AQLQ according to final minus initial scores, maximum minus minimum scores, and slopes of regression lines through all scores performed well compared with a global patient-reported transition item. In addition, each type of analysis provided unique information about the cohort. Longitudinal studies offer researchers the opportunity to study the impact of chronic diseases over prolonged periods of time. It is therefore useful to develop methods for reporting and interpreting serial longitudinal data that comprehensively describe long-term effects on patient-centered outcomes such as quality of life.
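The three summary measures compared in this study (final minus initial, maximum minus minimum, and the regression line slope through all scores) can be computed from one patient's serial scores roughly as in the following Python sketch. It is illustrative only and is not the authors' SAS or Stata procedure. Assumptions: scores are regressed on assessment time in months since the first visit by ordinary least squares (the time scale is not stated here), ranges are banded with the Table 4 cut points, and slope direction follows the Table 5 "+"/"0"/"−" convention with a small tolerance for "0". All function and class names are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ChangeSummary:
    final_minus_initial: float
    max_minus_min: float
    slope: float  # score units per unit of time (assumed: per month)


def summarize_scores(times: Sequence[float], scores: Sequence[float]) -> ChangeSummary:
    """Summarize one patient's serial scores with the three methods.

    times:  assessment times, assumed to be months since the first assessment
    scores: scores at those times, in chronological order
    """
    if len(times) != len(scores) or len(scores) < 2:
        raise ValueError("need at least two paired (time, score) observations")
    n = len(scores)
    mean_t = sum(times) / n
    mean_s = sum(scores) / n
    # Ordinary least-squares slope of score on time through all observations.
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("assessment times must not all be identical")
    sxy = sum((t - mean_t) * (s - mean_s) for t, s in zip(times, scores))
    return ChangeSummary(
        final_minus_initial=scores[-1] - scores[0],  # last visit minus first visit
        max_minus_min=max(scores) - min(scores),     # width of the observed range
        slope=sxy / sxx,
    )


def range_category(max_minus_min: float) -> str:
    """Band a maximum minus minimum AQLQ range using the Table 4 cut points.

    Table 4 labels the lowest band '>0, <0.5'; in this sketch a range of
    exactly 0 is also placed in that band.
    """
    if max_minus_min < 0.5:
        return "no change"
    if max_minus_min < 1.0:
        return "mild improvement"
    if max_minus_min < 1.5:
        return "moderate improvement"
    return "marked improvement"


def slope_direction(slope: float, tol: float = 1e-9) -> str:
    """'+' general improvement, '0' no change, '-' general deterioration."""
    if slope > tol:
        return "+"
    if slope < -tol:
        return "-"
    return "0"


if __name__ == "__main__":
    # Hypothetical patient: four assessments 8 months apart, waxing and waning.
    s = summarize_scores([0, 8, 16, 24], [4.0, 5.5, 4.5, 5.0])
    print(s.final_minus_initial, s.max_minus_min, round(s.slope, 3))  # 1.0 1.5 0.025
    print(range_category(s.max_minus_min), slope_direction(s.slope))  # marked improvement +
```

For this hypothetical patient, whose scores rise, fall, and rise again, final minus initial (1.0) and maximum minus minimum (1.5) land in different clinically important difference bands, while the slope only indicates a modest upward trend, which is the kind of disagreement among methods discussed above.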
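The ROC areas in Table 6 compare each change measure against the transition item dichotomized as improved ("better" or "much better") versus not improved ("same", "worse", or "much worse"). The empirical area under the ROC curve equals the Mann-Whitney probability that a randomly chosen improved patient has a larger change value than a randomly chosen non-improved patient, with ties counted as one half [19]. The Python sketch below computes only that point estimate; it does not reproduce the 95% confidence intervals or the comparisons of correlated areas [21,22] reported in the study, and the function names, response labels, and example data are hypothetical.

```python
from typing import Sequence

# Transition item responses counted as "improved" (hypothetical string labels).
IMPROVED_RESPONSES = {"better", "much better"}


def dichotomize(response: str) -> bool:
    """True for 'better'/'much better'; False for 'same', 'worse', 'much worse'."""
    return response.strip().lower() in IMPROVED_RESPONSES


def roc_auc(values: Sequence[float], improved: Sequence[bool]) -> float:
    """Empirical ROC area for one change measure against the dichotomized item.

    Over every (improved, not improved) pair of patients, counts how often the
    improved patient has the larger value; ties count one half.
    """
    if len(values) != len(improved):
        raise ValueError("values and improved must have the same length")
    pos = [v for v, y in zip(values, improved) if y]
    neg = [v for v, y in zip(values, improved) if not y]
    if not pos or not neg:
        raise ValueError("need at least one improved and one non-improved patient")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical AQLQ summary slopes and transition item answers for 5 patients.
    slopes = [-0.02, 0.00, 0.03, 0.05, 0.08]
    answers = ["worse", "same", "better", "same", "much better"]
    auc = roc_auc(slopes, [dichotomize(a) for a in answers])
    print(round(auc, 3))  # 0.833
```

The pairwise loop is quadratic in the number of patients, which is negligible for a cohort of 185; an area of 0.5 corresponds to no discrimination and 1.0 to perfect separation of improved from non-improved patients.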

Acknowledgments
This project was supported by a Robert Wood Johnson Foundation Generalist Physician Faculty Scholar's Award to Dr. Mancuso. The authors thank Dr. B. Robert Meyer and the attendings and housestaff of the Cornell Internal Medicine Associates for their participation.

References
[1] Juniper EF, Guyatt GH, Ferrie PJ, Griffith LE. Measuring quality of life in asthma. Am Rev Respir Dis 1993;147:832–8.
[2] Rutten-van Molken MPMH, Custers F, van Doorslaer EKA, Jansen CCM, Heurman L, Maesen FPV, Smeets JJ, Bommer AM, Raaijmakers JAM. Comparison of performance of four instruments in evaluating the effects of salmeterol on asthma quality of life. Eur Respir J 1995;8:888–98.
[3] Juniper EF, Buist AS. Health-related quality of life in moderate asthma. Chest 1999;116:1297–303.
[4] Juniper EF, Guyatt GH, Willan A, Griffith LE. Determining a minimal important change in a disease-specific quality of life questionnaire. J Clin Epidemiol 1994;47:81–7.
[5] Guyatt GH, Norman GR, Juniper EF, Griffith LE. A critical look at transition ratings. J Clin Epidemiol 2002;55:900–8.
[6] Fischer D, Stewart AL, Bloch DA, Lorig K, Laurent D, Holman H. Capturing the patient's view of change as a clinical outcome measure. JAMA 1999;282:1157–62.
[7] Gill TM, Feinstein AR. A critical appraisal of the quality of quality-of-life measurements. JAMA 1994;272:619–26.
[8] Oga T, Nishimura K, Tsukino M, Sato S, Hajiro T, Mishima M. Comparison of the responsiveness of different disease-specific health status measures in patients with asthma. Chest 2002;122:1228–33.
[9] Richards JM, Hemstreet MP. Measures of life quality, role performance and functional status in asthma research. Am J Respir Crit Care Med 1994;149:S31–9.
[10] Mancuso CA, Rincon M, McCulloch CE, Charlson ME. Self-efficacy, depressive symptoms, and patients' expectations predict outcomes in asthma. Med Care 2001;39:1326–38.
[11] Guidelines for the Diagnosis and Management of Asthma. National Heart, Lung and Blood Institute. J Allergy Clin Immunol 1991;88:435.
[12] Juniper EF, Guyatt GH, Epstein RS, Ferrie PJ, Jaeschke R, Hiller TK. Evaluation of impairment of health-related quality of life in asthma: development of a questionnaire for use in clinical trials. Thorax 1992;47:76–83.
[13] Stewart AL, Hays RD, Ware JE. The MOS Short-form General Health Survey: reliability and validity in a patient population. Med Care 1988;26:724–32.
[14] Bousquet J, Knani J, Dhivert H, Richard A, Chicoye A, Ware JE Jr, Michel FB. Quality of life in asthma: internal consistency and validity of the SF-36 questionnaire. Am J Respir Crit Care Med 1994;149:371–5.
[15] Ried LD, Nau DP, Grainger-Rousseau TJ. Evaluation of patient's health-related quality of life using a modified and shortened version of the Living with Asthma Questionnaire (ms-LWAQ) and the medical outcomes study, Short-Form 36 (SF-36). Qual Life Res 1999;8:491–9.
[16] Ware JE, Kosinski M, Bayliss MS, McHorney CA, Rogers WH, Raczek A. Comparison of methods for scoring and statistical analysis of SF-36 health profile and summary measures: summary of results from the Medical Outcomes Study. Med Care 1995;33:AS264–79.
[17] Deyo RA, Diehr P, Patrick DL. Reproducibility and responsiveness of health status measures: statistics and strategies for evaluation. Controlled Clin Trials 1991;12:142S–58S.
[18] SAS User's Guide. Statistics, version 5 edition. Cary, NC: SAS Institute Inc.; 1985.
[19] Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982;143:29–36.
[20] Stata Reference Manual. Release 7. College Station, TX: Stata Corporation; 2001.
[21] DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics 1988;44:837–45.
[22] Hanley JA, McNeil BJ. A method of comparing the areas under receiver operating characteristic curves derived from the same cases. Radiology 1983;148:839–43.
[23] Strauss A, Corbin J. Basics of qualitative research. 2nd ed. Thousand Oaks, CA: Sage Publications Inc.; 1998.
[24] Berkwits M, Inui TS. Making use of qualitative research techniques. J Gen Intern Med 1998;13:195–9.
[25] Mancuso CA, Peterson MGE, Charlson ME. Comparing discriminative validity between a disease-specific and a general health scale in patients with moderate asthma. J Clin Epidemiol 2001;54:263–74.
[26] Guyatt GH, Feeny DH, Patrick DL. Measuring health-related quality of life. Ann Intern Med 1993;118:622–9.
[27] Mancuso CA, Charlson ME. Does recollection error threaten the validity of cross-sectional studies of effectiveness? Med Care 1995;33:AS77–88.