Correlation of the Patient Reported Outcomes Measurement Information System with Legacy Outcomes Measures in Assessment of Response to Lumbar Transforaminal Epidural Steroid Injections

BACKGROUND AND PURPOSE: The Patient Reported Outcomes Measurement Information System is a newly developed outcomes measure promulgated by the National Institutes of Health. This study compares changes in pain and physical function–related measures of this system with changes on the Numeric Rating Pain Scale, Roland Morris Disability Index, and the European Quality of Life scale 5D questionnaire in patients undergoing transforaminal epidural steroid injections for radicular pain. MATERIALS AND METHODS: One hundred ninety-nine patients undergoing transforaminal epidural steroid injections for radicular pain were enrolled in the study. Before the procedure, they rated the intensity of their pain by using the 0–10 Numeric Rating Pain Scale and completed the Roland Morris Disability Index and European Quality of Life scale 5D questionnaire. Patients completed the Patient Reported Outcomes Measurement Information System Physical Function, Pain Behavior, and Pain Interference short forms before transforaminal epidural steroid injections and at 3 and 6 months. Seventy and 43 subjects replied at 3- and 6-month follow-up, respectively. Spearman rank correlations were used to assess the correlation between the instruments. The minimally important differences were calculated for each measurement tool as an indicator of meaningful change. RESULTS: All instruments were responsive in detecting changes at 3- and 6-month follow-up (P < .0001). There was significant correlation between changes in Patient Reported Outcomes Measurement Information System scores and legacy questionnaires from baseline to 3 months (P < .05). There were, however, no significant correlations in changes from 3 to 6 months with any of the instruments. CONCLUSIONS: The studied Patient Reported Outcomes Measurement Information System domains offered responsive and correlative psychometric properties compared with legacy instruments in a population of patients undergoing transforaminal epidural steroid injections for radicular pain.

Spinal pain is one of the most common types of chronic pain worldwide. 1 The epidemiology of low back pain has been investigated comprehensively in adults. A recent epidemiologic review noted lifetime prevalence estimates ranging from 12.2% to 43%; annual prevalence estimates were 2.2% to 34%. 2 When there is both compression and inflammation of neural elements, low back pain may be accompanied by radicular pain. Patients with radicular pain tend to have poorer outcomes, consume more health care resources, and have greater disability than patients with back pain alone. Transforaminal epidural steroid injections (TFESI) have become a common intervention in the treatment of radicular pain; the procedure has shown efficacy in explanatory trials 3 and clinical effectiveness in large retrospective series. 4 A recent systematic literature review demonstrated a consensus of support for TFESIs, but historically, there have been conflicting reports regarding their efficacy. 5 Systematic literature reviews are made more challenging and their interpretation is confounded by the use of a host of different measurement tools including the Numeric Rating Pain Scale (NRS), Roland-Morris Disability Index (RMDI), Oswestry Disability Index, European Quality of Life scale 5D questionnaire (EQ-5D), finger-to-floor distance, and the Oswestry and Nottingham Health Profile Verbal Rating Scale. 6 Although the NRS is a familiar mechanism for measuring pain for most patients, it cannot assess the more complex construct of a patient's functional disability and overall quality of life. The RMDI as a measure of functional disability has 23 questions with dichotomized choices. The EQ-5D is a measure of quality of life and addresses 5 domains (mobility, self-care, usual activities, pain/discomfort, and anxiety/depression), though its 3-choice format may seem insufficient to quantitate the complicated effects of pain on a patient's life. 7 The need for a multiplicity of questionnaires and the variety of options leads to both excessive patient burden in assessment and challenges in comparison among studies when different measurement tools have been chosen.
To overcome these shortcomings, which apply across many different fields of medicine, a new Patient Reported Outcomes Measurement Information System (PROMIS) has been developed. The PROMIS initiative is a National Institutes of Health-funded effort to produce widely applicable standardized measurement tools, covering a variety of domains, which can be used across many disciplines while minimizing the patient burden. The PROMIS study group has developed short-form measures of multiple domains, which should perform well in a TFESI procedural population. The PROMIS measures currently require clinical testing to compare them with established measures in a variety of clinical populations with different characteristics (including type of pain). They need to be compared with established measures (ie, "legacy" measures) with respect to responsiveness, reliability, and validity. Currently, the NRS, 23-point RMDI, and EQ-5D are performed in our practice before TFESI and at 3-, 6-, 9-, and 12-month follow-up as standard quality assurance measures. The follow-up measurements occur by machine-read forms; these are given to the patient at dismissal from the procedure area with instructions to complete them at the indicated times and return them via the postage-paid envelope provided. If the questionnaires are not returned, an attempt is made to reach patients via telephone to give them the option of sending back the completed forms or completing the data via a telephone interview. The PROMIS tools are designed to measure pain and functional and quality-of-life domains with less patient burden. The goal of this study was to compare the reliability and responsiveness of PROMIS short forms with the NRS, 23-point RMDI, and EQ-5D after lumbar transforaminal epidural steroid injections.

MATERIALS AND METHODS
Our institutional review board approved this prospective Health Insurance Portability and Accountability Act-compliant study, and written consent of all study participants was obtained. Between May 2010 and June 2012, 200 patients were enrolled who had been evaluated by the Mayo multidisciplinary Spine Center and referred to the radiology pain-management practice for lumbar transforaminal epidural steroid injections for radicular pain with or without radiculopathy. Patients were considered for enrollment in the study if they had lumbar radicular pain unresponsive to conservative therapy and were able to answer the questions in English. We excluded patients who were unable to consent or cooperate, had myelopathy or progressive neurologic deficits, were using anticoagulant medication, had a systemic infection or local skin infection in the lumbar region, or were pregnant. In the first evaluation, following the physician's procedural explanation and consent process and after the study coordinator's explanation of how to answer the questions, patients completed the PROMIS short forms measuring 3 different domains relevant to their radicular pain, including Physical Function, Pain Behavior, and Pain Interference. Patients who were able to complete the forms themselves did so; the study coordinator was available to assist if necessary. Two more sets of questionnaires were given to the patient at dismissal from the procedure area with instructions to complete them at 3 and 6 months and return them in prepaid, preaddressed envelopes. The PROMIS forms included the 10-question Physical Function Short Form, which is focused on the ability to perform various daily activities from self-care (bathing and dressing) to vigorous physical activities (running, strenuous sports) (On-line Appendix, Fig A). The 6-question Pain Interference Short Form is focused on pain interference with mental, physical, and social aspects of daily living (On-line Appendix, Fig B).
The 7-question Pain Behavior Short Form focuses on verbal, facial, and bodily expressions of pain (On-line Appendix, Fig C). Completing each questionnaire takes approximately 2.5 minutes. Responses to PROMIS questions on a given short form were summed for raw scores; responses entered into the on-line PROMIS data base provided t-scores. The t-score scale has a mean score of 50 and an SD of 10 in the US general population. 8 For example, a person who has a PROMIS pain interference score of 70 is reporting adverse pain interference 2 SDs worse than the general population mean. Higher t-scores indicate greater levels of the construct being measured. Thus, for pain behavior and pain interference, higher scores reflect worse pain, whereas for physical function, higher scores indicate better functioning. Participants also rated the intensity of their pain in the past 24 hours by using a pain NRS from 0 to 10, with zero indicating no pain and 10 indicating the worst imaginable pain. The 23-item modified RMDI and 5-item EQ-5D, 2 widely used functional outcome scales, were also administered. The EQ-5D survey assesses 5 dimensions with the possible score range for each of the dimensions of 1-3, in which 1 = no problems, 2 = moderate problems, and 3 = extreme problems. Each unique health state described by the instrument has an associated 5-digit descriptor ranging from 11111 for perfect health to 33333 for the worst possible state. The resulting descriptive system defines 243 (3^5) health states. 9
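The two scoring conventions described above can be illustrated with a minimal sketch. It assumes only what the text states (a t-score metric with mean 50 and SD 10, and 5 EQ-5D dimensions each scored 1-3); the function names are hypothetical and are not part of any PROMIS or EQ-5D software.

```python
from itertools import product

# t-score metric described in the text: mean 50, SD 10 in the US general population
T_MEAN = 50.0
T_SD = 10.0


def t_score_to_sd_units(t_score):
    """Express a t-score as SDs above (+) or below (-) the general population mean."""
    return (t_score - T_MEAN) / T_SD


def eq5d_health_states(levels=(1, 2, 3), dimensions=5):
    """Enumerate every 5-digit EQ-5D descriptor, from 11111 to 33333."""
    return ["".join(str(level) for level in combo)
            for combo in product(levels, repeat=dimensions)]


states = eq5d_health_states()
print(t_score_to_sd_units(70))             # pain interference score of 70
print(len(states), states[0], states[-1])  # number of states, best state, worst state
```

Whether a positive offset is favorable depends on the domain: it indicates worse pain for Pain Interference and Pain Behavior but better functioning for Physical Function.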

Power, Sample Size, and Statistical Analysis
The study was a prospective registry, and the sample size was justified before the start of the study on the basis of the estimated precision for the agreement of the PROMIS and legacy scales. These calculations supported a range of sample sizes and incorporated up to a 50% attrition rate during follow-up (the assumed survey nonresponse rate). Briefly, agreement (κ) between the 2 approaches of measuring pain was estimated to be >0.8. We wanted to rule out agreement <0.6. For sample-size planning purposes, 55% of the patients were assumed to have a decrease in disability as measured by the RMDI at 3 months. Furthermore, it was assumed that 50% (40%) of the patients would have a decrease (increase) on both the RMDI and the PROMIS scales. The remaining 10% were assumed discordant cases (response on either RMDI or PROMIS but not both). On the basis of these estimates, κ was estimated to be 0.80; and provided 100 completed assessments were available, the 95% CI for κ would span 0.682-0.918. This would provide sufficient precision to rule out the null value of 0.6. To account for drop-out, we administratively selected a sample size of 200 to be sufficient to describe the primary aim.
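The planning figures above can be approximately reproduced with a short sketch of the Cohen κ statistic for a 2 × 2 agreement table. Two points are assumptions rather than statements from the text: the 10% discordant cases are split evenly (5% each), which follows from 55% decreasing on the RMDI when 50% decrease on both scales, and the CI uses a simple large-sample standard error, because the variance formula actually used is not given.

```python
import math


def cohen_kappa_2x2(both_decrease, both_increase, only_a, only_b):
    """Cohen kappa from the four cell proportions (or counts) of a 2 x 2 table."""
    total = both_decrease + both_increase + only_a + only_b
    p_observed = (both_decrease + both_increase) / total
    a_decrease = (both_decrease + only_a) / total   # marginal for instrument A
    b_decrease = (both_decrease + only_b) / total   # marginal for instrument B
    p_expected = a_decrease * b_decrease + (1 - a_decrease) * (1 - b_decrease)
    kappa = (p_observed - p_expected) / (1 - p_expected)
    return kappa, p_observed, p_expected


def kappa_confidence_interval(kappa, p_observed, p_expected, n, z=1.96):
    """Approximate CI using a simple large-sample standard error (an assumption)."""
    se = math.sqrt(p_observed * (1 - p_observed) / (n * (1 - p_expected) ** 2))
    return kappa - z * se, kappa + z * se


# Planning assumptions: 50% decrease on both, 40% increase on both,
# 10% discordant, assumed here to split 5% / 5%.
kappa, p_obs, p_exp = cohen_kappa_2x2(0.50, 0.40, 0.05, 0.05)
low, high = kappa_confidence_interval(kappa, p_obs, p_exp, n=100)
print(round(kappa, 2), round(low, 2), round(high, 2))
```

Under these assumptions, κ comes out close to 0.80 and the interval lies within about 0.01 of the stated 0.682-0.918, with its lower bound above the null value of 0.6.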
Statistical tests on secondary aims were performed on an exploratory basis. Categoric data are presented as counts and percentages. Continuous data are presented by using mean, median, SD, and range as appropriate. The minimally important differences were calculated as reliable change index × 1.96 for the multi-item measurement tools by using baseline data. Reliable change index is calculated as √2 × standard error of measurement. 8,10 The standard error of measurement is calculated as SD × √(1 − r), where SD is the standard deviation of the sample and r is reliability. 10 Instrument reliability used in standard error of measurement calculations was estimated with Cronbach α measured at baseline. Changes exceeding 2 points on the pain NRS were considered clinically meaningful. 7 The minimally important difference served as another means of comparing instruments. Specifically, we assessed whether the proportion of individuals identified as having experienced meaningful change (defined as a change greater than or equal to the minimally important difference) was similar across measures. All correlations presented are Spearman rank correlation coefficients; associated significant P values indicate nonzero correlation. Criteria for an adequate cross-sectional and longitudinal Spearman correlation were set at >0.5 and >0.3, respectively. 11 P values ≤ .05 were statistically significant. Statistical comparisons of the model-based estimates were configured to test changes from baseline to 3 months, baseline to 6 months, and 3-6 months postprocedure. SAS 9.3 (SAS Institute, Cary, North Carolina) was used in all data analyses.
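The minimally important difference calculation described above can be sketched as follows. The formulas follow the text (standard error of measurement = SD × √(1 − r); reliable change index = √2 × standard error of measurement; minimally important difference = 1.96 × reliable change index). The baseline SD, the reliability, and the change scores are made-up illustrative values, not study data, and treating change in either direction as meaningful is an assumption of this sketch.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - r), with r the instrument reliability (eg, Cronbach alpha)."""
    return sd * math.sqrt(1.0 - reliability)


def reliable_change_index(sem):
    """RCI = sqrt(2) * SEM."""
    return math.sqrt(2.0) * sem


def minimally_important_difference(sd, reliability, z=1.96):
    """MID = 1.96 * RCI, computed from the baseline SD and reliability."""
    return z * reliable_change_index(standard_error_of_measurement(sd, reliability))


def proportion_with_meaningful_change(changes, mid):
    """Share of patients whose absolute change is at least the MID."""
    return sum(1 for change in changes if abs(change) >= mid) / len(changes)


# Hypothetical inputs: baseline SD of 10 t-score points, Cronbach alpha of 0.90
mid = minimally_important_difference(sd=10.0, reliability=0.90)
example_changes = [-10.0, -3.0, 9.0, 0.0]   # made-up change scores
print(round(mid, 2))
print(proportion_with_meaningful_change(example_changes, mid))
```

Because the threshold scales with √(1 − r), a less reliable instrument requires a larger observed change before that change counts as meaningful.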

Study Participants
Two hundred patients met the enrollment criteria and underwent TFESIs. One patient canceled the research authorization after 1 day, leaving the cohort with 199 research-authorized patients. All patients were able to complete the forms without assistance. Demographic characteristics are presented in Table 1. The sample consisted of 98 (49.2%) men and 101 (50.8%) women. Subject ages ranged from 25 to 90 years (mean, 63.1 ± 14.7 years). One hundred twenty-nine patients were lost to follow-up at the 3-month time point; and from the remaining 70 patients, 43 subjects replied at 6-month follow-up. At the 3- and 6-month time points, 7 and 10 patients completed the forms with a telephone interview versus 63 and 33 that were self-administered, respectively. This process left a cohort of 70/199 (35%) at 3 months following injection and 43/199 (22%) at 6 months following the procedure. Although our standard clinical quality assurance follow-up continued through 9 and 12 months, there was a continued decline in the rate of return of the outcomes forms, so that evaluation at these data points was not considered useful. Table 2 presents baseline and 3- and 6-month scores. All instruments were responsive to detect changes at 3- and 6-month follow-up (P < .0001). Mean scores for all domains demonstrated improvement at 3 and 6 months (eg, less pain, better physical function). Correlations between RMDI, EQ-5D, or NRS and PROMIS scores were significant in all cross-sectional measurements (P < .0001, P < .005, and P < .05, respectively; Figs 1-3). Correlations between changes with time in PROMIS scores and changes with time both in EQ-5D and RMDI scores were significant at 3 and 6 months (P < .05; Figs 4-6). Although there was significant correlation between changes in the PROMIS scores and the pain NRS from baseline to 3 months (P < .05), the changes from 3 to 6 months were not significantly correlated. There were no significant correlations in changes from 3 to 6 months with any of the instruments.
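The longitudinal comparisons reported above correlate change scores (follow-up minus baseline) between 2 instruments. A minimal sketch of a Spearman rank correlation on change scores is given below; it is a plain-Python illustration, not the SAS procedure used in the study, and the scores are made-up values. Tied values receive their average rank.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


def change_scores(baseline, follow_up):
    """Follow-up minus baseline for each patient."""
    return [after - before for before, after in zip(baseline, follow_up)]


# Made-up scores for 5 hypothetical patients (not study data)
rmdi_change = change_scores([18, 15, 20, 12, 16], [10, 14, 12, 11, 9])
function_change = change_scores([35, 38, 33, 40, 36], [44, 39, 41, 42, 45])
rho = spearman(rmdi_change, function_change)
print(round(rho, 2))
```

In this made-up example the coefficient is negative because a falling RMDI score (less disability) accompanies a rising Physical Function t-score (better function); the adequacy criteria in the text would then apply to its magnitude.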

Numeric Rating Pain Scale, PROMIS T-Score, RMDI, and EQ-5D
On the basis of the minimally important difference calculation (1.96 × reliable change index), patients who had changes of 5.2 points for the RMDI, 0.3 points for EQ-5D, 6.2 points for Physical Function, 4.5 points for Pain Interference, and 4.4 points for Pain Behavior were considered to have experienced meaningful improvement. At 3 and 6 months, the proportions of patients achieving improvement in the 3 PROMIS domains and EQ-5D and RMDI were, respectively, 1%-10% and 2%-5%. The improvements were much higher in pain NRS either at 3 (42%) or 6 (41%) months. The direction of change (decline versus improvement) at 3-month follow-up of RMDI and PROMIS Physical Function, Pain Interference, Pain Behavior, and EQ-5D was the same in 65 (93%), 32 (46%), 63 (91%), 69 (98.5%), and 58 (83%) patients, respectively. At 6-month follow-up, proportionately more patients exhibited a decline in RMDI (25%) compared with those showing a decline in the PROMIS Physical Function (9%) or Pain Behavior (9%) domains. A similar proportion (27%) showed a decline in the PROMIS Pain Interference domain (Table 3).

DISCUSSION
Using a cohort of patients who underwent TFESI, we found moderate-to-high degrees of correlation among PROMIS Physical Function, Pain Interference, and Pain Behavior domains and legacy instruments. All PROMIS domains were moderately-to-highly responsive to change and correlated with the RMDI, EQ-5D, and NRS, which have been validated in patients with low back pain and lower extremity radicular pain. Correlation between PROMIS domains and RMDI was high at any cross-sectional or longitudinal measurement and was highest for PROMIS Physical Function. There was not a strong correlation between the EQ-5D and the PROMIS Physical Function domain, which was expected because the RMDI focuses on physical activity, while the EQ-5D simultaneously assesses mood and physical function. In assessing post-TFESI improvement with legacy instruments, we saw the greatest improvement in the NRS (42% had meaningful improvement), while there were few patients showing meaningful improvement in either legacy or PROMIS functional scales. This finding could reflect the advanced age of this study population or could be an artifact of the small sample size. That the measurements of the PROMIS domains remained concordant with the legacy instruments in no way undermines the primary study result.
Other recent studies have evaluated the PROMIS scales in focused clinical populations. Shahgholi et al 7 studied 50 patients undergoing vertebroplasty following osteoporotic compression fractures and showed a strong correlation between PROMIS physical function and RMDI. Fries et al 12 studied 451 patients with chronic rheumatoid arthritis and showed a strong correlation between PROMIS physical function and the Health Assessment Questionnaire or the Health Assessment Questionnaire Disability Index. In our current study, correlations between RMDI and PROMIS scales were stronger than those of Shahgholi et al 7 and Fries et al. 12 We also observed good longitudinal correlation of change in scores.
This study thus provides additional, incremental validation of the PROMIS outcome measures in a clinical population of subjects undergoing a therapeutic intervention. As such validation studies accumulate, PROMIS methodology may achieve the goal of the National Institutes of Health of widespread use, enhancing comparability of studies within and across multiple fields of medicine. The PROMIS instruments have also been formulated for computer adaptive testing; this change could further reduce the testing burden.
The study has limitations, the primary one being the number of subjects lost to follow-up. This is a product of the study design, relying on subjects to return materials distant in time from the intervention; supplying the surveys at the time of the procedure may have also diminished the rate of return. No useful data could be obtained at 9- and 12-month follow-up due to the failure of participants to return the study materials. Because the object of the study was correlation of measurement instruments, not assessment of outcomes, this lack of feedback reduces our precision, but does not confound the results. The lost-to-follow-up data may have introduced bias into the correlation summary. The results of this study will provide data for power analysis of future, more robust correlative studies as PROMIS scales are applied to additional therapeutic interventions.

CONCLUSIONS
The PROMIS domains used in this study offered responsive and comparable psychometric properties to legacy instruments in a population of patients undergoing TFESI for radicular pain. The advantages of using PROMIS instruments are their ability to compare the results in a study cohort with the general US population, the ease of scoring, lesser patient burden while maintaining responsiveness and precision, and lack of licensure costs. As PROMIS instruments become more widely used, there will be the opportunity to compare the impact of disease burden and therapeutic interventions across medical specialties.