This article appears in the following Rheumatology issue: Update in systemic sclerosis
Outcome measures in rheumatologic clinical trials and systemic sclerosis
Geffen School of Medicine, University of California, Los Angeles, CA, USA.
Correspondence to: D. E. Furst, 1000 Veteran Ave. Rehab Center Room 32-59, Los Angeles, CA 90095-1670, USA. E-mail: defurst{at}mednet.ucla.edu
| Abstract |
|---|
OMERACT (Outcome Measures in Rheumatologic Clinical Trials) is a loose organization of rheumatologists, epidemiologists and statisticians whose aim is to improve measurements in the rheumatic diseases. In this context, some SSc measures of response have been found to be valid: the modified Rodnan skin score, the Raynaud's condition score, the forced vital capacity as part of pulmonary function tests, right heart catheterization haemodynamics, serum creatinine, blood pressure and complete blood counts in scleroderma renal crisis and serum creatine phosphokinase as a measure of muscle disease in SSc. Other measures are being tested and have nearly been validated, including the gastrointestinal questionnaire in SSc. Finally, some measures have been found wanting or are not fully tested—in this case for pulmonary arterial hypertension, where effort is presently being focused. These include the echocardiogram, high-resolution CT scan of the lungs, pulmonary function tests, 6-min walking test and MRI.
KEY WORDS: Scleroderma, Outcome measures in rheumatologic clinical trials, Lung, Combined indices
| Introduction |
|---|
The objectives of this article are to describe OMERACT (Outcome Measures in Rheumatologic Clinical Trials) and how it functions, to describe how OMERACT principles can be applied to SSc, and finally to give some examples of how OMERACT principles have been applied to SSc.
| OMERACT and how it functions |
|---|
OMERACT is a loose organization or network of rheumatologists, epidemiologists and statisticians who meet every 2 years [1]. The meeting includes 300 individuals interested in improving the measurements used in clinical trials of rheumatic diseases. It tries to ensure that valid measures of response are developed and used. It does so by advising others on the development of methods for this validation, but it does not seek to take credit for any results or publications. Because OMERACT principles usually require extremely careful and thorough research and analysis, regulatory agencies in North America and Europe tend to use the results of these efforts. The approach used at OMERACT is to examine the truth, discrimination and feasibility of measures by testing their face validity, content validity, construct/criterion validity, divergent validity, reproducibility, sensitivity to change and feasibility.
Face validity is defined by whether or not a particular measure makes sense. For example, measuring skin thickness with ultrasound has face validity: ultrasound is able to measure the thickness of numerous tissues; hence, its use to measure skin thickness is logical.
Content validity requires that a measure cover the whole range of possibilities within a given disease or disease state. Thus, for example in SSc, a measure should be validated in patients with both limited and diffuse disease. If only diffuse disease were being measured, both late and early diffuse disease should be included in the validation procedure.
Construct/criterion validity reflects agreement with a gold standard. Frequently, unfortunately, no gold standard exists, so one measure of construct/criterion validity has evolved to reflect how closely a measure matches the expectations of investigators. As this can be self-fulfilling, an additional requirement within this aspect of validation is that a new measure should have both convergent and divergent validity. Convergent validity requires agreement with other accepted methods. Here, for example, the scleroderma HAQ (SHAQ) disability index (DI) should agree with a patient global visual analogue scale (VAS) or aspects of the SF-36. Divergent validity requires the ability to differentiate between patient groups. Thus, for example, one should be able to tell the difference between lcSSc and dcSSc when doing skin scores.
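As a rough illustration of how convergent validity might be checked, the sketch below computes a Pearson correlation between two measures taken in the same patients. The scores are hypothetical values invented for this example, not data from any study, and the choice of the Pearson coefficient is an assumption; a rank-based coefficient may be more appropriate for ordinal questionnaire data.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical scores for six patients (illustrative only, not study data).
shaq_di = [0.25, 0.50, 1.00, 1.25, 1.75, 2.25]  # disability index, 0-3 scale
patient_vas = [10, 22, 35, 50, 61, 80]          # patient global VAS, 0-100 mm

r = pearson_r(shaq_di, patient_vas)
print(f"Correlation between SHAQ DI and patient global VAS: {r:.2f}")
```

A high positive correlation would be consistent with convergent validity; what counts as "high enough" is a judgement that the validation study has to justify.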
Reproducibility requires stability when a measure is done repeatedly. For example, when a serum creatinine is measured twice, several days or 1 week apart, the value should be reproducible. For the skin score, reproducibility can be assessed by having a single investigator score the same patient twice, 1 week apart. Reproducibility is clearly better when the score is done by one individual than when done, even at the same time, by two investigators. For example, within-investigator (intra-investigator) variability is 4 out of 51 for the modified Rodnan skin score (mRSS), whereas scores often differ by 6 when two investigators perform the same skin score (inter-investigator) [2, 3].
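The comparison of within- and between-investigator variability can be sketched as a mean absolute difference between paired scores. The mRSS values below are hypothetical and chosen only to show the calculation; the function name and the use of a mean absolute difference (rather than, say, a standard deviation of differences) are assumptions made for this example.

```python
def mean_abs_difference(first, second):
    """Mean absolute difference between two paired lists of scores."""
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty lists of equal length")
    return sum(abs(a - b) for a, b in zip(first, second)) / len(first)

# Hypothetical mRSS values (0-51 scale) for five patients, illustrative only.
rater_a_week0 = [12, 20, 31, 8, 25]   # investigator A, first visit
rater_a_week1 = [14, 18, 30, 10, 27]  # investigator A, one week later
rater_b_week0 = [17, 15, 36, 13, 20]  # investigator B, first visit

intra = mean_abs_difference(rater_a_week0, rater_a_week1)
inter = mean_abs_difference(rater_a_week0, rater_b_week0)
print(f"Intra-investigator difference: {intra:.1f} mRSS units")
print(f"Inter-investigator difference: {inter:.1f} mRSS units")
```

In this made-up data set the same investigator agrees with themselves more closely than two investigators agree with each other, which is the pattern described above.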
Sensitivity to change requires that, when an effective drug is used, the measurement changes by a statistically or clinically important amount. This has been difficult to establish when testing medications aimed at changing the underlying disease in SSc, as no proven therapies exist. On the other hand, when examining the scleroderma gastrointestinal tract questionnaire, one would require a change in this measure in response to the use of proton pump inhibitors when patients have symptoms of gastro-oesophageal reflux disease.
Finally, feasibility requires that a measure should be easy to perform, require little time and require a minimal amount of equipment. In this context, an MRI of the lungs may fulfil all the validation measures outlined above, but may not be feasible because it is expensive, is not readily available all over the world and requires sophisticated equipment.
The efforts of OMERACT have been highly productive, as exemplified by the fact that this organization has fostered the development of the American College of Rheumatology (ACR) response criteria, the disease activity score, standardization of radiographic measures for RA and PsA, the AS activity score and validated measures for MRI scoring [4–6]. Ongoing efforts include development of a toxicity index for RA and multiple measures for SSc, including the gastrointestinal tract score, the use of a standardized measure for depression in SSc, completion of validation of the SHAQ, etc.
| Application of OMERACT principles to SSc |
|---|
To date, the modified Rodnan skin score, the Raynaud's condition score, the forced vital capacity as part of pulmonary function tests, right heart catheterization haemodynamics, serum creatinine, blood pressure and complete blood counts in scleroderma renal crisis and serum creatine phosphokinase as a measure of muscle disease in SSc have all been validated according to the OMERACT principles [7, 8].
In addition, the SHAQ DI has met the criteria for validation in SSc [8]. The SHAQ DI certainly has face validity. In the scleroderma lung study, the SHAQ DI differentiated lcSSc from dcSSc [lcSSc 0.48 (0.0–1.6) vs dcSSc 1.03 (0.5–2.25), P < 0.05], thus demonstrating content validity [9]. The HAQ DI correlates with disability and annual costs, thus demonstrating correlation with a gold standard (costs and disability) for rheumatic diseases, although not in SSc [10]. Convergent validity in SSc has been demonstrated, as the SHAQ correlated with reduced fist closure, hand spread and tender joints, all correlates of function [11]. In the context of content validity, it has also already shown divergent validity (see earlier). Among five SSc patients tested within 1 month, the SHAQ DI reproducibility was demonstrated with a coefficient of 0.8 (D. E. Furst, personal observation). The SHAQ DI is responsive to change, with a minimal discernible clinical difference of 0.10–0.14 [12]. Finally, it is eminently feasible, taking only 5–8 min to complete in a clinical setting.
The gastrointestinal tract instrument has been developed and testing of the validity of this instrument is nearly complete [13]. It has been shown to have face validity as defined by patient focus groups and has content validity as it covers from very mild to very severe disease. Construct validity was derived from correlations with patient health, while reproducibility was good (Cronbach's α 0.69–0.93) and test–retest reliability was also good to excellent (correlations 0.69–0.90). Feasibility was good as patients were comfortable doing this 52-item questionnaire. It is presently being tested for responsiveness to change.
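For readers unfamiliar with Cronbach's alpha, the sketch below shows one common way to compute it from item-level questionnaire responses: the number of items k, divided by k − 1, multiplied by one minus the ratio of the summed item variances to the variance of the total score. The responses are hypothetical (five respondents, four items) and are not taken from the gastrointestinal tract instrument; the function name is an assumption made for this example.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    if len(rows) < 2 or len(rows[0]) < 2:
        raise ValueError("need at least 2 respondents and 2 items")
    k = len(rows[0])
    if any(len(row) != k for row in rows):
        raise ValueError("every respondent must answer the same number of items")
    # Sample variance of each item (column) across respondents.
    item_variance_sum = sum(variance(column) for column in zip(*rows))
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)

# Hypothetical item scores (illustrative only, not study data).
responses = [
    [1, 2, 1, 2],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [4, 4, 3, 4],
    [0, 1, 1, 0],
]

alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha: {alpha:.2f}")
```

Cronbach's alpha is usually described as a measure of internal consistency (how well the items move together), which is why it is reported separately from test–retest reliability.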
| Ongoing efforts using OMERACT principles, as applied to pulmonary arterial hypertension in SSc |
|---|
OMERACT principles are being actively applied to the measures used when pulmonary arterial hypertension (PAH) is suspected in SSc [14, 15]. A complete description of the specifics of these efforts is beyond the purview of this article. Table 1 outlines the status of the following measures: patient history, echocardiography, MRI/magnetic resonance angiogram (MRA), high-resolution CT scan of the lungs, and pulmonary function tests, all as applied to PAH (not interstitial lung disease) in SSc.
As an example of the process, the validation of the echocardiogram for measurement of PAH in SSc will be reviewed. The echocardiogram has face validity, as it clearly measures cardiac function. It has content validity as it has been used in limited and diffuse disease of both short and longer duration [16]. Unfortunately, while it has been tested against a gold standard (right heart catheterization), it is not valid. In one study, it misclassified 20% of the patients and could not measure a surrogate for PAH in 30% [17, 18]. In another study, it was found to be highly specific (96%) but only poorly sensitive (58%), making the echocardiogram invalid with respect to criterion validity when compared with right heart catheterization [18]. The echocardiogram has not been tested for reproducibility. Likewise, while it has been tested for its ability to measure change in idiopathic pulmonary hypertension, it has not been shown to be responsive in SSc. Finally, while feasible, it remains expensive, thus relegating this measurement to the non-validated group in SSc [8].
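Sensitivity and specificity against a gold standard come from a simple two-by-two comparison. The sketch below uses hypothetical patient counts, chosen only so that the arithmetic reproduces the 58% sensitivity and 96% specificity quoted above; the actual counts in the cited study are not given in this article.

```python
def sensitivity_specificity(true_pos, false_neg, true_neg, false_pos):
    """Return (sensitivity, specificity) of a test against a gold standard."""
    if true_pos + false_neg == 0 or true_neg + false_pos == 0:
        raise ValueError("need at least one diseased and one non-diseased patient")
    sensitivity = true_pos / (true_pos + false_neg)   # diseased correctly detected
    specificity = true_neg / (true_neg + false_pos)   # non-diseased correctly cleared
    return sensitivity, specificity

# Hypothetical counts (illustrative only): 50 patients with PAH and 50 without,
# as classified by right heart catheterization (the gold standard).
true_pos, false_neg = 29, 21   # echocardiogram result in patients with PAH
true_neg, false_pos = 48, 2    # echocardiogram result in patients without PAH

sens, spec = sensitivity_specificity(true_pos, false_neg, true_neg, false_pos)
print(f"Sensitivity: {sens:.0%}, specificity: {spec:.0%}")
```

With a sensitivity of this size, roughly four in ten patients with catheter-proven PAH would be missed by the test, which is why high specificity alone does not establish criterion validity.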
| Acknowledgement |
|---|
Supplement: This paper forms part of the supplement entitled Update in systemic sclerosis. This supplement was supported by an unrestricted grant from Encysive.
Disclosure statement: D.E.F. is a consultant for and has received research support from Actelion and Gilead Pharma.
| References |
|---|
1. Tugwell P, Boers M, Brooks P, Simon L, Strand V, Idzerda L. OMERACT: an international initiative to improve outcome measurement in rheumatology. Trials (2007) 8:38.
2. Furst DE, Clements PJ, Harris R, Ross M, Levy J, Paulus HE. Measurement of clinical change in progressive systemic sclerosis: a 1 year double-blind placebo-controlled trial of N-acetylcysteine. Ann Rheum Dis (1979) 38:356–61.
3. Furst DE, Clements PJ, Steen VD, et al. The modified Rodnan skin score is an accurate reflection of skin biopsy thickness in systemic sclerosis. J Rheumatol (1998) 25:84–8.
4. Van Der Heijde D, Landewe R. Selection of a method for scoring radiographs for ankylosing spondylitis clinical trials, by the Assessment in Ankylosing Spondylitis Working Group and OMERACT. J Rheumatol (2005) 32:2048–9.
5. Van Der Heijde D, Landewe R, Hermann KG, et al. Is there a preferred method for scoring activity of the spine by magnetic resonance imaging in ankylosing spondylitis? J Rheumatol (2007) 34:871–3.
6. Bansback NJ, Regier DA, Ara R, et al. An overview of economic evaluations for drugs used in rheumatoid arthritis: focus on tumour necrosis factor-alpha antagonists. Drugs (2005) 65:473–96.
7. Merkel PA, Clements PJ, Reveille JD, Suarez-Almazor ME, Valentini G, Furst DE. Current status of outcome measure development of clinical trials in systemic sclerosis. Report from OMERACT 6. J Rheumatol (2003) 30:1630–47.
8. Furst D, Khanna D, Matucci-Cerinic M, et al. Systemic sclerosis – continuing progress in developing clinical measure of response. J Rheumatol (2007) 34:1194–200.
9. Tashkin DP, Elashoff R, Clements PJ, et al. Cyclophosphamide versus placebo in scleroderma lung disease. N Engl J Med (2006) 354:2655–66.
10. Johnson SR, Hawker GA, Davis AM. The Health Assessment Questionnaire Disability Index and Scleroderma Health Assessment Questionnaire in scleroderma trials: an evaluation of their measurement properties. Arthritis Rheum (2005) 53:256–62.
11. Khanna D, Furst DE, Clements PJ, et al. Responsiveness of the SF-36 and the Health Assessment Questionnaire Disability Index in a systemic sclerosis clinical trial. J Rheumatol (2005) 32:832–40.
12. Khanna D, Furst DE, Hays RD, et al. Minimally important difference in diffuse systemic sclerosis: results from the D-penicillamine study. Ann Rheum Dis (2006) 65:1325–9.
13. Khanna D, Hays RD, Park GS, et al. Development of a preliminary scleroderma gastrointestinal tract 1.0 quality of life instrument. Arthritis Rheum (2007) 57:1280–6.
14. Furst DE. Measuring outcome in PAH: the gap between the measures that are used and their validity. Ann NY Acad Sci (2007) 1107:410–6.
15. Distler O, Behrens F, Huscher D, et al. Need for improved outcome measures in pulmonary arterial hypertension related to systemic sclerosis. Rheumatology (2006) 45:1455–7.
16. Khanna D, Yan X, Tashkin DP, et al. Impact of oral cyclophosphamide on health-related quality of life in patients with active scleroderma lung disease: results from the scleroderma lung study. Arthritis Rheum (2007) 56:1676–84.
17. Arcasoy SM, Christie JD, Ferrari VA, et al. Echocardiographic assessment of pulmonary hypertension in patients with advanced lung disease. Am J Respir Crit Care Med (2003) 167:735–40.
18. Denton CP, Cailes JB, Philips GD, et al. Comparison of Doppler echocardiography and right heart catheterization to assess pulmonary hypertension in systemic sclerosis. Br J Rheumatol (1997) 36:239–43.