Rheumatology Advance Access originally published online on December 6, 2005
Rheumatology 2006 45(5):595-599; doi:10.1093/rheumatology/kei243
Reproducibility of instrumented knee joint laxity measurement in healthy subjects
1 Jan van Breemen Institute, Center for Rehabilitation and Rheumatology, 2 VU University Medical Center, Department of Rehabilitation Medicine, 3 Institute for Research in Extramural Medicine (EMGO Institute), VU University Medical Center, 4 Department of Allied Health Care Research, Amsterdam School of Allied Health Education and 5 MOVE Research Institute for Clinical Movement Studies, Amsterdam, The Netherlands.
Correspondence to: M. van der Esch, Jan van Breemen Institute, Center for Rehabilitation and Rheumatology, Jan van Breemenstraat 2, 1056 AB Amsterdam, The Netherlands. E-mail: M.van.der.Esch{at}planet.nl
| Abstract |
|---|
Objective. To determine the reproducibility of frontal plane knee joint laxity measurement through the assessment of intra- and inter-rater reliability coefficients and intra- and inter-rater agreement coefficients.
Methods. Two raters independently assessed the laxity of the knee joint in the frontal plane by three repeated measurements. Fourteen days later the assessment was repeated. Complete data were obtained from 20 healthy subjects. Laxity was assessed using a device which consisted of a chair with a free-moving arm that supported the subject's lower leg. Medial and lateral loads were applied, resulting in a varus and valgus movement in the knee joint. The intra- and inter-rater reliability coefficients [intraclass correlation coefficients (ICC)] were estimated, as were the intra- and inter-rater agreement parameters [standard error of measurement (SEM) and minimal detectable difference (MDD)].
Results. Adequate intra-rater reliability (ICC>0.80) was calculated for each rater's measurements of laxity. The inter-rater reliability was less adequate (ICC=0.65) when calculated using the first day's measurements. However, inter-rater reliability was adequate (ICC=0.88) when calculated using the day 14 measurements. The intra-rater measurement error calculated across occasions was 1.3° for individual subjects. This resulted in an MDD of 3.7°. The inter-rater measurement error, i.e. the SEM and MDD, was higher (1.5° and 4.3°, respectively).
Conclusions. Intra-rater reliability of knee joint laxity measurement is good. Adequate training of raters establishes the basis for good inter-rater reliability. In clinical trials, it is preferable for one trained rater to perform the laxity measurement. The measurement of knee joint laxity is limited due to its relatively high measurement error in individual subjects; therefore, this measurement should be restricted to group assessment rather than individual patient assessment.
KEY WORDS: Reproducibility, Reliability, Knee joint laxity, Osteoarthritis
| Introduction |
|---|
Frontal plane knee joint laxity may play an important role in knee osteoarthritis (OA). Laxity can be defined as the angular deviation of the tibiofemoral joint in the frontal plane after a varus-valgus load is applied [1, 2]. Laxity is related to radiographic progression and to poor functional outcome [3–8]. Although laxity has been identified as an important factor in OA of the knee, detailed information on the clinimetric properties of its measurement is unavailable.
Measuring laxity equates to measuring small differences in varus-valgus deviations. To detect minimal differences in laxity, high-precision measurement with high reproducibility is essential.
Reproducibility concerns the degree to which repeated measurements in a constant situation provide similar answers. For the quantification of reproducibility, two types of measure can be distinguished: measurements of reliability and measurements of agreement. Reliability parameters assess whether persons in a group can be distinguished from each other, despite measurement errors [9]. Reliability is expressed as the intraclass correlation coefficient (ICC), ranging from 0 to 1 [9]. A high ICC represents a sufficient distinguishing capacity of the instrument regardless of measurement error. In order to identify precise measurement, the absolute measurement error has to be taken into account. Expressing the measurement error in scale points is often referred to as agreement. Agreement parameters assess how close the results of the measurements are within individual subjects by estimating the absolute measurement error in repeated measurements [10, 11]. Agreement in measuring joint laxity is expressed as the standard error of measurement (SEM) in degrees and the minimal detectable difference (MDD) in degrees.
Currently, there is very limited information regarding the reproducibility of the measurement of knee joint laxity. In two studies [4, 5], the reliability was tested on four and five patients, respectively, with an intra-rater reliability of 0.92 (ICC). Sharma et al. [1, 2, 6, 7] presented reliability scores ranging from 0.84 to 0.90 (ICC). Information regarding inter-rater reliability and agreement parameters is presently unavailable. For this reason there is an evident need to examine the reproducibility of the measurement of laxity in the knee.
The objective of this study was to establish (i) the intra- and inter-rater reliabilities and (ii) the intra- and inter-rater agreement parameters of the measurement of knee joint laxity.
| Methods |
|---|
Subjects
Twenty healthy young volunteers (10 males, 10 females) participated in the study. The mean ± S.D. age of the subjects was 22.9 ± 3.0 yr. The inclusion criteria were no current knee pain; no previous injury in the hip-knee region; no analgesics or anti-depressive medication; and, for women, regular menstrual cycles for the 3 months prior to the study. All of the above criteria may influence the degree of laxity. Ethical review board approval was obtained, and all participants provided written informed consent.
Design
Two raters (a physical therapist and a human movement scientist), both trained in clinical measurements by a clinician, independently performed all the laxity measurements. The subjects were scheduled for the two experimental sessions (days 1 and 14). On both occasions the raters measured the subjects. Each rater measured the same knee of each subject three times. In 10 subjects the right knee was measured and in 10 other subjects the left knee.
Each rater made three consecutive measurements and the subjects remained seated and fixed between measurements. The loading was applied and then released after each measurement. The deviation in the subject's knee was recorded digitally. After the first rater had assessed the joint laxity, all fixation points were removed and the subject stood up. Subsequently the second rater seated, fixed and assessed the same subject. To avoid bias, the second rater waited in another room while the first rater performed the measurements.
After 14 days the procedure was repeated; the order of raters was reversed. Both raters were blinded to the results of the reproducibility analyses of the day 1 measurements.
Equipment
An electronic device (Fig. 1) was used to measure knee varus-valgus laxity. A chair with an attached free-moving arm, which supported the subject's lower leg, was used to seat the subject. The subject was seated comfortably in the measurement chair, which had a back support. The device was constructed in such a manner that throughout the study the knee joint was held in 20° of flexion.
The thigh, lower leg and ankle were fixed to the device. No medial or lateral movement of the lower leg and thigh or internal and external rotation of the hip was possible using these fixation techniques. The thigh and lower leg were fixed at five places. The foot and distal part of the lower leg were fastened to the arm using clamps at the ankle and at the distal part of the leg (Fig. 1; points 1 and 2). Below the knee the lower leg was fixed to the device with a Velcro bandage (Fig. 1; point 3). The distal/lower part of the thigh was fixed using two clamps (Fig. 1; point 4). The upper thigh was fastened to the chair using a Velcro bandage (Fig. 1; point 5).
The joint of the arm moved with minimal friction. The axis of rotation of the free-moving arm was centrally located directly under the tibiofemoral joint of the subject (i.e. the middle of the popliteal fossa). To supply a steady moment to the knee of 7.7 Nm, a dead-weight was used. This weight was attached to the free-moving arm by a cord. The cord was attached 0.68 m from the axis of rotation of the arm. The load could be applied to the lower leg both medially and laterally, resulting in varus or valgus movement in the knee joint. An electronic measurement system digitally recorded the end point of the varus or valgus movement, after 4 s. Laxity of the knee joint was calculated as the sum of the varus and valgus deviations in degrees [7, 8].
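As a rough check on the loading arithmetic described above, the following sketch computes the force, and the corresponding dead-weight mass, needed to produce a 7.7 Nm moment at a lever arm of 0.68 m, and sums the varus and valgus deviations into a single laxity score. The function names, the value of g, the assumption that the cord pulls perpendicular to the arm, and the example angles are illustrative assumptions and are not taken from the study.

```python
G = 9.81  # m/s^2, standard gravity (assumed; not stated in the text)


def required_force(moment_nm: float, arm_m: float) -> float:
    """Force in newtons that produces the given moment at the given lever arm,
    assuming the force acts perpendicular to the arm."""
    return moment_nm / arm_m


def dead_weight_mass(moment_nm: float, arm_m: float, g: float = G) -> float:
    """Mass in kilograms whose weight supplies the required force."""
    return required_force(moment_nm, arm_m) / g


def total_laxity(varus_deg: float, valgus_deg: float) -> float:
    """Laxity as the sum of the varus and valgus deviations, in degrees."""
    return abs(varus_deg) + abs(valgus_deg)


force = required_force(7.7, 0.68)        # moment and arm length from the text
mass = dead_weight_mass(7.7, 0.68)
example = total_laxity(2.5, 3.0)         # hypothetical deviations
print(f"force = {force:.1f} N, mass = {mass:.2f} kg, laxity = {example:.1f} deg")
```

Under these assumptions the required force is about 11.3 N, which corresponds to a dead weight of roughly 1.15 kg.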
Joint laxity measurement
All measurements of laxity were performed in accordance with our protocol, including the use of anatomical landmarks for patient positioning, patient instructions and the examiner's position.
Anatomical landmarks of the knee were palpated to localize the medial and lateral joint spaces and the middle of the popliteal fossa. These anatomical structures give an indication of the position of the varus-valgus rotation axis of the tibiofemoral joint of the knee. The electronic meter was positioned in line with the varus-valgus rotation axis (Fig. 1; point B).
To avoid increased muscle tone resulting from pain during the fixation or measurement, subjects were instructed to relax as much as possible and to report the onset of pain.
Raters were seated behind the patient and applied the load slowly by hand to the lower leg in a standardized manner.
Analysis
The mean score in degrees for laxity obtained from the three measurements was used for analysis. Reproducibility was assessed using the following sources of variance: subject, rater, time of measurement and interaction between these variables. To express reproducibility, the following parameters were established [11, 12].
Intra-rater reliability
The ICC (2,k) was calculated as the ratio of variance between subjects within one rater, in relation to the relative measurement error (including all sources of variance: rater, subject, time of measurements and the absolute measurement error).
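To illustrate how an ICC of this kind can be obtained from raw scores, the following is a minimal sketch of the standard two-way random effects ICC for absolute agreement, in its single-measure form, ICC(2,1), and its average-of-k form, ICC(2,k). It is not the SPSS procedure used in the study; the function name and the example scores are hypothetical.

```python
def icc_two_way_random(scores):
    """Return (ICC(2,1), ICC(2,k)) for a table of scores.

    scores: list of rows, one row per subject, one column per occasion
    (or per rater). Uses the mean squares of a two-way ANOVA without
    replication.
    """
    n = len(scores)          # number of subjects
    k = len(scores[0])       # number of occasions or raters
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)                 # between-subject mean square
    ms_cols = ss_cols / (k - 1)                 # between-occasion mean square
    ms_error = ss_error / ((n - 1) * (k - 1))   # residual mean square

    single = (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
    average = (ms_rows - ms_error) / (ms_rows + (ms_cols - ms_error) / n)
    return single, average


# Hypothetical laxity scores in degrees: one row per subject,
# columns are day 1 and day 14 for a single rater.
laxity = [[4.8, 5.5], [6.1, 6.4], [3.9, 4.6], [7.2, 7.0], [5.4, 6.3]]
icc_single, icc_average = icc_two_way_random(laxity)
print(f"ICC(2,1) = {icc_single:.2f}, ICC(2,k) = {icc_average:.2f}")
```

Because the between-occasion mean square appears in the denominator, a systematic shift between day 1 and day 14 lowers these coefficients, which is the sense in which they reflect absolute agreement rather than consistency alone.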
Intra-rater agreement
The SEM concerns the absolute measurement error in measuring an individual. It assesses the proximity of the scores on repeated measures [10, 11, 13]. The amount of measurement error can be expressed as the SEM. The SEM was derived by taking the square root of the error variance of the following sources of variance: time of measurement, interaction between subject and time of measurement, interaction between rater and time of measurement, and interaction between subject, rater and time of measurement. The SEM was calculated across both occasions. The SEM was used to calculate the MDD. The MDD is the smallest measurable difference that can be interpreted as a real difference between two measurements, i.e. beyond zero [10, 11]. To compute the MDD as the 95% confidence limit of the SEM, the SEM has to be multiplied by 1.96 [for the 95% confidence interval (CI)] and by the square root of 2 for the difference scores ($\mathrm{MDD} = 1.96 \times \sqrt{2} \times \mathrm{SEM}$). The MDD expresses the uncertainty of the difference between two observed scores [14].
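The MDD calculation described above can be sketched in a few lines. The function name is an assumption made for illustration; the multiplier 1.96 and the factor of the square root of 2 come from the text.

```python
import math


def minimal_detectable_difference(sem: float, z: float = 1.96) -> float:
    """MDD as the 95% confidence limit of a difference between two scores:
    z * sqrt(2) * SEM."""
    return z * math.sqrt(2) * sem


# SEM values in degrees as reported in the Results section
for label, sem in [("intra-rater", 1.35), ("inter-rater", 1.55)]:
    print(f"{label}: MDD = {minimal_detectable_difference(sem):.2f} deg")
```

With the rounded SEM values of 1.35° and 1.55°, this gives about 3.74° and 4.30°. The small gap from the reported intra-rater MDD of 3.73° is presumably due to the study using an unrounded SEM.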
Inter-rater reliability
The ICC was calculated as the ratio of variance (rater, subject, time of measurement and the absolute measurement error) between subjects and between the two raters, in relation to the relative measurement error.
Inter-rater agreement
The SEM was calculated to establish the absolute measurement error across raters and occasions, calculated according to the generalizability theory [9]. The SEM was derived by taking the square root of the error variance of the following sources of variance: rater, time of measurement, interaction between rater and subject, interaction between rater and time of measurement, interaction between subject and time of measurements, and the interaction between subject, rater and time of measurement. The SEM was used to calculate the MDD. The MDD was also calculated across raters and occasions.
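The step of deriving the SEM from variance components can be sketched as follows: the error components are summed and the square root is taken. The variance values below are hypothetical placeholders chosen only to show the mechanics; the study does not report the individual components.

```python
import math


def sem_from_components(error_variances: dict) -> float:
    """SEM as the square root of the summed error variance components."""
    return math.sqrt(sum(error_variances.values()))


# Hypothetical variance components in degrees squared (not from the study).
intra_rater_errors = {
    "time": 0.2,
    "subject x time": 0.5,
    "rater x time": 0.1,
    "subject x rater x time": 1.0,
}
# The inter-rater SEM adds the rater-related components to the same set.
inter_rater_errors = dict(intra_rater_errors, **{
    "rater": 0.2,
    "rater x subject": 0.4,
})

print(f"intra-rater SEM = {sem_from_components(intra_rater_errors):.2f} deg")
print(f"inter-rater SEM = {sem_from_components(inter_rater_errors):.2f} deg")
```

Because the inter-rater error term includes every intra-rater component plus the rater-related ones, the inter-rater SEM cannot be smaller than the intra-rater SEM when all components are non-negative.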
In order to visualize the difference between raters against the corresponding mean of the two raters for each subject, a limits-of-agreement plot was constructed, as proposed by Bland and Altman [15].
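The quantities behind a limits-of-agreement plot can be computed as in the sketch below: the per-subject difference between raters, the per-subject mean of the two raters, the mean difference (bias) and the bias plus or minus 1.96 standard deviations of the differences. The function name and the example scores are hypothetical, and plotting is left out to keep the example dependency-free.

```python
import statistics


def limits_of_agreement(rater_a, rater_b, z: float = 1.96) -> dict:
    """Bland-Altman quantities for paired scores from two raters."""
    diffs = [a - b for a, b in zip(rater_a, rater_b)]
    means = [(a + b) / 2 for a, b in zip(rater_a, rater_b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    return {
        "means": means,            # x-axis values of the plot
        "diffs": diffs,            # y-axis values of the plot
        "bias": bias,
        "lower": bias - z * sd,
        "upper": bias + z * sd,
    }


# Hypothetical day 14 laxity scores in degrees for five subjects.
rater_a = [5.1, 6.8, 4.4, 7.9, 6.0]
rater_b = [5.9, 6.1, 5.2, 7.5, 6.6]
result = limits_of_agreement(rater_a, rater_b)
print(f"bias = {result['bias']:.2f}, "
      f"limits = [{result['lower']:.2f}, {result['upper']:.2f}] deg")
```

Plotting `diffs` against `means` with horizontal lines at the bias and the two limits gives the usual figure; a trend in the scatter would indicate systematic variation over the measurement range.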
For reliability, an ICC of >0.70 was regarded as adequate [16]. Confidence intervals were presented as an indication of the precision of the point estimate. To calculate the ICC, the SEM and the MDD, a two-way random effects model of analysis of variance (ANOVA) was performed, using the Statistical Package for the Social Sciences (SPSS) version 12.0 for Windows (SPSS, Chicago, IL, USA).
| Results |
|---|
Subjects
The study sample consisted of 20 healthy subjects. The demographic data of the subjects are presented in Table 1. For rater A, the mean scores in knee joint laxity on the first (day 1) and second assessment (day 14) were 5.5 and 6.2°, respectively. For rater B, the mean scores were 5.5 and 6.5° at day 1 and 14, respectively.
Intra-rater reliability
The ICC was 0.84 (95% CI 0.61, 0.94) for rater A and 0.93 (95% CI 0.81, 0.97) for rater B.
Intra-rater agreement
Generalized across occasions for the same fixed rater, the measurement error, expressed as the SEM, was 1.35° and the MDD was 3.73° (Table 2).
Inter-rater reliability
The ICC was 0.65 (95% CI 0.13, 0.86) for the assessment on day 1 and 0.88 (95% CI 0.70, 0.95) for the assessment on day 14.
Inter-rater agreement
The SEM was 1.55° and the MDD 4.30°, generalized across raters and occasions. This result represents the absolute measurement error when a subject has been measured on a first occasion by a rater and the same subject is also measured on a second occasion by a second rater. The agreement coefficients are presented in Table 2.
Figure 2 shows the difference between raters on day 14, plotted against the mean value of both raters for each subject for laxity of the knee joint. No systematic variation in the differences over the range of measurement was found amongst the subjects. The width of the limits of agreement suggests that there was considerable random variation.
| Discussion |
|---|
In this study the reproducibility of knee joint laxity measurement was quantified using generalized reliability parameters and agreement parameters in healthy, stable subjects. The ICC as an intra-rater reliability coefficient expresses the measured variance within one rater on two occasions. In our study the ICCs were found to be adequate for both raters (0.84 and 0.93, respectively). The ICC as an inter-rater reliability coefficient expresses the measured variance between two raters on the first and second occasions. The ICC was low (0.65) on the first occasion (day 1) and adequate (0.88) on the second occasion (day 14). Measurement of intra-rater agreement parameters is important in quantifying measurement error. In our study the intra-rater SEM was 1.3°. When the measurement was repeated by the same rater on the same subject the MDD was 3.7°. This expresses (with an uncertainty of <5%) that a difference between two measurements of less than 3.7° is attributable to measurement error and can therefore not be interpreted as a real difference. Only a difference in measurements made by the same rater exceeding 3.7° is likely to signify a real change in laxity.
Inter-rater agreement parameters express the absolute measurement error when a rater measures an individual subject on one occasion and a second rater measures the same subject on a second occasion 14 days later. In our study the inter-rater SEM was 1.5°. This indicates the absolute measurement error generalized for occasions and raters. The MDD was 4.3°, which indicates that this is the smallest difference between two measurements made by different raters at different times that can be interpreted as a genuine change.
To assess reproducibility we used healthy, stable subjects. It was assumed that the biological variation in the group, i.e. the variability in laxity of the knee joint, was small. The raters were well instructed, trained and measured in accordance with a given protocol. Compared with other studies involving clinical subjects [1, 2, 4, 5], our intra-rater reliability coefficients were lower. The heterogeneity of the population in previously conducted clinical studies could explain the difference from our study. A small range of laxity in healthy subjects makes the distinction between subjects more difficult, compared with a patient population with higher variability. In a patient population the subjects are easier to rank, because the difference between subjects is greater than the difference between subjects in a healthy population. Consequently, the ICC will be lower in healthy subjects than the ICCs in clinical studies. To compare the inter-rater reliability, no other studies are available. The inter-rater reliability coefficient was substantially higher on day 14 compared with day 1. In the day 14 session reliability was good. Although the raters had some previous experience in knee assessment, it is conceivable that experience gained through the knee measurements in this study resulted in a higher reliability coefficient. The increased experience could explain the higher reliability on day 14.
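The argument that a homogeneous sample lowers the ICC even when the measurement error is unchanged can be illustrated with a simplified one-facet model, in which the ICC is the between-subject variance divided by the sum of the between-subject variance and the error variance. The between-subject standard deviations below are hypothetical; only the SEM of 1.35° is taken from the Results section, and the study itself used a fuller set of variance sources.

```python
def icc_from_spread(between_subject_sd: float, sem: float) -> float:
    """ICC under a simplified model: var_subjects / (var_subjects + SEM^2)."""
    var_subjects = between_subject_sd ** 2
    return var_subjects / (var_subjects + sem ** 2)


SEM = 1.35  # degrees, intra-rater SEM reported in the Results

# Hypothetical between-subject standard deviations in degrees.
homogeneous = icc_from_spread(2.0, SEM)    # e.g. a healthy, similar group
heterogeneous = icc_from_spread(5.0, SEM)  # e.g. a more varied patient group
print(f"homogeneous: {homogeneous:.2f}, heterogeneous: {heterogeneous:.2f}")
```

With these illustrative values the ICC is roughly 0.69 in the homogeneous group and 0.93 in the heterogeneous group, although the absolute measurement error is identical in both.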
Inter-rater reliability was lower than intra-rater reliability. Therefore, using one trained rater to perform all laxity measurements is recommended.
One source of error could be the fixation of the lower leg and thigh of subjects. The lower leg and thigh were fixed in five places. Possible reasons for variation in the fixation points which can account for measurement variance are: (i) small differences in the positioning of the leg during fixation between raters and between occasions, and (ii) possible pain in the lower leg and thigh during the measurement. In our study a load of 7.7 Nm was used. Sharma et al. [1, 2] used a load of 12 Nm. In a study applying this load to patients with OA of the knee, we found that a load of 12 Nm induced pain in some patients [8]. Hence, we decided to reduce the applied load to 7.7 Nm. This reduced load was not painful for any of the patients tested. However, it is recommended that the patient's exposure should be limited to the minimum number of measurement readings needed to obtain a reliable result, and that attention should be paid to possible discomfort or pain during the measurements, in order to prevent any adverse effects. Although the mean scores of laxity in our study are similar to those in the Sharma et al. study [1, 2, 7], the results are not comparable because of the technical differences between the devices and the different loads used in the measurement.
Our results suggest that laxity measurements are of limited use in clinical practice, because of considerable measurement error. However, in research precision can be increased by including more subjects. For clinical trials related to laxity, an adequate number of subjects should be included, based on a power analysis.
In conclusion, these results on the reproducibility of knee joint laxity measurement indicate that the intra-rater reliability is good. The inter-rater reliability is less adequate on the first test occasion and good on the second test occasion. In a setting in which both raters are well trained, it is possible to achieve acceptable inter-rater reliability. The interpretation of results of the measurement of frontal plane knee laxity at the individual level is limited because of measurement error.
| Acknowledgments |
|---|
We gratefully acknowledge Ms L. Pernet and Ms M. C. Boucher for their assistance in obtaining the data, Ms K. Fiedler for her assistance in correcting the manuscript, Mr D. Knol for his assistance with the statistical analysis and Mr A. v. Vark of Enraf-Nonius, Delft, The Netherlands, for manufacturing the knee laxity measuring device.
The authors have declared no conflicts of interest.
| References |
|---|
- Sharma L, Congron L, Felson DT, Dunlop DD, Kirwan-Mellis G, Hayes KW, Weinrach D, Buchanan T. Laxity in healthy and osteoarthritic knees. Arthritis Rheum 1999;42:861–70.
- Sharma L. Local mechanical factors in the natural history of knee osteoarthritis. Malalignment and joint laxity. In: Brandt KD, Doherty M, Lohmander LS, eds. Osteoarthritis, 2nd ed. Oxford: Oxford University Press, 2003:177–83.
- Pottenger LA, Phillips FM, Draganich LF. The effect of marginal osteophytes on reduction of varus-valgus instability in osteoarthritic knees. Arthritis Rheum 1990;33:853–8.
- Wada M, Imura S, Baba H, Shimada S. Knee laxity in patients with osteoarthritis and rheumatoid arthritis. Br J Rheumatol 1996;35:560–3.
- Brage ME, Draganich LF, Pottenger LA, Curran JJ. Knee laxity in symptomatic osteoarthritis. Clin Orthop 1994;304:184–9.
- Sharma L, Cahue S, Song J, Hayes K, Pai Y, Dunlop D. Physical functioning over three years in knee osteoarthritis. Arthritis Rheum 2004;48:3359–70.
- Sharma L, Hayes KW, Felson DT et al. Does laxity alter the relationship between strength and physical function in knee osteoarthritis? Arthritis Rheum 1999;42:25–32.
- Esch van der M, Steultjens M, Wieringa H, Dinant H, Dekker J. Structural joint changes, malalignment and laxity in osteoarthritis of the knee. Scand J Rheumatol 2005;34:298–301.
- Streiner DL, Norman GR. Health measurement scales, 3rd edn. Oxford: Oxford University Press, 2003.
- Vet HCW de, Bouter LM, Bezemer PD. Reproducibility and responsiveness of evaluative outcome measures. Int J Technol Assess Health Care 2001;17:479–87.
- De Vet HCW. Observer reliability and agreement. In: Armitage P, Colton T, eds. Encyclopedia of biostatistics. Boston: John Wiley & Sons, 1998:3123–8.
- Fleiss JL. Reliability of measurement. In: The design and analysis of clinical experiments. John Wiley & Sons, New York; 1986:1–33.
- Roebroeck ME, Harlaar J, Lankhorst GJ. The application of generalizability theory to reliability assessment: an illustration using isometric force measurements. Phys Ther 1993;73:386–95; discussion 396–401.
- Beckerman H, Roebroeck ME, Lankhorst GJ, Becher JG, Bezemer PD, Verbeek ALM. Smallest real difference, a link between reproducibility and responsiveness. Qual Life Res 2001;10:571–8.
- Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986;1:307–10.
- Bot SD, Terwee CB, van der Windt DA, Bouter LM, Dekker J, de Vet HC. Clinimetric evaluation of shoulder disability questionnaires: a systematic review of the literature. Ann Rheum Dis 2004;63:335–41.