
Aims. Classifying trochlear dysplasia (TD) is useful to determine the treatment options for patients suffering from patellofemoral instability (PFI). There is no consensus on which classification system is more reliable and reproducible for the purpose of guiding clinicians’ management of PFI. There are also concerns about the validity of the Dejour Classification (DJC), which is the most widely used classification for TD, having only a fair reliability score. The Oswestry-Bristol Classification (OBC) is a recently proposed system of classification of TD, and the authors report a fair-to-good interobserver agreement and good-to-excellent intraobserver agreement in the assessment of TD. The aim of this study was to compare the reliability and reproducibility of these two classifications. Methods. In all, six assessors (four consultants and two registrars) independently evaluated 100 axial MRIs of the patellofemoral joint (PFJ) for TD and classified them according to OBC and DJC. These assessments were again repeated by all raters after four weeks. The inter- and intraobserver reliability scores were calculated using Cohen’s kappa and Cronbach’s α. Results. Both classifications showed good to excellent interobserver reliability with high α scores. The OBC classification showed a substantial intraobserver agreement (mean kappa 0.628; p < 0.005) whereas the DJC showed a moderate agreement (mean kappa 0.572; p < 0.005). There was no significant difference in the kappa values when comparing the assessments by consultants with those by registrars, in either classification system. Conclusion. This large study from a non-founding institute shows both classification systems to be reliable for classifying TD based on axial MRIs of the PFJ, with the simple-to-use OBC having a higher intraobserver reliability score than that of the DJC. Cite this article: Bone Jt Open 2023;4(7):532–538


The Bone & Joint Journal
Vol. 102-B, Issue 4 | Pages 478 - 484
1 Apr 2020
Daniels AM Wyers CE Janzing HMJ Sassen S Loeffen D Kaarsemaker S van Rietbergen B Hannemann PFW Poeze M van den Bergh JP

Aims. Besides conventional radiographs, the use of MRI, CT, and bone scintigraphy is frequent in the diagnosis of a fracture of the scaphoid. However, which technique gives the best results remains unknown. The investigation of a new imaging technique initially requires an analysis of its precision. The primary aim of this study was to investigate the interobserver agreement of high-resolution peripheral quantitative CT (HR-pQCT) in the diagnosis of a scaphoid fracture. A secondary aim was to investigate the interobserver agreement for the presence of other fractures and for the classification of scaphoid fracture. Methods. Two radiologists and two orthopaedic trauma surgeons evaluated HR-pQCT scans of 31 patients with a clinically suspected scaphoid fracture. The observers were asked to determine the presence of a scaphoid or other fracture and to classify the scaphoid fracture based on the Herbert classification system. Fleiss kappa statistics were used to calculate the interobserver agreement for the diagnosis of a fracture. Intraclass correlation coefficients (ICCs) were used to assess the agreement for the classification of scaphoid fracture. Results. A total of nine (29%) scaphoid fractures and 12 (39%) other fractures were diagnosed in 20 patients (65%) using HR-pQCT across the four observers. The interobserver agreement was 91% for the identification of a scaphoid fracture (95% confidence interval (CI) 0.76 to 1.00) and 80% for other fractures (95% CI 0.72 to 0.87). The mean ICC for the classification of a scaphoid fracture in the seven patients diagnosed with scaphoid fracture by all four observers was 73% (95% CI 0.42 to 0.94). Conclusion. We conclude that the diagnosis of scaphoid and other fractures is reliable when using HR-pQCT in patients with a clinically suspected fracture. Cite this article: Bone Joint J 2020;102-B(4):478–484


Orthopaedic Proceedings
Vol. 105-B, Issue SUPP_4 | Pages 3 - 3
3 Mar 2023
Roy K Joshi P Ali I Shenoy P Syed A Barlow D Malek I Joshi Y

Classifying trochlear dysplasia (TD) is useful to determine the treatment options for patients suffering from patellofemoral instability (PFI). There is no consensus on which classification system is more reliable and reproducible for guiding clinicians' management of PFI. There are also concerns about the validity of the Dejour classification (DJC), which is the most widely used classification for TD, having only a fair reliability score. The Oswestry-Bristol classification (OBC) is a recently proposed system of classification of TD, and the authors report a fair-to-good interobserver agreement and good-to-excellent intra-observer agreement in the assessment of TD. The aim of this study was to compare the reliability and reproducibility of these two classifications. Six assessors (four consultants and two registrars) independently evaluated 100 magnetic resonance axial images of the patella-femoral joint for TD and classified them according to OBC and DJC. These assessments were repeated by all raters after four weeks. The inter- and intra-observer reliability scores were calculated using Cohen's kappa and Cronbach's alpha. Both classifications showed good to excellent interobserver reliability with high alpha scores. The OBC showed a substantial intra-observer agreement (mean kappa 0.628; p < 0.005) whereas the DJC showed a moderate agreement (mean kappa 0.572; p < 0.005). There was no significant difference in the kappa values when comparing the assessments by consultants with those by registrars, in either classification system. This large study from a non-founding institute shows both classification systems to be reliable for classifying TD based on magnetic resonance axial images of the patella-femoral joint, with the simple-to-use OBC having a higher intra-observer reliability score than the DJC


The Journal of Bone & Joint Surgery British Volume
Vol. 88-B, Issue 4 | Pages 484 - 488
1 Apr 2006
Rogers BA Thornton-Bott P Cannon SR Briggs TWR

We assessed the reproducibility and accuracy of four ratios used to measure patellar height, namely the Blackburne-Peel, Caton-Deschamps, Insall-Salvati and modified Insall-Salvati, before and after total knee arthroplasty. The patellar height was measured, by means of the four ratios, on the pre- and post-operative lateral radiographs of 44 patients (45 knees) who had undergone total knee arthroplasty. Two independent observers measured the films sequentially, in identical conditions, totalling 720 measurements per observer. Statistical analysis, comparing both observers and ratios, was carried out using the intraclass correlation coefficient. Before operation there was greater interobserver variation using either the Insall-Salvati or modified Insall-Salvati ratios than when using the Caton-Deschamps or Blackburne-Peel methods. This was because of difficulty in identifying the insertion of the patellar tendon. Before operation, there was a minimal difference in reliability between these methods. After operation the interobserver difference was greatly reduced using both the Caton-Deschamps and Blackburne-Peel methods, which use the prosthetic joint line, compared with the Insall-Salvati and modified Insall-Salvati, which reference from the insertion of the patellar tendon. The theoretical advantage of using the Insall-Salvati and modified Insall-Salvati ratios in measuring true patellar height after total knee arthroplasty needs to be balanced against their significant interobserver variability and inferior reliability when compared with other ratios


The Bone & Joint Journal
Vol. 100-B, Issue 2 | Pages 242 - 246
1 Feb 2018
Ghoshal A Enninghorst N Sisak K Balogh ZJ

Aims. To evaluate interobserver reliability of the Orthopaedic Trauma Association’s open fracture classification system (OTA-OFC). Patients and Methods. Patients of any age with a first presentation of an open long bone fracture were included. Standard radiographs, wound photographs, and a short clinical description were given to eight orthopaedic surgeons, who independently evaluated the injury using both the Gustilo and Anderson (GA) and OTA-OFC classifications. The responses were compared for variability using Cohen’s kappa. Results. The overall interobserver agreement was κ = 0.44 for the GA classification and κ = 0.49 for OTA-OFC, which reflects moderate agreement (0.41 to 0.60) for both classifications. The agreement in the five categories of OTA-OFC was: for skin, κ = 0.55 (moderate); for muscle, κ = 0.44 (moderate); for arterial injury, κ = 0.74 (substantial); for contamination, κ = 0.35 (fair); and for bone loss, κ = 0.41 (moderate). Conclusion. Although the OTA-OFC, with similar interobserver agreement to GA, offers a more detailed description of open fractures, further development may be needed to make it a reliable and robust tool. Cite this article: Bone Joint J 2018;100-B:242–6
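
Editorial sketch: the abstract above reports Cohen's kappa values with verbal labels. The short Python example below shows how a two-rater kappa is conventionally computed and how the labels can be assigned. It assumes the commonly used Landis and Koch bands (the abstract itself quotes only the 0.41 to 0.60 "moderate" band), and the function names and the sample grades are hypothetical, not taken from the study.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of cases")
    n = len(rater_a)
    # Observed agreement: share of cases given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


def interpret_kappa(kappa):
    """Verbal label for a kappa value, assuming the Landis and Koch bands."""
    if kappa < 0.0:
        return "poor"
    bands = [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
             (0.80, "substantial"), (1.00, "almost perfect")]
    for upper, label in bands:
        if kappa <= upper:
            return label
    raise ValueError("kappa cannot exceed 1")


if __name__ == "__main__":
    # Hypothetical grades from two surgeons for four fractures.
    surgeon_1 = ["A", "A", "B", "B"]
    surgeon_2 = ["A", "A", "B", "A"]
    k = cohen_kappa(surgeon_1, surgeon_2)
    print(k, interpret_kappa(k))
```

With more than two raters, as in the study, such a pairwise kappa would typically be averaged over rater pairs or replaced by a multi-rater statistic; the abstract does not say which approach was taken.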


Orthopaedic Proceedings
Vol. 91-B, Issue SUPP_II | Pages 215 - 215
1 May 2009
Qureshi AA Roberts A

Aim: To assess the interobserver reliability of the Sauvegrain skeletal age assessment. Methods and Results: Elbow radiographs requested to exclude injury were anonymised. Sixteen examinations were assessed by ten independent orthopaedic specialist registrars or consultants. The Sauvegrain method as modified by Dimeglio was used to score the radiographs. The observations were then assessed for interobserver reliability by means of a multiple-observer kappa score, and the total scores by intraclass correlation coefficient. Kappa scores for the components of the score were 0.403 for the lateral condyle, 0.492 for the trochlea, 0.354 for the proximal radius and 0.508 for the olecranon. Adding item scores to produce a modified Sauvegrain score gave an intraclass reliability of 0.858 (95% CI 0.758 to 0.935). Conclusions: Methods of identifying skeletal maturation and predicting future growth generally depend on the use of an atlas of hand radiographs. Difficulties with poor interobserver reliability associated with these methods have led to a move towards assessments that do not depend upon bone age estimations. Unfortunately, plans based on ratios of growth or average patterns produce errors when unusual types of growth disturbance are present. We conclude that use of a scoring system for maturation assessed by elbow radiographs offers a significant advantage when substituted into the straight-line method of growth prediction. The Sauvegrain method as modified by Dimeglio [1] has demonstrated an excellent level of interobserver reliability. We have used Sauvegrain scores to improve the accuracy of timing when using the Moseley straight-line method [3].


Orthopaedic Proceedings
Vol. 94-B, Issue SUPP_I | Pages 49 - 49
1 Jan 2012
Brunse M Stochkendahl M Vach W Kongsted A Poulsen E Hartvigsen J Christensen H

Background and purpose. The musculoskeletal system is recognized as a possible source of pain in patients with chest pain. The objectives of the present study were (1) to investigate the interobserver reliability of an overall diagnosis of musculoskeletal chest pain using a standardized examination protocol in a cohort of patients with chest pain suspected to be of non-cardiac origin, (2) to investigate the interobserver reliability of the single components of the protocol, and (3) to investigate the importance of clinical experience for the level of interobserver reliability. Methods and results. Eighty patients with acute chest pain were recruited from a cardiology department. Four observers (two chiropractors and two chiropractic students) performed a physical examination and an extended manual examination of the spine and chest wall. Percentage agreement, Cohen's kappa and ICC were calculated for observer pairs and overall. Musculoskeletal chest pain was diagnosed in 44.0% of patients. Interobserver kappa values were substantial for the chiropractors and overall, and moderate for the students. For single items of the protocol, both pairs showed fair to substantial agreement regarding pain provocation tests and poor to fair agreement regarding spinal segmental dysfunction tests. Conclusions. Suspected musculoskeletal chest pain can be identified with substantial interobserver reliability using this standardized protocol if used by experienced and trained observers. Agreement for individual components of the protocol, however, varied considerably. Provided observers are trained, the examination protocol can be used in selected patients and can be implemented in pre- and post-graduate clinical training


The Journal of Bone & Joint Surgery British Volume
Vol. 80-B, Issue 4 | Pages 670 - 672
1 Jul 1998
Flinkkilä T Nikkola-Sihto A Kaarela O Päakkö E Raatikainen T

Interobserver reliability of the AO system of classification of fractures of the distal radius was assessed using plain radiographs and CT. Five observers classified 30 Colles’-type fractures using only plain radiographs; two months later they were reclassified using CT in addition. Interobserver reliability was poor in both series when detailed classification was used. By reducing the categories to five, interobserver reliability was slightly improved, but was still poor. When only two AO types were used, the reliability was moderate using plain radiographs and good to excellent with the addition of CT. The use of CT as well as plain radiographs brings interobserver reliability to a good level in assessment of the presence or absence of articular involvement, but is otherwise of minor value in improving the interobserver reliability of the AO system of classification of fractures of the distal radius


Orthopaedic Proceedings
Vol. 93-B, Issue SUPP_I | Pages 38 - 38
1 Jan 2011
Qureshi A Roberts A

The purpose of this study was to assess the interobserver reliability of the Sauvegrain skeletal age assessment. Elbow radiographs requested to exclude injury were anonymised. Sixteen examinations were assessed by ten independent orthopaedic specialist registrars or consultants. The Sauvegrain method as modified by Dimeglio was used to score the radiographs. The observations were then assessed for interobserver reliability by means of a multiple-observer kappa score, and the total scores by intraclass correlation coefficient. Kappa scores for the components of the score were 0.403 for the lateral condyle, 0.492 for the trochlea, 0.354 for the proximal radius and 0.508 for the olecranon. Adding item scores to produce a modified Sauvegrain score gave an intraclass reliability of 0.858 (95% CI 0.758 to 0.935). Methods of identifying skeletal maturation and predicting future growth generally depend on the use of an atlas of hand radiographs. Difficulties with poor interobserver reliability associated with these methods have led to a move towards assessments that do not depend upon bone age estimations. Unfortunately, plans based on ratios of growth or average patterns produce errors when unusual types of growth disturbance are present. We conclude that use of a scoring system for maturation assessed by elbow radiographs offers a significant advantage when substituted into the straight-line method of growth prediction. The Sauvegrain method as modified by Dimeglio [1] has demonstrated an excellent level of interobserver reliability. We have used Sauvegrain scores to improve the accuracy of timing when using the Moseley straight-line method


The Journal of Bone & Joint Surgery British Volume
Vol. 72-B, Issue 2 | Pages 202 - 204
1 Mar 1990
Simmons E Graham H Szalai J

Fifteen independent observers of three levels of experience (consultant staff, fellows, residents) assessed 40 radiographs of children presenting with Perthes' disease using the Catterall and the Salter-Thompson grading systems. Each observer was supplied with descriptions and illustrations of the classifications and each hip was grouped by both systems by each observer. The results were statistically analysed using 'kappa' statistics. The level of interobserver agreement was higher for the Salter-Thompson system and correlated with the level of experience of the observer. Both systems can give acceptable levels of interobserver agreement, but the Salter-Thompson grouping is simpler and easier to apply in the earlier stages of the disease when treatment must be decided, and has a higher degree of reproducibility amongst more experienced observers


Orthopaedic Proceedings
Vol. 87-B, Issue SUPP_I | Pages 69 - 69
1 Mar 2005
Viehweger E Hélix M Jacquemier M Scavarda D Rohon MA Scorsone-Pagny S

Introduction: With the evolution and complexity of treatments in cerebral palsy (CP) patients, it is essential to assess their outcome using validated tools. Technical analysis offers objective data which may be combined with more subjective functional evaluation and health-related quality of life tests. Simplified visual tests were proposed as an alternative to the complex and expensive instrumented three-dimensional gait analysis. The Edinburgh Visual Gait Score (EVGS) was proposed for routine clinical use when complete technical analysis is not available, or as one part of a global patient evaluation. The purposes of our study were: 1) to apply a French translation of the EVGS to standard video recordings of a group of independently walking spastic diplegic CP patients; 2) to evaluate the intraobserver and interobserver reliability; and 3) to compare the results obtained by observers experienced and inexperienced in gait analysis. Material & methods: A series of ten standard video recordings of spastic diplegic CP patients, acquired during routine clinical gait analysis, were examined twice by eight observers, with two weeks between the assessments. The observers came from the following specialties: three paediatric orthopaedic surgeons, one resident in orthopaedic surgery, one neurosurgeon, one physiatrist and two physiotherapists. Observers were separated into two groups according to their experience with gait analysis interpretation. Kappa statistics and the intraclass correlation coefficient were calculated. Results: Better intraobserver and interobserver reliability was observed for foot and knee scores, with a significant difference between stance and swing phase results. Pelvis, hip and trunk score results were significantly lower. The interobserver reliability for segment scores and the global EVGS showed better results than the intraobserver reliability. The group of observers experienced in gait analysis showed significantly higher intraobserver and interobserver reliability. Discussion & conclusion: Our reliability results for the EVGS are close to those of Read et al. Interestingly, we showed a significant difference between the two observer groups: observers familiar with gait analysis obtained better reliability results. This shows the importance of either being accustomed to clinical gait analysis interpretation, including learning to visualise the different gait phases, or receiving video analysis training before using the visual score as a standard clinical evaluation tool. For this study we did not follow the original authors' patient preparation recommendations for improving scoring accuracy, because we wanted to test whether historic standard videos could be used. The poor score reliability for the pelvis and hip may be improved. Further studies of multilevel surgery outcome evaluation by observers trained in visual analysis are needed to explore clinical changes in CP patients over time


Orthopaedic Proceedings
Vol. 88-B, Issue SUPP_I | Pages 171 - 171
1 Mar 2006
Sanchez R Salcedo C Martinez M Molina J Vera F Villarreal J

Introduction and objectives: The purpose of the research is to show the agreement and reproducibility among five observers when they are questioned about 51 open fractures using two open fracture classifications for long bones (Gustilo and Aybar), and to interpret the results obtained with both classifications. Material and Method: A classification protocol is established for open fractures. The fractures are graded independently using each of the systems being evaluated (Gustilo and Aybar), by viewing slides with clinical and radiological images in addition to a report of the data in the clinical history. The survey is conducted twice with a time difference of one to eight weeks. Five members of the Orthopedic and Traumatologic Surgery Department (OTSD) were questioned (one Professor, two Specialists and two Residents). The statistical methods used to analyse the results were the interobserver agreement percentage and the inter- and intraobserver kappa index. Results: The interobserver agreement percentage was 58.82% for the Gustilo classification and 39.21% for the Aybar classification. The kappa index for the interobserver agreement was 0.51 for the Gustilo classification and 0.54 for the Aybar classification. The kappa index for the intraobserver reproducibility was 0.69 for the Gustilo classification and 0.58 for the Aybar one. Conclusions: The interobserver agreement was considered moderate-poor for the Gustilo and Aybar classifications. The intraobserver reproducibility was considered substantial for the Gustilo classification and moderate for the Aybar one. We conclude that this agreement shows too much variability to accept just one classification as the only valid method for taking therapeutic decisions or for comparing results. Therefore, it is necessary to create a more detailed and careful classification which is quick to use, reliable and reproducible, and which contains more objective criteria


Orthopaedic Proceedings
Vol. 88-B, Issue SUPP_II | Pages 314 - 314
1 May 2006
Elkinson I Crawford H Barnes M Boxch P Ferguson J

The aim was to evaluate the intraobserver and interobserver reliability of pelvic incidence as a fundamental parameter of sagittal spino-pelvic balance in patients with spondylolisthesis compared to controls with idiopathic adolescent scoliosis. A blinded test-retest study including multi-surgeon assessment of pelvic incidence in patients with spondylolisthesis and idiopathic adolescent scoliosis was carried out. We assessed the agreement between the pelvic incidence measurements using the Bland and Altman method, and mean differences (95% confidence interval) are reported. Forty patients seen at Starship Children’s Hospital between 1992 and 2003 by two spinal surgeons were retrospectively identified. The main group had 20 patients with spondylolisthesis (isthmic and/or dysplastic types) and the control group consisted of 20 patients with idiopathic adolescent scoliosis. The five observers, with different levels of experience, comprised the two orthopaedic surgeons, one fellow, one senior trainee and one non-trainee registrar. Prior to the initial test phase, a consensus-building session was carried out, in which all five observers arrived at a standardised method for measuring the pelvic incidence. In the test phase, randomly ordered lateral lumbosacral radiographs were independently evaluated by the five observers and pelvic incidence was measured. Assessment of the pelvic incidence was repeated one week later in the re-test phase. The radiographs were presented in a randomly pre-assigned order. Bland and Altman plots were constructed and mean differences (95% confidence interval) reported to evaluate the agreement between the pelvic incidence measurements among the five independent observers. All analysis was performed with the statistical software package SAS. A p-value of 0.05 was considered statistically significant. The spondylolisthesis group had 11 (55%) males and 9 (45%) females with an average age of 14 ± 4.2. Two patients had high-grade (Meyerding Class III, IV, V) and 16 had low-grade (Meyerding Class I, II) spondylolisthesis. Two patients were post-reduction of spondylolisthesis. In the scoliosis group there were 2 (10%) males and 18 (90%) females with an average age of 15 ± 2.9. There was no significant difference between males and females in pelvic incidence measurement (60° ± 18.7° vs. 57° ± 14.6°, p=0.540) or age (15 ± 2.9 vs. 14 ± 3.8, p=0.181). There was no difference in pelvic incidence across the Meyerding groups (p=0.257). There was a significant difference between spondylolisthesis and scoliosis pelvic incidence measurements (65° ± 15.6° vs. 51° ± 12.8°, p=0.003). In the spondylolisthesis group, the interobserver reliability between five clinicians, expressed as the mean difference in pelvic incidence measurement, was 0.6° (95% CI −0.81, 1.91) and was not significantly different from zero (p=0.423). The agreement limits were from −12.8° to 13.9°. The intraobserver reliability of pelvic incidence showed the mean difference ranging from −2.1° to 1.4° (p=0.129 and 0.333 with 95% CI). One observer showed marginal evidence of a significant difference of 3.3° (95% CI 0.05° to 6.55°, p=0.047). In the scoliosis group, the interobserver reliability was 0.3° (95% CI −0.81, 1.49) and was not significantly different from zero (p=0.726). The agreement limits were from −11.0° to 11.6°. The intraobserver reliability among four observers ranged from −1.7° to 0.5° (p=0.178 and 0.661). One observer had a significant difference in readings of 4.1° (95% CI 0.70° to 7.40°, p=0.020). Scoliosis patients had a significantly smaller pelvic incidence than spondylolisthesis patients. The interobserver reliability of the pelvic incidence measurement was excellent across both groups. The intraobserver reliability was good, with only one observer in each group demonstrating a marginally significant difference. Pelvic incidence is therefore a reliable measurement which can be used as a predictor of progression of spondylolisthesis
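
Editorial sketch: the abstract above reports a mean difference and "agreement limits" from the Bland and Altman method. The Python example below illustrates the conventional calculation, assuming the limits are the mean difference plus or minus 1.96 standard deviations of the paired differences (the abstract does not state the multiplier it used). The function name and the angle values are hypothetical and are not study data.

```python
from statistics import mean, stdev


def bland_altman(first_reading, second_reading):
    """Return (bias, lower limit, upper limit) for paired measurements.

    Assumes the conventional 95% limits of agreement:
    mean difference +/- 1.96 * sample SD of the differences.
    """
    if len(first_reading) != len(second_reading) or len(first_reading) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [a - b for a, b in zip(first_reading, second_reading)]
    bias = mean(diffs)      # mean difference between the two readings
    spread = stdev(diffs)   # sample standard deviation of the differences
    return bias, bias - 1.96 * spread, bias + 1.96 * spread


if __name__ == "__main__":
    # Hypothetical pelvic incidence readings in degrees (test and re-test).
    test = [60.0, 55.0, 70.0, 48.0]
    retest = [58.0, 57.0, 66.0, 50.0]
    bias, lower, upper = bland_altman(test, retest)
    print(f"bias {bias:.2f}, limits {lower:.2f} to {upper:.2f}")
```

A bias close to zero with wide limits, as reported in the abstract, means the readings agree on average while individual paired readings can still differ by several degrees.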


Orthopaedic Proceedings
Vol. 84-B, Issue SUPP_I | Pages 46 - 46
1 Mar 2002
Lautman S Faizon G Roger R Rosset P

Purpose: Classifications of fractures of the thoracolumbar spine are theoretically designed to help make therapeutic decisions. Three classifications (J. Laulan, F. Denis, F. Magerl) were compared to assess reproducibility for use by a surgery team. Material and methods: The classifications were described during a SOFCOT symposium in 1995. Four observers examined 60 files, reading them twice at a one-month interval. The files included plain radiographs (AP and lateral view) and a CT series and were read in random order. Intra- and interobserver concordance were measured with the kappa method. Results: Intra- and interobserver reproducibility was good for the classification proposed by F. Denis (kappa = 0.6229 and 0.0795) for classification groups but was weak for subgroups (kappa = 0.028 and 0.571). Reproducibility was moderate for the classification proposed by J. Laulan (interobserver kappa = 0.460, intraobserver kappa = 0.541). The Magerl classification produced low to negligible reproducibility for classification groups and subgroups (intra- and interobserver kappa = 0.138 to 0.0343). Discussion: Because of its low to negligible reproducibility, the Magerl classification would be difficult to use in clinical practice to make coherent therapeutic decisions or for scientific research to analyze series of fractures treated using this classification. The reproducibility of the F. Denis classification was good for groups but low for subgroups that include fractures resulting from different mechanisms requiring radically different treatment strategies. This is a good classification system for descriptive work but can lead to treatments poorly adapted to the causal mechanism of the fracture. The reproducibility of the J. Laulan classification is moderate but each group in this classification corresponds to fractures caused by the same mechanism. Therapeutic indications determined with this system would be more coherent


Orthopaedic Proceedings
Vol. 96-B, Issue SUPP_11 | Pages 315 - 315
1 Jul 2014
Dhooge Y Wentink N Theelen L van Hemert W Senden R

Summary. The ankle X-ray has moderate diagnostic power to identify syndesmotic instability, showing large sensitivity ranges between observers. Classification systems and radiographic measurements showed moderate to high interobserver agreement, with extended classifications performing worse. Introduction. There is no consensus regarding the diagnosis and treatment of ankle fractures with respect to syndesmotic injury. The diagnosis of syndesmotic injury is currently based on intraoperative findings. Surgical indication is mainly made by ankle X-ray assessment, by several classification systems and radiographic measurements. Misdiagnosis of the injury results in suboptimal treatment, which may lead to chronic complaints, like instability and osteoarthritis. This study investigates the diagnostic power and interobserver agreement of three classification methods and radiographic measures, currently used to assess X-ankles and to identify syndesmotic injury. Patients and Methods. Twenty patients (43.2 ± 15.3yrs) with an ankle fracture, indicated for surgery, were prospectively included. All patients received a preoperative ankle X-ray, which was assessed by several observers: two orthopaedic surgeons, one trauma surgeon and two radiologists. The ankle X-ray was assessed on syndesmotic injury/stability and presence of fractures (fibula, medial/tertius malleolus). Three classification systems were used: Weber, AO-Müller (short-version n=3 options; extended-version n=27 options), Lauge-Hansen (short-version n=5 options; extended-version n=17 options) and two radiographic measurements were done: tibiofibular overlap (TFO) and ratio medial clearspace/superior clear space (MCS/SCS). All observers were instructed about the assessments before the measurements. During surgery, a proper intraoperative description of the syndesmosis was noted. Agreement (%), Intraclass Correlation Coefficients (ICC) and Kappa were calculated to determine interobserver agreement. 
Kappa statistic was interpreted according to Landis and Koch. To test the diagnostic power of ankle X-rays to identify syndesmotic instability, sensitivity and specificity were calculated with intraoperative findings serving as golden standard. Results. Six of 20 ankles showed syndesmotic instability intraoperatively. An overall sensitivity of 43% (specificity: 78) was found for X-rays in identifying syndesmotic instability, showing a wide range in sensitivity between observers (17–83%), with radiologists performing better (range 50–83%) than surgeons (range: 17–33%). Overall, substantial to perfect interobserver agreement (range 70–100%) was found for all short classification systems, showing an average kappa ≥0.60. The agreement reduced for more extended classification systems. E.g. observer agreement for the AO-Muller classification with 3, 9 and 27 options was respectively 85% (kappa 0.66), 68% (kappa 0.57) and 55% (kappa 0.51). One observer deviated slightly from others in all classification assessments. Removing this observer resulted in excellent agreement for all classification systems (>90%). Radiographic measurements showed moderate to high interobserver agreement, with TFO performing best (avg. ICC 0.88). Discussion/Conclusion. In ankle fractures, a preoperative X-ray has low sensitivity in detecting syndesmotic instability, showing large sensitivity ranges between observers. Further study is needed to investigate the contribution of classification systems in determining the best treatment method for syndesmotic injury. Ankle X-ray assessment using the three classification systems and radiographic measures was consistent among observers. Disagreement between observers can be attributed to intrinsic differences among the systems (e.g. stepwise classification vs. single assessment). No preference for one specific classification was found, as all showed comparable interobserver agreement. 
However, classification systems with few options are recommended, as observer agreement decreased with more extended classifications
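As a rough illustration of the diagnostic-power calculation described in this abstract, the sketch below computes sensitivity and specificity against an intraoperative reference standard. The function and variable names are hypothetical; only the counts (six unstable ankles out of 20) come from the abstract. With six unstable ankles, a single observer's sensitivity can only move in steps of one sixth, which is consistent with the reported per-observer values of 17%, 33%, 50% and 83%.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of truly unstable ankles that the observer flagged as unstable."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of truly stable ankles that the observer rated as stable."""
    return true_neg / (true_neg + false_pos)


# Counts taken from the abstract: 6 of 20 ankles were unstable intraoperatively.
UNSTABLE = 6
STABLE = 20 - UNSTABLE

# Every sensitivity (in %) that one observer could obtain with 6 unstable ankles.
possible_sensitivities = [
    round(100 * sensitivity(detected, UNSTABLE - detected))
    for detected in range(UNSTABLE + 1)
]
print(possible_sensitivities)  # [0, 17, 33, 50, 67, 83, 100]
```

The same coarseness applies to specificity, which moves in steps of one fourteenth here, so small samples alone can produce wide ranges between observers.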


The Journal of Bone & Joint Surgery British Volume
Vol. 84-B, Issue 7 | Pages 950 - 954
1 Sep 2002
Brorson S Bagger J Sylvest A Høbjartsson A

We investigated whether training doctors to classify proximal fractures of the humerus according to the Neer system could improve interobserver agreement. Fourteen doctors were randomised to two training sessions, or to no training, and asked to categorise 42 unselected pairs of plain radiographs of fractures of the proximal humerus according to the Neer system. The mean kappa difference between the training and control groups was 0.30 (95% CI 0.10 to 0.50, p = 0.006). In the training group the mean kappa value for interobserver variation improved from 0.27 (95% CI 0.24 to 0.31) to 0.62 (95% CI 0.57 to 0.67). The improvement was particularly notable for specialists in whom kappa increased from 0.30 (95% CI 0.23 to 0.37) to 0.79 (95% CI 0.70 to 0.88). These results suggest that formal training in the Neer system is a prerequisite for its use in clinical practice and research


Orthopaedic Proceedings
Vol. 88-B, Issue SUPP_I | Pages 187 - 187
1 Mar 2006
Maguire M Mohil R Ng A Hodgson S
Full Access

The AO, Frykman, Mayo and Fernandez classification systems for distal radius fractures were evaluated for interobserver reliability and intraobserver reproducibility using plain radiographs. Five orthopaedic consultants, five orthopaedic registrars and five orthopaedic senior house officers classified 20 sets of distal radius fractures on two separate occasions. There were 2400 individual observations. Kappa statistics were used to establish a relative level of agreement between observers for the two readings and between separate readings by the same observer. Our results for intraobserver reproducibility showed kappa values of 0.49 for Fernandez, 0.47 for Frykman, 0.45 for Mayo and 0.33 for AO. A result of 0.4 shows good consistency according to well recognised statistical boundaries and is significant; that is, reproducibility occurred at a level greater than chance. Interobserver kappa values were poor in all classification systems. We also sought to look at variables within grade of surgeon and developed kappa values for these also
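The observation count and the kappa statistic in this abstract can be sketched as follows. This is a minimal illustration, not the authors' analysis: the `cohen_kappa` helper is a plain implementation of Cohen's kappa for two sets of readings, and the two reading lists are invented data used only to show the arithmetic. The count of 2400 follows from 15 observers, 20 fracture sets, four classification systems and two occasions.

```python
from collections import Counter


def cohen_kappa(first: list[str], second: list[str]) -> float:
    """Cohen's kappa for two equally long lists of categorical ratings."""
    if len(first) != len(second) or not first:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(first)
    # Observed proportion of cases on which the two readings agree.
    observed = sum(a == b for a, b in zip(first, second)) / n
    # Agreement expected by chance, from each reading's category frequencies.
    counts_1, counts_2 = Counter(first), Counter(second)
    expected = sum(counts_1[c] * counts_2[c] for c in counts_1) / (n * n)
    if expected == 1.0:  # both readings used one identical category throughout
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Counts from the abstract: 5 consultants + 5 registrars + 5 senior house officers.
observers = 5 + 5 + 5
total_observations = observers * 20 * 4 * 2  # fracture sets x systems x occasions
print(total_observations)  # 2400

# Hypothetical readings by one observer on two occasions (not study data).
reading_1 = ["A", "A", "B", "B", "C", "C", "A", "B", "C", "A"]
reading_2 = ["A", "A", "B", "C", "C", "C", "A", "B", "B", "A"]
print(round(cohen_kappa(reading_1, reading_2), 2))  # 0.7
```

In the invented example the readings agree on 8 of 10 cases, but 0.34 of that agreement is expected by chance, so kappa is lower than the raw 80% agreement would suggest.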


Orthopaedic Proceedings
Vol. 106-B, Issue SUPP_11 | Pages 12 - 12
4 Jun 2024
Chapman J Choudhary Z Gupta S Airey G Mason L
Full Access

Introduction

Treatment pathways for 5th metatarsal fractures are commonly directed by fracture classification, with Jones types, for example, requiring closer observation and possibly more aggressive management.

Primary objective

To investigate the reliability of assessment of subtypes of 5th metatarsal fractures by different observers.


Orthopaedic Proceedings
Vol. 92-B, Issue SUPP_I | Pages 27 - 27
1 Mar 2010
Cunningham MR Quirno M Bendo J Steiber J
Full Access

Purpose: Facet joint arthrosis is an entity that can have a key role in the etiology of low back pain, especially with hyperextension, and is a key component of surgical planning, especially when considering disc arthroplasty. Plain films and MRI are most commonly utilized as the initial imaging of choice for low back pain, but these methods may not truly allow an accurate assessment of facet arthrosis. Our purpose was to observe the inter- and intraobserver reliability of utilizing CT and MRI to evaluate facet arthrosis, the inter- and intraobserver reliability of the facet grading system, and the agreement of surgeons as to when to perform disc arthroplasty after the lumbar facets are evaluated. Method: A power analysis was performed which showed we would need 6 reviewers and 43 images to have 80% power to show excellent reliability. A total of 102 CT images and the corresponding MRI images of lumbar facets were obtained from patients who were to undergo lumbar spine surgery of any type. Ten spine surgeons and three spine fellows reviewed the randomized images at two time points, 3 months apart, graded the facet arthrosis, and indicated whether they would choose to perform a disc arthroplasty based on the amount of facet arthrosis. Both interobserver and intraobserver kappa values were calculated by comparing results between observers at the two time points and between CT and MRI images from the same patient. Results: Interobserver reliability for MRI was 0.21 and 0.07 (fair to slight agreement), and for CT was 0.33 and 0.27 (fair agreement), for the spine surgeons and spine fellows respectively. The mean intraobserver reliability for MRI was 0.36 and 0.26 (fair agreement) and for CT was 0.52 and 0.51 (moderate agreement). The kappa value for agreement on whether to perform a disc arthroplasty after grading the facet arthrosis utilizing MRI was 0.22 (fair agreement) and utilizing CT was 0.33 (fair agreement) among the senior spine surgeons. 
Conclusion: Agreement on the existing grading system for facet arthrosis, and on whether to perform a disc arthroplasty based on that grading, is at best only fair. CT is more reliable than MRI for grading facet arthrosis
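The verbal labels attached to the kappa values in this abstract (slight, fair, moderate) match the commonly used Landis and Koch bands, although the abstract does not say which scale was applied. The sketch below assumes that scale; the `landis_koch` function and the dictionary keys are hypothetical names, and the numbers are those reported in the abstract.

```python
def landis_koch(kappa: float) -> str:
    """Map a kappa value to its Landis and Koch agreement label."""
    if kappa < 0.0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Kappa values as reported in the abstract (surgeons first, then fellows).
reported = {
    "MRI interobserver, surgeons": 0.21,
    "MRI interobserver, fellows": 0.07,
    "CT interobserver, surgeons": 0.33,
    "CT interobserver, fellows": 0.27,
    "MRI intraobserver, surgeons": 0.36,
    "MRI intraobserver, fellows": 0.26,
    "CT intraobserver, surgeons": 0.52,
    "CT intraobserver, fellows": 0.51,
    "Arthroplasty decision, MRI": 0.22,
    "Arthroplasty decision, CT": 0.33,
}

for name, kappa in reported.items():
    print(f"{name}: {kappa:.2f} ({landis_koch(kappa)})")
```

Under this assumed scale, only the CT intraobserver values reach the moderate band; every other reported value is fair or slight.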


The Journal of Bone & Joint Surgery British Volume
Vol. 84-B, Issue 1 | Pages 48 - 49
1 Jan 2002
Javed A Siddique M Vaghela M Hui ACW

We carried out a prospective study in order to establish to what extent the intra-articular evaluation undertaken during arthroscopy of the knee differed between surgeons. Two senior specialist registrars and a consultant orthopaedic surgeon with a special interest in knee surgery were involved. A total of 78 knee arthroscopies (78 patients) was studied. Arthroscopy was first carried out by the trainee and then by the senior author (ACWH). The intra-articular evaluation during the arthroscopy was recorded independently by a third person in the operating theatre. Data were collected to record variations in examination under anaesthesia, the morphology and pathology of the menisci and anterior cruciate ligament and the state of the articular surfaces. The overall interobserver variation was 20% in all categories. We question the published results of intra-articular evaluation during knee arthroscopy when surgeons of different levels of experience are involved in a single study