Results 1 - 50 of 1170

Aims. Classifying trochlear dysplasia (TD) is useful to determine the treatment options for patients suffering from patellofemoral instability (PFI). There is no consensus on which classification system is more reliable and reproducible for the purpose of guiding clinicians’ management of PFI. There are also concerns about the validity of the Dejour Classification (DJC), which is the most widely used classification for TD but has only a fair reliability score. The Oswestry-Bristol Classification (OBC) is a recently proposed system of classification of TD, and the authors report a fair-to-good interobserver agreement and good-to-excellent intraobserver agreement in the assessment of TD. The aim of this study was to compare the reliability and reproducibility of these two classifications. Methods. In all, six assessors (four consultants and two registrars) independently evaluated 100 axial MRIs of the patellofemoral joint (PFJ) for TD and classified them according to OBC and DJC. All raters repeated these assessments after four weeks. The inter- and intraobserver reliability scores were calculated using Cohen’s kappa and Cronbach’s α. Results. Both classifications showed good to excellent interobserver reliability with high α scores. The OBC showed a substantial intraobserver agreement (mean kappa 0.628; p < 0.005) whereas the DJC showed a moderate agreement (mean kappa 0.572; p < 0.005). There was no significant difference in the kappa values when comparing the assessments by consultants with those by registrars, in either classification system. Conclusion. This large study from a non-founding institute shows both classification systems to be reliable for classifying TD based on axial MRIs of the PFJ, with the simple-to-use OBC having a higher intraobserver reliability score than that of the DJC. Cite this article: Bone Jt Open 2023;4(7):532–538
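
For readers unfamiliar with the statistic, the following is a minimal sketch of how Cohen’s kappa could be computed for one rater’s two reading sessions. It is not the authors’ code: the function name `cohen_kappa` and the ratings are hypothetical, and the grade labels are placeholders rather than data from the study.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two equally long sequences of categorical ratings."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(ratings_a)
    # Observed agreement: share of cases given the same category both times.
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement, estimated from the marginal category frequencies.
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        # Both sessions used a single identical category; kappa is undefined,
        # so 1.0 is returned here as a convention (an assumption of this sketch).
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical ratings of ten scans by one rater, four weeks apart.
first_read = ["mild"] * 5 + ["severe"] * 5
second_read = ["mild"] * 4 + ["severe"] * 5 + ["mild"]

print(round(cohen_kappa(first_read, second_read), 3))  # → 0.6
```

Here eight of ten readings match (observed agreement 0.8) while chance agreement is 0.5, so kappa is (0.8 − 0.5) / (1 − 0.5) = 0.6.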


Bone & Joint Research
Vol. 12, Issue 5 | Pages 313 - 320
8 May 2023
Saiki Y Kabata T Ojima T Kajino Y Kubo N Tsuchiya H

Aims. We aimed to assess the reliability and validity of OpenPose, a posture estimation algorithm, for measurement of knee range of motion after total knee arthroplasty (TKA), in comparison to radiography and goniometry. Methods. In this prospective observational study, we analyzed 35 primary TKAs (24 patients) for knee osteoarthritis. We measured the knee angles in flexion and extension using OpenPose, radiography, and goniometry. We assessed the test-retest reliability of each method using intraclass correlation coefficient (1,1). We evaluated the ability to estimate other measurement values from the OpenPose value using linear regression analysis. We used intraclass correlation coefficients (2,1) and Bland–Altman analyses to evaluate the agreement and error between radiography and the other measurements. Results. OpenPose had excellent test-retest reliability (intraclass correlation coefficient (1,1) = 1.000). The R² of all regression models indicated large correlations (0.747 to 0.927). In the flexion position, the intraclass correlation coefficients (2,1) of OpenPose indicated excellent agreement (0.953) with radiography. In the extension position, the intraclass correlation coefficients (2,1) indicated good agreement of OpenPose and radiography (0.815) and moderate agreement of goniometry with radiography (0.593). OpenPose had no systematic error in the flexion position, and a 2.3° fixed error in the extension position, compared to radiography. Conclusion. OpenPose is a reliable and valid tool for measuring flexion and extension positions after TKA. It has better accuracy than goniometry, especially in the extension position. Accurate measurement values can be obtained with low error, high reproducibility, and no contact, independent of the examiner’s skills. Cite this article: Bone Joint Res 2023;12(5):313–320
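
As an illustration of the Bland–Altman analysis named above, the sketch below computes the mean difference between two methods (the bias, which is presumably what the abstract calls a fixed error) and the conventional 95% limits of agreement. The function name `bland_altman` and the angle values are invented for the example; the study’s own data and software are not given in the abstract.

```python
from statistics import mean, stdev


def bland_altman(method_a, method_b):
    """Return (bias, lower limit, upper limit) for paired measurements.

    bias is the mean of the paired differences (a - b); the limits of
    agreement are bias +/- 1.96 sample standard deviations of the differences.
    """
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread


# Hypothetical extension angles in degrees (not data from the study).
camera_angles = [2.0, 4.0, 6.0, 8.0]
radiograph_angles = [1.0, 2.0, 5.0, 8.0]

bias, lower, upper = bland_altman(camera_angles, radiograph_angles)
print(f"bias {bias:.2f}, limits of agreement {lower:.2f} to {upper:.2f}")
```

A bias close to zero with narrow limits suggests the two methods can be used interchangeably; a non-zero bias with narrow limits suggests a constant offset that could in principle be corrected.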


The Bone & Joint Journal
Vol. 102-B, Issue 8 | Pages 1041 - 1047
1 Aug 2020
Hamoodi Z Singh J Elvey MH Watts AC

Aims. The Wrightington classification system of fracture-dislocations of the elbow divides these injuries into six subtypes depending on the involvement of the coronoid and the radial head. The aim of this study was to assess the reliability and reproducibility of this classification system. Methods. This was a blinded study using radiographs and CT scans of 48 consecutive patients managed according to the Wrightington classification system between 2010 and 2018. Four trauma and orthopaedic consultants, two post-CCT fellows, and one speciality registrar based in the UK classified the injuries. The seven observers reviewed preoperative radiographs and CT scans twice, with a minimum four-week interval. Radiographs and CT scans were reviewed separately. Inter- and intraobserver reliability were calculated using Fleiss and Cohen kappa coefficients. The Landis and Koch criteria were used to interpret the strength of the kappa values. Validity was assessed by calculating the percentage agreement against intraoperative findings. Results. Of the 48 patients, three (6%) had type A injury, 11 (23%) type B, 16 (33%) type B+, 16 (33%) type C, two (4%) type D+, and none had a type D injury. All 48 patients had anteroposterior (AP) and lateral radiographs, 44 had 2D CT scans, and 39 had 3D reconstructions. The interobserver reliability kappa value was 0.52 for radiographs, 0.71 for 2D CT scans, and 0.73 for a combination of 2D and 3D reconstruction CT scans. The median intraobserver reliability was 0.75 (interquartile range (IQR) 0.62 to 0.79) for radiographs, 0.77 (IQR 0.73 to 0.94) for 2D CT scans, and 0.89 (IQR 0.77 to 0.93) for the combination of 2D and 3D reconstruction. Validity analysis showed that accuracy significantly improved when using CT scans (p = 0.018 and p = 0.028 respectively). Conclusion. The Wrightington classification system is a reliable and valid method of classifying fracture-dislocations of the elbow. CT scans are significantly more accurate than radiographs when identifying the pattern of injury, with good intra- and interobserver reproducibility. Cite this article: Bone Joint J 2020;102-B(8):1041–1047
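
When more than two observers classify each case, as with the seven observers above, Fleiss’ kappa is the usual multi-rater statistic. The sketch below shows the standard formula applied to a table of counts (one row per case, one column per category, each row summing to the number of raters). The function name `fleiss_kappa` and both toy tables are assumptions made for illustration and do not reproduce the study’s data.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa from a cases x categories table of rating counts.

    counts[i][j] is the number of raters who put case i in category j.
    Every case must be rated by the same number of raters (at least two).
    """
    n_cases = len(counts)
    n_raters = sum(counts[0])
    if n_cases == 0 or n_raters < 2:
        raise ValueError("need at least one case and two raters")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every case needs the same number of ratings")
    # Per-case agreement: proportion of rater pairs that agree on the case.
    per_case = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(per_case) / n_cases
    # Chance agreement from the overall share of ratings in each category.
    n_categories = len(counts[0])
    shares = [
        sum(row[j] for row in counts) / (n_cases * n_raters)
        for j in range(n_categories)
    ]
    p_e = sum(s * s for s in shares)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_bar - p_e) / (1 - p_e)


# Toy tables: four cases, three raters, two categories.
unanimous = [[3, 0], [0, 3], [3, 0], [0, 3]]
split = [[2, 1], [1, 2], [2, 1], [1, 2]]

print(fleiss_kappa(unanimous), round(fleiss_kappa(split), 3))
```

The unanimous table gives a kappa of 1, while the split table gives a negative value, meaning agreement below what chance alone would produce.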


Bone & Joint Research
Vol. 8, Issue 8 | Pages 357 - 366
1 Aug 2019
Zhang B Sun H Zhan Y He Q Zhu Y Wang Y Luo C

Objectives. CT-based three-column classification (TCC) has been widely used in the treatment of tibial plateau fractures (TPFs). In its updated version (updated three-column concept, uTCC), a fracture morphology-based injury mechanism was proposed for effective treatment guidance. In this study, the injury mechanism of TPFs is further explained, and its inter- and intraobserver reliability is evaluated to perfect the uTCC. Methods. The radiological images of 90 consecutive TPF patients were collected. A total of 47 men (52.2%) and 43 women (47.8%) with a mean age of 49.8 years (SD 12.4; 17 to 77) were enrolled in our study. Among them, 57 fractures were on the left side (63.3%) and 33 were on the right side (36.7%); no bilateral fracture existed. Four observers were chosen to classify or estimate independently these randomized cases according to the Schatzker classification, TCC, and injury mechanism. With two rounds of evaluation, the kappa values were calculated to estimate the inter- and intraobserver reliability. Results. The overall inter- and intraobserver agreements of the injury mechanism were substantial (κ_inter = 0.699, κ_intra = 0.749, respectively). The initial position and the force direction, which are two components of the injury mechanism, had substantial agreement for both inter- and intraobserver reliability. The inter- and intraobserver agreements were lower in high-energy fractures (Schatzker types IV to VI; κ_inter = 0.605, κ_intra = 0.721) compared with low-energy fractures (Schatzker types I to III; κ_inter = 0.81, κ_intra = 0.832). The inter- and intraobserver agreements were relatively higher in one-column fractures (κ_inter = 0.759, κ_intra = 0.801) compared with two-column and three-column fractures. Conclusion. The complete theory of the injury mechanism of TPFs is put forward here for the first time to complete the TCC. It demonstrates substantial inter- and intraobserver agreement generally. Furthermore, the injury mechanism can be promoted clinically. Cite this article: B-B. Zhang, H. Sun, Y. Zhan, Q-F. He, Y. Zhu, Y-K. Wang, C-F. Luo. Reliability and repeatability of tibial plateau fracture assessment with an injury mechanism-based concept. Bone Joint Res 2019;8:357–366. DOI: 10.1302/2046-3758.88.BJR-2018-0331.R1


The Journal of Bone & Joint Surgery British Volume
Vol. 94-B, Issue 1 | Pages 32 - 36
1 Jan 2012
Nho J Lee Y Kim HJ Ha Y Suh Y Koo K

A variety of radiological methods of measuring version of the acetabular component after total hip replacement (THR) have been described. The aim of this study was to evaluate the reliability and validity of six methods (those of Lewinnek; Widmer; Hassan et al; Ackland, Bourne and Uhthoff; Liaw et al; and Woo and Morrey) that are currently in use. In 36 consecutive patients who underwent THR, version of the acetabular component was measured by three independent examiners on plain radiographs using these six methods and compared with measurements using CT scans. The intra- and interobserver reliabilities of each measurement were estimated. All measurements on both radiographs and CT scans had excellent intra- and interobserver reliability and the results from each of the six methods correlated well with the CT measurements. However, measurements made using the methods of Widmer and of Ackland, Bourne and Uhthoff were significantly different from the CT measurements (both p < 0.001), whereas measurements made using the remaining four methods were similar to the CT measurements. With regard to reliability and convergent validity, we recommend the use of the methods described by Lewinnek, Hassan et al, Liaw et al and Woo and Morrey for measurement of version of the acetabular component


Bone & Joint Research
Vol. 5, Issue 8 | Pages 347 - 352
1 Aug 2016
Nuttall J Evaniew N Thornley P Griffin A Deheshi B O’Shea T Wunder J Ferguson P Randall RL Turcotte R Schneider P McKay P Bhandari M Ghert M

Objectives. The diagnosis of surgical site infection following endoprosthetic reconstruction for bone tumours is frequently a subjective diagnosis. Large clinical trials use blinded Central Adjudication Committees (CACs) to minimise the variability and bias associated with assessing a clinical outcome. The aim of this study was to determine the level of inter-rater and intra-rater agreement in the diagnosis of surgical site infection in the context of a clinical trial. Materials and Methods. The Prophylactic Antibiotic Regimens in Tumour Surgery (PARITY) trial CAC adjudicated 29 non-PARITY cases of lower extremity endoprosthetic reconstruction. The CAC members classified each case according to the Centers for Disease Control (CDC) criteria for surgical site infection (superficial, deep, or organ space). Combinatorial analysis was used to calculate the smallest CAC panel size required to maximise agreement. A final meeting was held to establish a consensus. Results. Full or near consensus was reached in 20 of the 29 cases. The Fleiss kappa value was calculated as 0.44 (95% confidence interval (CI) 0.35 to 0.53), or moderate agreement. The greatest statistical agreement was observed in the outcome of no infection, 0.61 (95% CI 0.49 to 0.72, substantial agreement). Panelists reached a full consensus in 12 of 29 cases and near consensus in five of 29 cases when CDC criteria were used (superficial, deep or organ space). A stable maximum Fleiss kappa of 0.46 (95% CI 0.50 to 0.35) at CAC sizes greater than three members was obtained. Conclusions. There is substantial agreement among the members of the PARITY CAC regarding the presence or absence of surgical site infection. Agreement on the level of infection, however, is more challenging. Additional clinical information routinely collected by the prospective PARITY trial may improve the discriminatory capacity of the CAC in the parent study for the diagnosis of infection. Cite this article: J. Nuttall, N. Evaniew, P. Thornley, A. Griffin, B. Deheshi, T. O’Shea, J. Wunder, P. Ferguson, R. L. Randall, R. Turcotte, P. Schneider, P. McKay, M. Bhandari, M. Ghert. The inter-rater reliability of the diagnosis of surgical site infection in the context of a clinical trial. Bone Joint Res 2016;5:347–352. DOI: 10.1302/2046-3758.58.BJR-2016-0036.R1


The Bone & Joint Journal
Vol. 102-B, Issue 4 | Pages 478 - 484
1 Apr 2020
Daniels AM Wyers CE Janzing HMJ Sassen S Loeffen D Kaarsemaker S van Rietbergen B Hannemann PFW Poeze M van den Bergh JP

Aims

Besides conventional radiographs, the use of MRI, CT, and bone scintigraphy is frequent in the diagnosis of a fracture of the scaphoid. However, which technique gives the best results remains unknown. The investigation of a new imaging technique initially requires an analysis of its precision. The primary aim of this study was to investigate the interobserver agreement of high-resolution peripheral quantitative CT (HR-pQCT) in the diagnosis of a scaphoid fracture. A secondary aim was to investigate the interobserver agreement for the presence of other fractures and for the classification of scaphoid fracture.

Methods

Two radiologists and two orthopaedic trauma surgeons evaluated HR-pQCT scans of 31 patients with a clinically-suspected scaphoid fracture. The observers were asked to determine the presence of a scaphoid or other fracture and to classify the scaphoid fracture based on the Herbert classification system. Fleiss kappa statistics were used to calculate the interobserver agreement for the diagnosis of a fracture. Intraclass correlation coefficients (ICCs) were used to assess the agreement for the classification of scaphoid fracture.


Bone & Joint Open
Vol. 3, Issue 11 | Pages 913 - 920
18 Nov 2022
Dean BJF Berridge A Berkowitz Y Little C Sheehan W Riley N Costa M Sellon E

Aims. The evidence demonstrating the superiority of early MRI has led to increased use of MRI in clinical pathways for acute wrist trauma. The aim of this study was to describe the radiological characteristics and the inter-observer reliability of a new MRI based classification system for scaphoid injuries in a consecutive series of patients. Methods. We identified 80 consecutive patients with acute scaphoid injuries at one centre who had presented within four weeks of injury. The radiographs and MRI scans were assessed by four observers, two radiologists, and two hand surgeons, using both pre-existing classifications and a new MRI based classification tool, the Oxford Scaphoid MRI Assessment Rating Tool (OxSMART). The OxSMART was used to categorize scaphoid injuries into three grades: contusion (grade 1); unicortical fracture (grade 2); and complete bicortical fracture (grade 3). Results. In total there were 13 grade 1 injuries, 11 grade 2 injuries, and 56 grade 3 injuries in the 80 consecutive patients. The inter-observer reliability of the OxSMART was substantial (Kappa = 0.711). The inter-observer reliability of detecting an obvious fracture was moderate for radiographs (Kappa = 0.436) and MRI (Kappa = 0.543). Only 52% (29 of 56) of the grade 3 injuries were detected on plain radiographs. There were two complications of delayed union, both of which occurred in patients with grade 3 injuries, who were promptly treated with cast immobilization. There were no complications in the patients with grade 1 and 2 injuries and the majority of these patients were treated with early mobilization as pain allowed. Conclusion. This MRI based classification tool, the OxSMART, is reliable and clinically useful in managing patients with acute scaphoid injuries. Cite this article: Bone Jt Open 2022;3(11):913–920


Bone & Joint Open
Vol. 4, Issue 5 | Pages 363 - 369
22 May 2023
Amen J Perkins O Cadwgan J Cooke SJ Kafchitsas K Kokkinakis M

Aims. Reimers migration percentage (MP) is a key measure to inform decision-making around the management of hip displacement in cerebral palsy (CP). The aim of this study was to assess the validity and inter- and intra-rater reliability of a novel method of measuring MP using a smartphone app (HipScreen (HS) app). Methods. A total of 20 pelvis radiographs (40 hips) were used to measure MP by using the HS app. Measurements were performed by five different members of the multidisciplinary team, with varying levels of expertise in MP measurement. The same measurements were repeated two weeks later. A senior orthopaedic surgeon measured the MP on picture archiving and communication system (PACS) as the gold standard and repeated the measurements using HS app. Pearson’s correlation coefficient (r) was used to compare PACS measurements and all HS app measurements and assess validity. Intraclass correlation coefficient (ICC) was used to assess intra- and inter-rater reliability. Results. All HS app measurements (from five raters at week 0 and week 2 and PACS rater) showed highly significant correlation with the PACS measurements (p < 0.001). Pearson’s correlation coefficient (r) was consistently over 0.9, suggesting high validity. Correlation of all HS app measures from different raters to each other was significant with r > 0.874 and p < 0.001, which also confirms high validity. Both inter- and intra-rater reliability were excellent with ICC > 0.9. In a 95% confidence interval for repeated measurements, the deviation of each specific measurement was less than 4% MP for a single measurer and 5% for different measurers. Conclusion. The HS app provides a valid method to measure hip MP in CP, with excellent inter- and intra-rater reliability across different medical and allied health specialties. This can be used in hip surveillance programmes by interdisciplinary measurers. Cite this article: Bone Jt Open 2023;4(5):363–369


The Bone & Joint Journal
Vol. 103-B, Issue 8 | Pages 1339 - 1344
1 Aug 2021
Jain S Mohrir G Townsend O Lamb JN Palan J Aderinto J Pandit H

Aims. The aim of this study was to assess the reliability and validity of the Unified Classification System (UCS) for postoperative periprosthetic femoral fractures (PFFs) around cemented polished taper-slip (PTS) stems. Methods. Radiographs of 71 patients with a PFF admitted consecutively at two centres between 25 February 2012 and 19 May 2020 were collated by an independent investigator. Six observers (three hip consultants and three trainees) were familiarized with the UCS. Each PFF was classified on two separate occasions, with a mean time between assessments of 22.7 days (16 to 29). Interobserver reliability for more than two observers was assessed using percentage agreement and Fleiss’ kappa statistic. Intraobserver reliability between two observers was calculated with Cohen kappa statistic. Validity was tested on surgically managed UCS type B PFFs where stem stability was documented in operation notes (n = 50). Validity was assessed using percentage agreement and Cohen kappa statistic between radiological assessment and intraoperative findings. Kappa statistics were interpreted using Landis and Koch criteria. All six observers were blinded to operation notes and postoperative radiographs. Results. Interobserver reliability percentage agreement was 58.5% and the overall kappa value was 0.442 (moderate agreement). Lowest kappa values were seen for type B fractures (0.095 to 0.360). The mean intraobserver reliability kappa value was 0.672 (0.447 to 0.867), indicating substantial agreement. Validity percentage agreement was 65.7% and the mean kappa value was 0.300 (0.160 to 0.440), indicating only fair agreement. Conclusion. This study demonstrates that the UCS is unsatisfactory for the classification of PFFs around PTS stems, and that it has considerably lower reliability and validity than previously described for other stem types. Radiological PTS stem loosening in the presence of PFF is poorly defined and formal intraoperative testing of stem stability is recommended. Cite this article: Bone Joint J 2021;103-B(8):1339–1344
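
The Landis and Koch criteria mentioned above map a kappa value to a verbal label. A minimal sketch of that mapping follows, using the commonly cited bands (below 0 poor, up to 0.20 slight, up to 0.40 fair, up to 0.60 moderate, up to 0.80 substantial, above that almost perfect). The function name `landis_koch` is hypothetical; the kappa values are those quoted in the abstract.

```python
def landis_koch(kappa):
    """Return the Landis and Koch label for a kappa value."""
    if kappa < 0:
        return "poor"
    # Upper bound of each band, checked in ascending order.
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "almost perfect"


# Kappa values quoted in the abstract above.
reported = {
    "interobserver (overall)": 0.442,
    "intraobserver (mean)": 0.672,
    "validity (mean)": 0.300,
}
for name, value in reported.items():
    print(f"{name}: {value:.3f} -> {landis_koch(value)}")
```

The labels produced for these three values (moderate, substantial, fair) match the wording used in the abstract.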


Bone & Joint Open
Vol. 5, Issue 6 | Pages 524 - 531
24 Jun 2024
Woldeyesus TA Gjertsen J Dalen I Meling T Behzadi M Harboe K Djuv A

Aims. To investigate if preoperative CT improves detection of unstable trochanteric hip fractures. Methods. A single-centre prospective study was conducted. Patients aged 65 years or older with trochanteric hip fractures admitted to Stavanger University Hospital (Stavanger, Norway) were consecutively included from September 2020 to January 2022. Radiographs and CT images of the fractures were obtained, and surgeons made individual assessments of the fractures based on these. The assessment was conducted according to a systematic protocol including three classification systems (AO/Orthopaedic Trauma Association (OTA), Evans Jensen (EVJ), and Nakano) and questions addressing specific fracture patterns. An expert group provided a gold-standard assessment based on the CT images. Sensitivities and specificities of surgeons’ assessments were estimated and compared in regression models with correlations for the same patients. Intra- and inter-rater reliability were presented as Cohen’s kappa and Gwet’s agreement coefficient (AC1). Results. We included 120 fractures in 119 patients. Compared to radiographs, CT increased the sensitivity of detecting unstable trochanteric fractures from 63% to 70% (p = 0.028) and from 70% to 76% (p = 0.004) using AO/OTA and EVJ, respectively. Compared to radiographs alone, CT increased the sensitivity of detecting a large posterolateral trochanter major fragment or a comminuted trochanter major fragment from 63% to 76% (p = 0.002) and from 38% to 55% (p < 0.001), respectively. CT improved intra-rater reliability for stability assessment using EVJ (AC1 0.68 to 0.78; p = 0.049) and for detecting a large posterolateral trochanter major fragment (AC1 0.42 to 0.57; p = 0.031). Conclusion. A preoperative CT of trochanteric fractures increased detection of unstable fractures using the AO/OTA and EVJ classification systems. Compared to radiographs, CT improved intra-rater reliability when assessing fracture stability and detecting large posterolateral trochanter major fragments. Cite this article: Bone Jt Open 2024;5(6):524–531
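
Gwet’s AC1, reported above alongside Cohen’s kappa, uses a different chance-agreement term that is less sensitive to one category dominating the sample. The sketch below shows the two-rater form under stated assumptions: the function name `gwet_ac1`, the category labels, and the ratings are all invented, and the study’s own computation is not described in the abstract.

```python
from collections import Counter


def gwet_ac1(ratings_a, ratings_b, categories=None):
    """Gwet's AC1 for two raters (or one rater on two occasions).

    categories lists every category of the scale; if omitted, the categories
    observed in the data are used (an assumption of this sketch).
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(ratings_a)
    if categories is None:
        categories = sorted(set(ratings_a) | set(ratings_b))
    q = len(categories)
    if q < 2:
        raise ValueError("AC1 needs a scale with at least two categories")
    # Observed agreement.
    p_a = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Average share of each category across both sets of ratings.
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    shares = [(freq_a[c] + freq_b[c]) / (2 * n) for c in categories]
    # Chance agreement as defined for AC1.
    p_e = sum(s * (1 - s) for s in shares) / (q - 1)
    return (p_a - p_e) / (1 - p_e)


# Hypothetical stability calls for ten fractures, with "unstable" dominant.
read_one = ["unstable"] * 8 + ["stable"] * 2
read_two = ["unstable"] * 7 + ["stable", "unstable", "stable"]

print(round(gwet_ac1(read_one, read_two), 3))
```

With these made-up ratings the observed agreement is 0.8; AC1 comes out near 0.71, whereas Cohen’s kappa on the same data would be 0.375, which illustrates why studies with unbalanced categories sometimes report both.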


Bone & Joint Research
Vol. 9, Issue 5 | Pages 242 - 249
1 May 2020
Bali K Smit K Ibrahim M Poitras S Wilkin G Galmiche R Belzile E Beaulé PE

Aims. The aim of the current study was to assess the reliability of the Ottawa classification for symptomatic acetabular dysplasia. Methods. In all, 134 consecutive hips that underwent periacetabular osteotomy were categorized using a validated software (Hip2Norm) into four categories of normal, lateral/global, anterior, or posterior. A total of 74 cases were selected for reliability analysis, and these included 44 dysplastic and 30 normal hips. A group of six blinded fellowship-trained raters, provided with the classification system, looked at these radiographs at two separate timepoints to classify the hips using standard radiological measurements. Thereafter, a consensus meeting was held where a modified flow diagram was devised, before a third reading by four raters using a separate set of 74 radiographs took place. Results. Intrarater results per surgeon between Time 1 and Time 2 showed substantial to almost perfect agreement among the raters (κ = 0.416 to 0.873). With respect to inter-rater reliability, at Time 1 and Time 2 there was substantial agreement overall between all surgeons (Time 1 κ = 0.619; Time 2 κ = 0.623). Posterior and anterior rating categories had moderate and fair agreement at Time 1 (posterior κ = 0.557; anterior κ = 0.438) and Time 2 (posterior κ = 0.506; anterior κ = 0.250), respectively. At Time 3, overall reliability (κ = 0.687) and posterior and anterior reliability (posterior κ = 0.579; anterior κ = 0.521) improved from Time 1 and Time 2. Conclusion. The Ottawa classification system provides a reliable way to identify three categories of acetabular dysplasia that are well-aligned with surgical management. The term ‘borderline dysplasia’ should no longer be used. Cite this article: Bone Joint Res. 2020;9(5):242–249


Bone & Joint Open
Vol. 1, Issue 7 | Pages 355 - 358
7 Jul 2020
Konrads C Gonser C Ahmad SS

Aims. The Oswestry-Bristol Classification (OBC) was recently described as an MRI-based classification tool for the femoral trochlea. The authors demonstrated better inter- and intraobserver agreement compared to the Dejour classification. As the OBC could potentially provide a very useful MRI-based grading system for trochlear dysplasia, this study aimed to determine the inter- and intraobserver reliability of the classification system from the perspective of the non-founder. Methods. Two orthopaedic surgeons independently assessed 50 MRI scans for trochlear dysplasia and classified each according to the OBC. Both observers repeated the assessments after six weeks. The inter- and intraobserver agreement was determined using Cohen’s kappa statistic and S-statistic nominal and linear weights. Results. The OBC with grading into four different trochlear forms showed excellent inter- and intraobserver agreement with a mean kappa of 0.78. Conclusion. The OBC is a simple MRI-based classification system with high inter- and intraobserver reliability. It could present a useful tool for grading the severity of trochlear dysplasia in daily practice. Cite this article: Bone Joint Open 2020;1-7:355–358


The Journal of Bone & Joint Surgery British Volume
Vol. 92-B, Issue 4 | Pages 571 - 575
1 Apr 2010
Clint SA Morris TP Shaw OM Oddy MJ Rudge B Barry M

The databases of the Picture Archiving and Communication Systems of two hospitals were searched and all children who had a lateral radiograph of the ankle during their attendance at the emergency department were identified. In 227 radiographs, Bohler’s and Gissane’s angles were measured on two separate occasions and by two separate authors to allow calculation of inter- and intra-observer variation. Intraclass correlation coefficients were used to assess the reliability of the measurements. For Bohler’s angle, the intraclass correlation coefficient was 0.90 for overall inter-observer reliability and 0.95 for intra-observer reliability, giving excellent agreement. This reliability was maintained across the age groups. For Gissane’s angle, inter- and intra-observer reliability was only fair or poor across most age groups. Further analysis of the Bohler’s angle showed a significant variation in the mean angle with age. Contrary to published opinion, the angle is not uniformly lower than that of adults but varies with age, peaking towards the end of the first decade before attaining adult values. The age-related radiologic changes presented here may help in the interpretation of injuries to the hindfoot in children.


The Journal of Bone & Joint Surgery British Volume
Vol. 88-B, Issue 8 | Pages 1048 - 1052
1 Aug 2006
Jerosch-Herold C Rosén B Shepstone L

Locognosia, the ability to localise touch, is one aspect of tactile spatial discrimination which relies on the integrity of peripheral end-organs as well as the somatosensory representation of the surface of the body in the brain. The test presented here is a standardised assessment which uses a protocol for testing locognosia in the zones of the hand supplied by the median and/or ulnar nerves. The test-retest reliability and discriminant validity were investigated in 39 patients with injuries to the median or ulnar nerve. Intraclass correlation coefficients were used to calculate the test-retest reliability. Discriminant validity was assessed by comparing the injured with the unaffected hand. Excellent test-retest reliability was demonstrated for the injuries to the median (intraclass correlation coefficient 0.924, 95% confidence interval 0.848 to 1.00) and the ulnar nerves (intraclass correlation coefficient 0.859, 95% confidence interval 0.693 to 1.00). The magnitude of the difference in scores between affected and unaffected hands showed good discriminant validity. For injuries to the median nerve the mean difference was 11.1 points (1 to 33; SD 7.4), which was statistically significant (p < 0.0001, paired t-test) and for those of the ulnar nerve it was 4.75 points (1 to 13.5; SD 3.16), which was also statistically significant (paired t-test, p < 0.0001). The locognosia test has excellent test-retest reliability, is a valid test of tactile spatial discrimination and should be included in the evaluation of outcome after injury to peripheral nerves.
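
The abstract does not say which form of the intraclass correlation coefficient was used. Purely as an illustration, the sketch below computes the one-way random-effects form, often written ICC(1,1), which is one common choice for test-retest data. The function name `icc_one_way` and the scores are hypothetical.

```python
from statistics import mean


def icc_one_way(scores):
    """One-way random-effects ICC(1,1) for a subjects x sessions table.

    scores[i] holds the repeated measurements of subject i; every subject
    must have the same number (at least two) of measurements.
    """
    n = len(scores)
    k = len(scores[0])
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need >= 2 subjects with the same >= 2 measurements")
    grand_mean = mean(value for row in scores for value in row)
    subject_means = [mean(row) for row in scores]
    # Between-subject mean square.
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    # Within-subject mean square (disagreement between repeated sessions).
    ms_within = sum(
        (value - m) ** 2 for row, m in zip(scores, subject_means) for value in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical test and retest scores for three patients.
retest_scores = [[10, 12], [20, 18], [30, 32]]

print(round(icc_one_way(retest_scores), 3))
```

Values close to 1 mean that most of the variation in scores comes from real differences between patients rather than from differences between the two sessions.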


The Journal of Bone & Joint Surgery British Volume
Vol. 91-B, Issue 7 | Pages 903 - 906
1 Jul 2009
Trickett RW Hodgson P Forster MC Robertson A

We aimed to determine the reliability, accuracy and the clinical role of digital templating in the pre-operative work-up for total knee replacement. Initially a sample of ten pre-operative digital radiographs were templated by four independent observers to determine the inter- and intra-observer reliability of the process. Digital templating was then performed on the radiographs of 40 consecutive patients undergoing total knee replacement by a consultant surgeon not involved with the operation, who was blinded to the size of the implant inserted. The Press Fit Condylar Sigma Knee system was used in all the patients. The size of the implant as judged by templating was then compared to that of the size used. Good inter- and intra-observer agreement was demonstrated for both femoral and tibial templating. However, the correct size of the implant was predicted in only 48% of the femoral and 55% of the tibial components. Albeit reproducible, digital templating does not currently predict the correct size of component often enough to be of clinical benefit


The Bone & Joint Journal
Vol. 96-B, Issue 12 | Pages 1669 - 1673
1 Dec 2014
Van der Merwe JM Haddad FS Duncan CP

The Unified Classification System (UCS) was introduced because of a growing need to have a standardised universal classification system of periprosthetic fractures. It combines and simplifies many existing classification systems, and can be applied to any fracture around any partial or total joint replacement occurring during or after operation. Our goal was to assess the inter- and intra-observer reliability of the UCS in association with knee replacement when classifying fractures affecting one or more of the femur, tibia or patella. We used an international panel of ten orthopaedic surgeons with subspecialty fellowship training and expertise in adult hip and knee reconstruction (‘experts’) and ten residents of orthopaedic surgery in the last two years of training (‘pre-experts’). They each received 15 radiographs for evaluation. After six weeks they evaluated the same radiographs again but in a different order. The reliability was assessed using the Kappa and weighted Kappa values. The Kappa values for inter-observer reliability for the experts and the pre-experts were 0.741 (95% confidence interval (CI) 0.707 to 0.774) and 0.765 (95% CI 0.733 to 0.797), respectively. The weighted Kappa values for intra-observer reliability for the experts and pre-experts were 0.898 (95% CI 0.846 to 0.950) and 0.878 (95% CI 0.815 to 0.942) respectively. The UCS has substantial inter-observer reliability and ‘near perfect’ intra-observer reliability when used for periprosthetic fractures in association with knee replacement in the hands of experienced and inexperienced users. Cite this article: Bone Joint J 2014;96-B:1669–73


The Journal of Bone & Joint Surgery British Volume
Vol. 80-B, Issue 4 | Pages 670 - 672
1 Jul 1998
Flinkkilä T Nikkola-Sihto A Kaarela O Päakkö E Raatikainen T

Interobserver reliability of the AO system of classification of fractures of the distal radius was assessed using plain radiographs and CT. Five observers classified 30 Colles’-type fractures using only plain radiographs; two months later they were reclassified using CT in addition. Interobserver reliability was poor in both series when detailed classification was used. By reducing the categories to five, interobserver reliability was slightly improved, but was still poor. When only two AO types were used, the reliability was moderate using plain radiographs and good to excellent with the addition of CT. The use of CT as well as plain radiographs brings interobserver reliability to a good level in assessment of the presence or absence of articular involvement, but is otherwise of minor value in improving the interobserver reliability of the AO system of classification of fractures of the distal radius


The Bone & Joint Journal
Vol. 98-B, Issue 2 | Pages 166 - 172
1 Feb 2016
Langlois J Hamadouche M

Previous standards for assessing the reliability of a measurement tool have lacked consistency. We reviewed the most current American Society for Testing and Materials and International Organisation for Standardisation (ISO) recommendations, and propose an algorithm for orthopaedic surgeons. When assessing a measurement tool, conditions of the experimental set-up and clear formulae used to compile the results should be strictly reported. According to these recent guidelines, accuracy is a confusing word with an overly broad meaning and should therefore be abandoned. Depending on the experimental conditions, one should be referring to bias (when the study protocol involves accepted reference values), and repeatability (sr, r) or reproducibility (SR, R). In the absence of accepted reference values, only repeatability (sr, r) or reproducibility (SR, R) should be provided. Take home message: Assessing the reliability of a measurement tool involves reporting bias, repeatability and/or reproducibility depending on the defined conditions, instead of precision or accuracy. Cite this article: Bone Joint J 2016;98-B2:166–72
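
To make the distinction between the repeatability standard deviation and the repeatability limit concrete, here is a minimal Python sketch. It assumes the ISO 5725 convention in which the repeatability limit r is about 2.8 times the repeatability standard deviation s_r, and it estimates s_r from duplicate measurements made under the same conditions. The function names and the three pairs of readings are hypothetical and are not taken from the article.

```python
import math

def repeatability_sd(pairs):
    """Estimate s_r from duplicate measurements: sqrt(sum(d^2) / (2n))."""
    if not pairs:
        raise ValueError("at least one pair of measurements is required")
    squared_diffs = sum((first - second) ** 2 for first, second in pairs)
    return math.sqrt(squared_diffs / (2 * len(pairs)))

def repeatability_limit(s_r, factor=2.8):
    """Repeatability limit r, assuming the ISO 5725 factor of about 2.8."""
    return factor * s_r

# Hypothetical duplicate angle readings (degrees) by one observer.
duplicate_readings = [(10.0, 12.0), (20.0, 20.0), (30.0, 28.0)]
s_r = repeatability_sd(duplicate_readings)
r = repeatability_limit(s_r)
print(f"s_r = {s_r:.2f}, r = {r:.2f}")
```

The same calculation applied to measurements from different observers or sites would give the reproducibility pair rather than the repeatability pair.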


The Bone & Joint Journal
Vol. 97-B, Issue 5 | Pages 611 - 616
1 May 2015
Shin WC Lee SM Lee KW Cho HJ Lee JS Suh KT

There is no single standardised method of measuring the orientation of the acetabular component on plain radiographs after total hip arthroplasty. We assessed the reliability and accuracy of three methods of assessing anteversion of the acetabular component for 551 THAs using the PolyWare software and the methods of Liaw et al, and of Woo and Morrey. All measurements of the three methods had excellent intra- and inter-observer reliability. The values of the PolyWare software, which determines version of the acetabular component by edge detection, were regarded as the reference standard. Although the PolyWare software and the method of Liaw et al were similarly precise, the method of Woo and Morrey was significantly less accurate (p < 0.001). The method of Liaw et al seemed to be more accurate than that of Woo and Morrey when compared with the measurements using the PolyWare software. If the qualified lateral radiograph was selected, anteversion measured using the method of Woo and Morrey was considered to be relatively reliable. Cite this article: Bone Joint J 2015; 97-B:611–16


The Journal of Bone & Joint Surgery British Volume
Vol. 88-B, Issue 9 | Pages 1204 - 1206
1 Sep 2006
Malek IA Machani B Mevcha AM Hyder NH

Our aim was to assess the reproducibility and the reliability of the Weber classification system for fractures of the ankle based on anteroposterior and lateral radiographs. Five observers with varying clinical experience reviewed 50 sets of blinded radiographs. The same observers reviewed the same radiographs again after an interval of four weeks. Inter- and intra-observer agreement was assessed based on the proportion of agreement and the values of the kappa coefficient. For inter-observer agreement, the mean kappa value was 0.61 (0.59 to 0.63) and the proportion of agreement was 78% (76% to 79%) and for intra-observer agreement the mean kappa value was 0.74 (0.39 to 0.86) with an 85% (60% to 93%) observed agreement. These results show that the Weber classification of fractures of the ankle based on two radiological views has substantial inter-observer reliability and intra-observer reproducibility
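
As an illustration of the two statistics reported in this abstract, the following minimal Python sketch computes the proportion of agreement and Cohen's kappa for one pair of readings. The function name `cohen_kappa` and the ten Weber types in the example are hypothetical and are not data from the study.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Return (proportion of agreement, Cohen's kappa) for two label lists."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of cases given the same label by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions per label.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_chance = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return p_observed, (p_observed - p_chance) / (1 - p_chance)

# Hypothetical Weber types assigned to ten radiographs by two observers.
observer_1 = ["A", "B", "B", "C", "B", "A", "C", "B", "B", "A"]
observer_2 = ["A", "B", "C", "C", "B", "A", "B", "B", "B", "B"]
agreement, kappa = cohen_kappa(observer_1, observer_2)
print(f"agreement = {agreement:.2f}, kappa = {kappa:.2f}")
```

Kappa is lower than the raw proportion of agreement because it discounts the agreement expected by chance from each observer's label frequencies.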


The Journal of Bone & Joint Surgery British Volume
Vol. 91-B, Issue 6 | Pages 766 - 771
1 Jun 2009
Brunner A Honigmann P Treumann T Babst R

We evaluated the impact of stereo-visualisation of three-dimensional volume-rendering CT datasets on the inter- and intraobserver reliability assessed by kappa values on the AO/OTA and Neer classifications in the assessment of proximal humeral fractures. Four independent observers classified 40 fractures according to the AO/OTA and Neer classifications using plain radiographs, two-dimensional CT scans and with stereo-visualised three-dimensional volume-rendering reconstructions. Both classification systems showed moderate interobserver reliability with plain radiographs and two-dimensional CT scans. Three-dimensional volume-rendered CT scans improved the interobserver reliability of both systems to good. Intraobserver reliability was moderate for both classifications when assessed by plain radiographs. Stereo visualisation of three-dimensional volume rendering improved intraobserver reliability to good for the AO/OTA method and to excellent for the Neer classification. These data support our opinion that stereo visualisation of three-dimensional volume-rendering datasets is of value when analysing and classifying complex fractures of the proximal humerus


Bone & Joint Research
Vol. 7, Issue 1 | Pages 36 - 45
1 Jan 2018
Kleinlugtenbelt YV Krol RG Bhandari M Goslings JC Poolman RW Scholtes VAB

Objectives. The patient-rated wrist evaluation (PRWE) and the Disabilities of the Arm, Shoulder and Hand (DASH) questionnaire are patient-reported outcome measures (PROMs) used for clinical and research purposes. Methodologically high-quality clinimetric studies that determine the measurement properties of these PROMs when used in patients with a distal radial fracture are lacking. This study aimed to validate the PRWE and DASH in Dutch patients with a displaced distal radial fracture (DRF). Methods. The intraclass correlation coefficient (ICC) was used for test-retest reliability, between PROMs completed twice with a two-week interval at six to eight months after DRF. Internal consistency was determined using Cronbach’s α for the dimensions found in the factor analysis. The measurement error was expressed by the smallest detectable change (SDC). A semi-structured interview was conducted between eight and 12 weeks after DRF to assess the content validity. Results. A total of 119 patients (mean age 58 years (SD 15)), 74% female, completed PROMs at a mean time of six months (SD 1) post-fracture. One overall meaningful dimension was found for the PRWE and the DASH. Internal consistency was excellent for both PROMs (Cronbach’s α 0.96 (PRWE) and 0.97 (DASH)). Test-retest reliability was good for the PRWE (ICC 0.87) and excellent for the DASH (ICC 0.91). The SDC was 20 for the PRWE and 14 for the DASH. No floor or ceiling effects were found. The content validity was good for both questionnaires. Conclusion. The PRWE and DASH are valid and reliable PROMs in assessing function and disability in Dutch patients with a displaced DRF. However, due to the high SDC, the PRWE and DASH are less useful for individual patients with a distal radial fracture in clinical practice. Cite this article: Y. V. Kleinlugtenbelt, R. G. Krol, M. Bhandari, J. C. Goslings, R. W. Poolman, V. A. B. Scholtes. Are the patient-rated wrist evaluation (PRWE) and the disabilities of the arm, shoulder and hand (DASH) questionnaire used in distal radial fractures truly valid and reliable? Bone Joint Res 2018;7:36–45. DOI: 10.1302/2046-3758.71.BJR-2017-0081.R1
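
The abstract does not state how the SDC was derived. A common convention, assumed here, is SDC = 1.96 × √2 × SEM, with the standard error of measurement taken as SEM = SD × √(1 − ICC). The Python sketch below applies that convention to hypothetical inputs. The function names, the SD of 10 points and the ICC of 0.75 are illustrative assumptions, not values from the study.

```python
import math

def standard_error_of_measurement(sd, icc):
    """SEM from the score SD and a test-retest ICC (assumed convention)."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("ICC must lie between 0 and 1")
    return sd * math.sqrt(1.0 - icc)

def smallest_detectable_change(sem, z=1.96):
    """SDC for an individual patient at the 95% level: z * sqrt(2) * SEM."""
    return z * math.sqrt(2.0) * sem

# Hypothetical questionnaire: score SD of 10 points, test-retest ICC of 0.75.
sem = standard_error_of_measurement(sd=10.0, icc=0.75)
sdc = smallest_detectable_change(sem)
print(f"SEM = {sem:.2f} points, SDC = {sdc:.2f} points")
```

Under this convention a change in an individual patient's score smaller than the SDC cannot be distinguished from measurement error, which is why a high SDC limits use in individual patients even when the ICC is good.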


Bone & Joint Research
Vol. 2, Issue 1 | Pages 1 - 8
1 Jan 2013
Costa AJ Lustig S Scholes CJ Balestro J Fatima M Parker DA

Objectives. There remains a lack of data on the reliability of methods to estimate tibial coverage achieved during total knee replacement. In order to address this gap, the intra- and interobserver reliability of a three-dimensional (3D) digital templating method was assessed with one symmetric and one asymmetric prosthesis design. Methods. A total of 120 template procedures were performed according to specific rotational and over-hang criteria by three observers at time zero and again two weeks later. Total and sub-region coverage were calculated and the reliability of the templating and measurement method was evaluated. Results. Excellent intra- and interobserver reliability was observed for total coverage, when minimal component overhang (intraclass correlation coefficient (ICC) = 0.87) or no component overhang (ICC = 0.92) was permitted, regardless of rotational restrictions. Conclusions. Measurement of tibial coverage can be reliable using the templating method described even if the rotational axis selected still has a minor influence


The Journal of Bone & Joint Surgery British Volume
Vol. 84-B, Issue 1 | Pages 42 - 47
1 Jan 2002
Brismar BH Wredmark T Movin T Leandersson J Svensson O

We studied 19 videotaped knee arthroscopies in 19 patients with mild to moderate osteoarthritis (OA) of the knee in order to compare the intraobserver and interobserver reliability and the patterns of disagreement between four orthopaedic surgeons. The classifications of OA of Collins, Outerbridge and the French Society of Arthroscopy were used. Intraobserver and interobserver agreements using kappa measures were 0.42 to 0.66 and 0.43 to 0.49, respectively. Only 6% to 8% of paired intraobserver classifications differed by more than one category. Observer-specific disagreement was evident both within and between observers. A small, but significant, occasional variation was also seen. Although reliability may improve by an analysis of disagreement, it appears that the arthroscopic grading of early osteoarthritic lesions is inexact


The Bone & Joint Journal
Vol. 100-B, Issue 2 | Pages 242 - 246
1 Feb 2018
Ghoshal A Enninghorst N Sisak K Balogh ZJ

Aims. To evaluate interobserver reliability of the Orthopaedic Trauma Association’s open fracture classification system (OTA-OFC). Patients and Methods. Patients of any age with a first presentation of an open long bone fracture were included. Standard radiographs, wound photographs, and a short clinical description were given to eight orthopaedic surgeons, who independently evaluated the injury using both the Gustilo and Anderson (GA) and OTA-OFC classifications. The responses were compared for variability using Cohen’s kappa. Results. The overall interobserver agreement was κ = 0.44 for the GA classification and κ = 0.49 for OTA-OFC, which reflects moderate agreement (0.41 to 0.60) for both classifications. The agreement in the five categories of OTA-OFC was: for skin, κ = 0.55 (moderate); for muscle, κ = 0.44 (moderate); for arterial injury, κ = 0.74 (substantial); for contamination, κ = 0.35 (fair); and for bone loss, κ = 0.41 (moderate). Conclusion. Although the OTA-OFC, with similar interobserver agreement to GA, offers a more detailed description of open fractures, further development may be needed to make it a reliable and robust tool. Cite this article: Bone Joint J 2018;100-B:242–6
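
The verbal labels in this abstract are consistent with the widely used Landis and Koch bands for kappa. The abstract itself states only the moderate band (0.41 to 0.60), so the remaining bands in the Python sketch below are an assumption about the scale the authors applied. The function name `agreement_label` is hypothetical.

```python
def agreement_label(kappa):
    """Map a kappa value to a Landis and Koch style verbal label."""
    if kappa > 1.0:
        raise ValueError("kappa cannot exceed 1")
    if kappa < 0.0:
        return "poor"
    # Upper bound of each band, assuming values reported to two decimals.
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
        (1.00, "almost perfect"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label

# Category-level kappa values as reported in the abstract.
reported = {"skin": 0.55, "muscle": 0.44, "arterial injury": 0.74,
            "contamination": 0.35, "bone loss": 0.41}
labels = {category: agreement_label(value) for category, value in reported.items()}
for category, label in labels.items():
    print(f"{category}: {label}")
```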


The Bone & Joint Journal
Vol. 96-B, Issue 5 | Pages 597 - 603
1 May 2014
Nomura T Naito M Nakamura Y Ida T Kuroda D Kobayashi T Sakamoto T Seo H

Several radiological methods of measuring anteversion of the acetabular component after total hip replacement (THR) have been described. These studies used different definitions and reference planes to compare methods, allowing for misinterpretation of the results. We compared the reliability and accuracy of five current methods using plain radiographs (those of Lewinnek, Widmer, Liaw, Pradhan, and Woo and Morrey) with CT measurements, using the same definition and reference plane. We retrospectively studied the plain radiographs and CT scans in 84 hips of 84 patients who underwent primary THR. Intra- and inter-observer reliability were high for the measurement of inclination and anteversion with all methods on plain radiographs and CT scans. The measurements of inclination on plain radiographs were similar to the measurements using CT (p = 0.043). The mean difference between CT measurements was 0.6° (-5.9° to 6.8°). Measurements using Widmer’s method were the most similar to those using CT (p = 0.088), with a mean difference between CT measurements of -0.9° (-10.4° to 9.1°), whereas the other four methods differed significantly from those using CT (p < 0.001). This study has shown that Widmer’s method is the best for evaluating the anteversion of the acetabular component on plain radiographs. Cite this article: Bone Joint J 2014; 96-B:597–603


The Journal of Bone & Joint Surgery British Volume
Vol. 75-B, Issue 3 | Pages 479 - 482
1 May 1993
Dias J Thomas I Lamont A Mody B Thompson

Ultrasound scans were made of the hips of 209 neonates born consecutively over a two-week period. Of the 418 scans, 62 images were selected at random and 25 of these were duplicated to give a total of 87 scans. These static images were then presented to five experienced observers who each made nine different assessments and measurements. Interobserver and intraobserver agreement was calculated and expressed as kappa values. Our results showed poor reliability on both counts


The Journal of Bone & Joint Surgery British Volume
Vol. 79-B, Issue 4 | Pages 570 - 575
1 Jul 1997
Boniforti FG Fujii G Angliss RD Benson MKD

We have evaluated the reliability of the measurement of radiological indicators in developmental dysplasia of the hip. Three observers each independently assessed 60 pelvic radiographs from infants aged from 3 to 36 months. Errors from the true value of a single measurement made by a single observer (E1), of the average of two measurements by a single observer (E2), and of the average of two single measurements by two different observers (E3) were established for the acetabular index of Hilgenreiner, for the assessment of superior and lateral femoral displacement and for indicators of pelvic alignment. The errors for the assessment of the acetabular index were E1 ± 5°, E2 ± 5°, and E3 ± 3.5°. There was a significant correlation between the presence of an acetabular notch on the radiograph and an increased error in measurement (p = 0.01). Yamamuro’s measurement of lateral femoral displacement was more reliable than the Hilgenreiner distance. The errors of indicators of pelvic alignment showed a correlation with the age of the infant; the quotient of pelvic rotation was more reliable after seven months of age (p < 0.0001). The errors of the measurement of the symphysis os-ischium angle tended to increase with age and those of the measurement of the index of pelvic tilt decreased with skeletal maturation (p = 0.002)


The Journal of Bone & Joint Surgery British Volume
Vol. 68-B, Issue 4 | Pages 614 - 615
1 Aug 1986
Christensen F Soballe K Ejsted R Luxhoj T

The reliability of the Catterall grouping of Perthes' disease was examined by determining the agreement between pairs of observers using weighted kappa statistics. Anteroposterior and lateral radiographs of 100 hip joints were grouped independently by four experienced observers. There was a low, and in our opinion, unacceptable degree of inter-observer agreement even when Groups 2 and 3 were combined
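
For ordered groups such as these, weighted kappa penalises a disagreement of several groups more than a disagreement of one group. The abstract does not say which weighting scheme was used, so the Python sketch below assumes linear disagreement weights (with quadratic weights available through `power=2`). The function name and the example ratings are hypothetical.

```python
def weighted_kappa(rater_a, rater_b, categories, power=1):
    """Weighted kappa for two raters on ordered categories.

    power=1 gives linear weights, power=2 quadratic weights (both assumed).
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    k = len(categories)
    if k < 2:
        raise ValueError("at least two categories are required")
    index = {category: i for i, category in enumerate(categories)}
    n = len(rater_a)
    # Joint proportions of (rater A category, rater B category).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1.0 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    weighted_observed = weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = (abs(i - j) / (k - 1)) ** power  # disagreement weight
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * row[i] * col[j]
    if weighted_expected == 0.0:
        raise ValueError("weighted kappa is undefined for constant ratings")
    return 1.0 - weighted_observed / weighted_expected

# Hypothetical Catterall groups assigned to eight hips by two observers.
groups = [1, 2, 3, 4]
observer_1 = [1, 2, 2, 3, 3, 4, 4, 2]
observer_2 = [1, 2, 3, 3, 4, 4, 2, 2]
kappa_w = weighted_kappa(observer_1, observer_2, groups)
print(f"linear weighted kappa = {kappa_w:.2f}")
```

With only two categories every disagreement carries the same weight, so the weighted statistic reduces to ordinary Cohen's kappa.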


The Journal of Bone & Joint Surgery British Volume
Vol. 74-B, Issue 2 | Pages 287 - 291
1 Mar 1992
Wright J Feinstein A


The Journal of Bone & Joint Surgery British Volume
Vol. 72-B, Issue 5 | Pages 924 - 924
1 Sep 1990
Asirvatham R Watts H Ware B Rooney R


The Journal of Bone & Joint Surgery British Volume
Vol. 83-B, Issue 5 | Pages 775 - 777
1 Jul 2001
Rushton N


The Journal of Bone & Joint Surgery British Volume
Vol. 85-B, Issue 3 | Pages 463 - 464
1 Apr 2003
MENCHE DS


The Journal of Bone & Joint Surgery British Volume
Vol. 71-B, Issue 1 | Pages 6 - 8
1 Jan 1989
Broughton N Brougham D Cole W Menelaus M

We investigated the reproducibility of the various radiological methods of assessment of hip dysplasia by making 474 assessments of hips and quantifying the inter-observer and intra-observer variation. There was a wide range of variability between the readings made by different observers and by one observer on two occasions. A measurement of acetabular index has to be given a range of +/- 6 degrees in order to be 95% confident of including the true measurement. We found the most helpful measurements to be the acetabular index, up to the age of eight years; the centre-edge angle, over the age of five years; and Smith's c/b ratio and neck-shaft angle. We feel, however, that the change in value over a series of radiographs in the same child is much more valuable. Single readings of all the radiological measurements investigated in this study were unreliable.


The Bone & Joint Journal
Vol. 97-B, Issue 8 | Pages 1139 - 1143
1 Aug 2015
Hutt JRB Ortega-Briones A Daurka JS Bircher MD Rickman MS

The most widely used classification system for acetabular fractures was developed by Judet, Judet and Letournel over 50 years ago primarily to aid surgical planning. As population demographics and injury mechanisms have altered over time, the fracture patterns also appear to be changing. We conducted a retrospective review of the imaging of 100 patients with a mean age of 54.9 years (19 to 94) and a male to female ratio of 69:31 seen between 2010 and 2013 with acetabular fractures in order to determine whether the current spectrum of injury patterns can be reliably classified using the original system.

Three consultant pelvic and acetabular surgeons and one senior fellow analysed anonymous imaging. Inter-observer agreement for the classification of fractures that fitted into defined categories was substantial, (κ = 0.65, 95% confidence interval (CI) 0.51 to 0.76) with improvement to near perfect on inclusion of CT imaging (κ = 0.80, 95% CI 0.69 to 0.91). However, a high proportion of injuries (46%) were felt to be unclassifiable by more than one surgeon; there was moderate agreement on which these were (κ = 0.42 95% CI 0.31 to 0.54).

Further review of the unclassifiable fractures in this cohort of 100 patients showed that they tended to occur in an older population (mean age 59.1 years; 22 to 94 vs 47.2 years; 19 to 94; p = 0.003) and within this group, there was a recurring pattern of anterior column and quadrilateral plate involvement, with or without an incomplete posterior element injury.

Cite this article: Bone Joint J 2015;97-B:1139–43.


The Journal of Bone & Joint Surgery British Volume
Vol. 94-B, Issue 11 | Pages 1522 - 1528
1 Nov 2012
Wallander H Saebö M Jonsson K Bjönness T Hansson G

We investigated 60 patients (89 feet) with a mean age of 64 years (61 to 67) treated for congenital clubfoot deformity, using standardised weight-bearing radiographs of both feet and ankles together with a functional evaluation. Talocalcaneal and talonavicular relationships were measured and the degree of osteo-arthritic change in the ankle and talonavicular joints was assessed. The functional results were evaluated using a modified Laaveg-Ponseti score. The talocalcaneal (TC) angles in the clubfeet were significantly lower in both anteroposterior (AP) and lateral projections than in the unaffected feet (p < 0.001 for both views). There was significant medial subluxation of the navicular in the clubfeet compared with the unaffected feet (p < 0.001). Severe osteoarthritis in the ankle joint was seen in seven feet (8%) and in the talonavicular joint in 11 feet (12%). The functional result was excellent or good (≥ 80 points) in 29 patients (48%), and fair or poor (< 80 points) in 31 patients (52%). Patients who had undergone few (0 to 1) surgical procedures had better functional outcomes than those who had undergone two or more procedures (p < 0.001). There was a significant correlation between the functional result and the degree of medial subluxation of the navicular (p < 0.001, r² = 0.164), the talocalcaneal angle on AP projection (p < 0.02, r² = 0.025) and extent of osteoarthritis in the ankle joint (p < 0.001).

We conclude that poor functional outcome in patients with congenital clubfoot occurs more frequently in those with medial displacement of the navicular, osteoarthritis of the talonavicular and ankle joints, and a low talocalcaneal angle on the AP projection, and in patients who have undergone two or more surgical procedures. However, the ankle joint in these patients appeared relatively resistant to the development of osteoarthritis.


Aims. The purpose of this study was to assess the reliability and responsiveness to hip surgery of a four-point modified Care and Comfort Hypertonicity Questionnaire (mCCHQ) scoring tool in children with cerebral palsy (CP) in Gross Motor Function Classification System (GMFCS) levels IV and V. Methods. This was a population-based cohort study in children with CP from a national surveillance programme. Reliability was assessed from 20 caregivers who completed the mCCHQ questionnaire on two occasions three weeks apart. Test-retest reliability of the mCCHQ was calculated, and responsiveness before and after surgery for a displaced hip was evaluated in a cohort of children. Results. Test-retest reliability for the overall mCCHQ score was good (intraclass correlation coefficient 0.78), and no dimension demonstrated poor reliability. The surgical intervention cohort comprised ten children who had preoperative and postoperative mCCHQ scores at a minimum of six months postoperatively. The mCCHQ tool demonstrated a significant improvement in overall score from preoperative assessment to six-month postoperative follow-up assessment (p < 0.001). Conclusion. The mCCHQ demonstrated responsiveness to intervention and good test-retest reliability. The mCCHQ is proposed as an outcome tool for use within a national surveillance programme for children with CP. Cite this article: Bone Jt Open 2023;4(8):580–583


Bone & Joint Research
Vol. 13, Issue 1 | Pages 19 - 27
5 Jan 2024
Baertl S Rupp M Kerschbaum M Morgenstern M Baumann F Pfeifer C Worlicek M Popp D Amanatullah DF Alt V

Aims. This study aimed to evaluate the clinical application of the PJI-TNM classification for periprosthetic joint infection (PJI) by determining intraobserver and interobserver reliability. To facilitate its use in clinical practice, an educational app was subsequently developed and evaluated. Methods. A total of ten orthopaedic surgeons classified 20 cases of PJI based on the PJI-TNM classification. Subsequently, the classification was re-evaluated using the PJI-TNM app. Classification accuracy was calculated separately for each subcategory (reinfection, tissue and implant condition, non-human cells, and morbidity of the patient). Fleiss’ kappa and Cohen’s kappa were calculated for interobserver and intraobserver reliability, respectively. Results. Overall, interobserver and intraobserver agreements were substantial across the 20 classified cases. Analyses for the variable ‘reinfection’ revealed an almost perfect interobserver and intraobserver agreement with a classification accuracy of 94.8%. The category 'tissue and implant conditions' showed moderate interobserver and substantial intraobserver reliability, while the classification accuracy was 70.8%. For 'non-human cells,' accuracy was 81.0% and interobserver agreement was moderate with an almost perfect intraobserver reliability. The classification accuracy of the variable 'morbidity of the patient' reached 73.5% with a moderate interobserver agreement, whereas the intraobserver agreement was substantial. The application of the app yielded comparable results across all subgroups. Conclusion. The PJI-TNM classification system captures the heterogeneity of PJI and can be applied with substantial inter- and intraobserver reliability. The PJI-TNM educational app aims to facilitate application in clinical practice. A major limitation was the correct assessment of the implant situation. To eliminate this, a re-evaluation according to intraoperative findings is strongly recommended. 
Cite this article: Bone Joint Res 2024;13(1):19–27
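
Fleiss' kappa extends chance-corrected agreement to more than two raters. The Python sketch below shows the standard calculation from a table of counts in which each row is one case and each column is one category. The function name and the example table (four cases, three raters, three categories) are hypothetical and unrelated to the study data.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa from a cases x categories table of rating counts.

    Every row must sum to the same number of raters (at least two).
    """
    if not counts:
        raise ValueError("at least one case is required")
    n_cases = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each case needs the same number (>= 2) of ratings")
    n_categories = len(counts[0])
    # Share of all ratings that fall in each category.
    category_share = [
        sum(row[j] for row in counts) / (n_cases * n_raters)
        for j in range(n_categories)
    ]
    # Per-case agreement: proportion of rater pairs that agree.
    case_agreement = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    mean_agreement = sum(case_agreement) / n_cases
    chance_agreement = sum(share * share for share in category_share)
    if chance_agreement == 1.0:
        raise ValueError("kappa is undefined when all ratings are identical")
    return (mean_agreement - chance_agreement) / (1.0 - chance_agreement)

# Hypothetical table: 4 cases, 3 raters, 3 categories.
example_counts = [[3, 0, 0], [0, 3, 0], [2, 1, 0], [0, 1, 2]]
print(f"Fleiss' kappa = {fleiss_kappa(example_counts):.2f}")
```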


The Bone & Joint Journal
Vol. 105-B, Issue 1 | Pages 56 - 63
1 Jan 2023
de Klerk HH Oosterhoff JHF Schoolmeesters B Nieboer P Eygendaal D Jaarsma RL IJpma FFA van den Bekerom MPJ Doornberg JN

Aims. This study aimed to answer the following questions: do 3D-printed models lead to a more accurate recognition of the pattern of complex fractures of the elbow?; do 3D-printed models lead to a more reliable recognition of the pattern of these injuries?; and do junior surgeons benefit more from 3D-printed models than senior surgeons? Methods. A total of 15 orthopaedic trauma surgeons (seven juniors, eight seniors) evaluated 20 complex elbow fractures for their overall pattern (i.e. varus posterior medial rotational injury, terrible triad injury, radial head fracture with posterolateral dislocation, anterior (trans-)olecranon fracture-dislocation, posterior (trans-)olecranon fracture-dislocation) and their specific characteristics. First, fractures were assessed based on radiographs and 2D and 3D CT scans; and in a subsequent round, one month later, with additional 3D-printed models. Diagnostic accuracy (acc) and inter-surgeon reliability (κ) were determined for each assessment. Results. Accuracy significantly improved with 3D-printed models for the whole group on pattern recognition (acc_2D/3D = 0.62 vs acc_3Dprint = 0.69; Δacc = 0.07 (95% confidence interval (CI) 0.00 to 0.14); p = 0.025). A significant improvement was also seen in reliability for pattern recognition with the additional 3D-printed models (κ_2D/3D = 0.41 (moderate) vs κ_3Dprint = 0.59 (moderate); Δκ = 0.18 (95% CI 0.14 to 0.22); p ≤ 0.001). Accuracy was comparable between junior and senior surgeons with the 3D-printed model (acc_junior = 0.70 vs acc_senior = 0.68; Δacc = -0.02 (95% CI -0.17 to 0.13); p = 0.904). Reliability was also comparable between junior and senior surgeons without the 3D-printed model (κ_junior = 0.39 (fair) vs κ_senior = 0.43 (moderate); Δκ = 0.03 (95% CI -0.03 to 0.10); p = 0.318). However, junior surgeons showed greater improvement regarding reliability than seniors with 3D-printed models (κ_junior = 0.65 (substantial) vs κ_senior = 0.54 (moderate); Δκ = 0.11 (95% CI 0.04 to 0.18); p = 0.002). Conclusion. The use of 3D-printed models significantly improved the accuracy and reliability of recognizing the pattern of complex fractures of the elbow. However, the current long printing time and non-reusable materials could limit the usefulness of 3D-printed models in clinical practice. They could be suitable as a reusable tool for teaching residents. Cite this article: Bone Joint J 2023;105-B(1):56–63


The Bone & Joint Journal
Vol. 106-B, Issue 1 | Pages 19 - 27
1 Jan 2024
Tang H Guo S Ma Z Wang S Zhou Y

Aims. The aim of this study was to evaluate the reliability and validity of a patient-specific algorithm which we developed for predicting changes in sagittal pelvic tilt after total hip arthroplasty (THA). Methods. This retrospective study included 143 patients who underwent 171 THAs between April 2019 and October 2020 and had full-body lateral radiographs preoperatively and at one year postoperatively. We measured the pelvic incidence (PI), the sagittal vertical axis (SVA), pelvic tilt, sacral slope (SS), lumbar lordosis (LL), and thoracic kyphosis to classify patients into types A, B1, B2, B3, and C. The change of pelvic tilt was predicted according to the normal range of SVA (0 mm to 50 mm) for types A, B1, B2, and B3, and based on the absolute value of one-third of the PI-LL mismatch for type C patients. The reliability of the classification of the patients and the prediction of the change of pelvic tilt were assessed using kappa values and intraclass correlation coefficients (ICCs), respectively. Validity was assessed using the overall mean error and mean absolute error (MAE) for the prediction of the change of pelvic tilt. Results. The kappa values were 0.927 (95% confidence interval (CI) 0.861 to 0.992) and 0.945 (95% CI 0.903 to 0.988) for the inter- and intraobserver reliabilities, respectively, and the ICCs ranged from 0.919 to 0.997. The overall mean error and MAE for the prediction of the change of pelvic tilt were -0.3° (SD 3.6°) and 2.8° (SD 2.4°), respectively. The overall absolute change of pelvic tilt was 5.0° (SD 4.1°). Pre- and postoperative values and changes in pelvic tilt, SVA, SS, and LL varied significantly among the five types of patient. Conclusion. We found that the proposed algorithm was reliable and valid for predicting the standing pelvic tilt after THA. Cite this article: Bone Joint J 2024;106-B(1):19–27
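
The two validity measures reported here answer different questions: the mean error shows whether predictions are systematically too high or too low, while the mean absolute error (MAE) shows how far off a typical prediction is. The Python sketch below computes both with their standard deviations. The sign convention (predicted minus measured), the function name and the four example values are assumptions for illustration only.

```python
from statistics import mean, stdev

def prediction_errors(predicted, measured):
    """Summarise signed and absolute prediction errors (needs >= 2 cases)."""
    if len(predicted) != len(measured) or len(predicted) < 2:
        raise ValueError("need two or more paired values")
    # Assumed sign convention: error = predicted - measured.
    errors = [p - m for p, m in zip(predicted, measured)]
    absolute = [abs(e) for e in errors]
    return {
        "mean_error": mean(errors),
        "sd_error": stdev(errors),
        "mae": mean(absolute),
        "sd_absolute": stdev(absolute),
    }

# Hypothetical predicted and measured changes in pelvic tilt (degrees).
predicted_change = [5.0, -3.0, 2.0, 0.0]
measured_change = [4.0, -1.0, 2.0, -1.0]
summary = prediction_errors(predicted_change, measured_change)
print(summary)
```

A mean error near zero alongside a larger MAE, as in the abstract, indicates that over- and under-predictions largely cancel out rather than that individual predictions are exact.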


The Bone & Joint Journal
Vol. 106-B, Issue 5 | Pages 468 - 474
1 May 2024
d'Amato M Flevas DA Salari P Bornes TD Brenneis M Boettner F Sculco PK Baldini A

Aims. Obtaining solid implant fixation is crucial in revision total knee arthroplasty (rTKA) to avoid aseptic loosening, a major reason for re-revision. This study aims to validate a novel grading system that quantifies implant fixation across three anatomical zones (epiphysis, metaphysis, diaphysis). Methods. Based on pre-, intra-, and postoperative assessments, the novel grading system allocates a quantitative score (0, 0.5, or 1 point) for the quality of fixation achieved in each anatomical zone. The criteria used by the algorithm to assign the score include the bone quality, the size of the bone defect, and the type of fixation used. A consecutive cohort of 245 patients undergoing rTKA from 2012 to 2018 were evaluated using the current novel scoring system and followed prospectively. In addition, 100 first-time revision cases were assessed radiologically from the original cohort and graded by three observers to evaluate the intra- and inter-rater reliability of the novel radiological grading system. Results. At a mean follow-up of 90 months (64 to 130), only two out of 245 cases failed due to aseptic loosening. Intraoperative grading yielded mean scores of 1.87 (95% confidence interval (CI) 1.82 to 1.92) for the femur and 1.96 (95% CI 1.92 to 2.0) for the tibia. Only 3.7% of femoral and 1.7% of tibial reconstructions fell below the 1.5-point threshold, which included the two cases of aseptic loosening. Interobserver reliability for postoperative radiological grading was 0.97 for the femur and 0.85 for the tibia. Conclusion. A minimum score of 1.5 points for each skeletal segment appears to be a reasonable cut-off to define sufficient fixation in rTKA. There were no revisions for aseptic loosening at mid-term follow-up when this fixation threshold was achieved or exceeded. When assessing first-time revisions, this novel grading system has shown excellent intra- and interobserver reliability. Cite this article: Bone Joint J 2024;106-B(5):468–474


Bone & Joint Open
Vol. 3, Issue 6 | Pages 502 - 509
20 Jun 2022
James HK Griffin J Pattison GTR

Aims. To identify a core outcome set of postoperative radiographic measurements to assess technical skill in ankle fracture open reduction internal fixation (ORIF), and to validate these against Van der Vleuten’s criteria for effective assessment. Methods. An e-Delphi exercise was undertaken at a major trauma centre (n = 39) to identify relevant parameters. Feasibility was tested by two authors. Reliability and validity were tested using postoperative radiographs of ankle fracture operations performed by trainees enrolled in an educational trial (IRCTN 20431944). To determine construct validity, trainees were divided into novice (performed < ten cases at baseline) and intermediate groups (performed ≥ ten cases at baseline). To assess concurrent validity, the procedure-based assessment (PBA) was considered the gold standard. The inter-rater and intrarater reliability was tested using a randomly selected subset of 25 cases. Results. Overall, 235 ankle ORIFs were performed by 24 postgraduate year three to five trainees during ten months at nine NHS hospitals in England, UK. Overall, 42 PBAs were completed. The e-Delphi panel identified five ‘final product analysis’ parameters and defined acceptability thresholds: medial clear space (MCS); medial malleolar displacement (MMD); lateral malleolar displacement (LMD); tibiofibular clear space (TFCS) (all in mm); and talocrural angle (TCA) in degrees. Face validity, content validity, and feasibility were excellent. PBA global rating scale scores in this population showed excellent construct validity as continuous (p < 0.001) and categorical (p = 0.001) variables. Concurrent validity of all metrics was poor against PBA score. Intrarater reliability was substantial for all parameters (intraclass correlation coefficient (ICC) > 0.8), and inter-rater reliability was substantial for LMD, MMD, TCA, and moderate (ICC 0.61 to 0.80) for MCS and TFCS. Assessment was time efficient compared to PBA. Conclusion. Assessment of technical skill in ankle fracture surgery using the first postoperative radiograph satisfies the tested Van der Vleuten’s utility criteria for effective assessment. 'Final product analysis' assessment may be useful to assess skill transfer in the simulation-based research setting. Cite this article: Bone Jt Open 2022;3(6):502–509


Bone & Joint Open
Vol. 5, Issue 11 | Pages 962 - 970
4 Nov 2024
Suter C Mattila H Ibounig T Sumrein BO Launonen A Järvinen TLN Lähdeoja T Rämö L

Aims. Though most humeral shaft fractures heal nonoperatively, up to one-third may lead to nonunion with inferior outcomes. The Radiographic Union Score for HUmeral Fractures (RUSHU) was created to identify high-risk patients for nonunion. Our study evaluated the RUSHU’s prognostic performance at six and 12 weeks in discriminating nonunion within a significantly larger cohort than before. Methods. Our study included 226 nonoperatively treated humeral shaft fractures. We evaluated the interobserver reliability and intraobserver reproducibility of RUSHU scoring using intraclass correlation coefficients (ICCs). Additionally, we determined the optimal cut-off thresholds for predicting nonunion using the receiver operating characteristic (ROC) method. Results. The RUSHU demonstrated good interobserver reliability with an ICC of 0.78 (95% CI 0.72 to 0.83) at six weeks and 0.77 (95% CI 0.71 to 0.82) at 12 weeks. Intraobserver reproducibility was good or excellent for all analyses. Area under the curve in the ROC analysis was 0.83 (95% CI 0.77 to 0.88) at six weeks and 0.89 (95% CI 0.84 to 0.93) at 12 weeks, indicating excellent discrimination. The optimal cut-off values for predicting nonunion were ≤ eight points at six weeks and ≤ nine points at 12 weeks, providing the best specificity-sensitivity trade-off. Conclusion. The RUSHU proves to be a reliable and reproducible radiological scoring system that aids in identifying patients at risk of nonunion at both six and 12 weeks post-injury during non-surgical treatment of humeral shaft fractures. The statistically optimal cut-off values for predicting nonunion are ≤ eight at six weeks and ≤ nine points at 12 weeks post-injury
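The cut-off values reported above can be written as a simple threshold rule. The following is a minimal illustrative sketch only, not the authors' code and not a clinical tool; the function name `flags_nonunion_risk` and the dictionary `RUSHU_CUTOFFS` are hypothetical names introduced here.

```python
# Cut-offs taken from the abstract: a RUSHU score of <= 8 at six weeks,
# or <= 9 at 12 weeks, was the statistically optimal threshold for
# predicting nonunion. Names below are hypothetical, for illustration only.
RUSHU_CUTOFFS = {6: 8, 12: 9}  # weeks post-injury -> maximum "at risk" score


def flags_nonunion_risk(rushu_score: int, weeks: int) -> bool:
    """Return True if the score falls at or below the reported cut-off."""
    if weeks not in RUSHU_CUTOFFS:
        raise ValueError("cut-offs were reported only for 6 and 12 weeks")
    return rushu_score <= RUSHU_CUTOFFS[weeks]


if __name__ == "__main__":
    print(flags_nonunion_risk(8, 6))    # at the six-week cut-off
    print(flags_nonunion_risk(10, 12))  # above the 12-week cut-off
```

Keeping the thresholds in one lookup table makes it clear that the rule is defined only for the two time points studied.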


The Bone & Joint Journal
Vol. 105-B, Issue 10 | Pages 1123 - 1130
1 Oct 2023
Donnan M Anderson N Hoq M Donnan L

Aims. The aim of this study was to investigate the agreement in interpretation of the quality of the paediatric hip ultrasound examination, the reliability of geometric and morphological assessment, and the relationship between these measurements. Methods. Four investigators evaluated 60 hip ultrasounds and assessed their quality based on the standard plane of Graf et al. They measured geometric parameters, described the morphology of the hip, and assigned the Graf grade of dysplasia. They analyzed one self-selected image and one randomly selected image from the ultrasound series, and repeated the process four weeks later. The intra- and interobserver agreement, and correlations between various parameters were analyzed. Results. In the assessment of quality, there was moderate to substantial intraobserver agreement for each element investigated, but interobserver agreement was poor. Morphological features showed weak to moderate agreement across all parameters but improved to significant when responses were reduced. The geometric measurements showed nearly perfect agreement, and the relationship between them and the morphological features showed a dose response across all parameters with moderate to substantial correlations. There were strong correlations between geometric measurements. The Graf classification showed a fair to moderate interobserver agreement, and moderate to substantial intraobserver agreement. Conclusion. This investigation into the reliability of the interpretation of hip ultrasound scans identified the difficulties in defining what is a high-quality ultrasound. We confirmed that geometric measurements are reliably interpreted and may be useful as a further measurement of quality. Morphological features are generally poorly interpreted, but a simpler binary classification considerably improves agreement.
As there is a clear dose response relationship between geometric and morphological measurements, the importance of morphology in the diagnosis of hip dysplasia should be questioned. Cite this article: Bone Joint J 2023;105-B(10):1123–1130


Bone & Joint Open
Vol. 4, Issue 9 | Pages 689 - 695
7 Sep 2023
Lim KBL Lee NKL Yeo BS Lim VMM Ng SWL Mishra N

Aims. To determine whether side-bending films in scoliosis are assessed for adequacy in clinical practice; and to introduce a novel method for doing so. Methods. Six surgeons and eight radiographers were invited to participate in four online surveys. The generic survey comprised erect and left and right bending radiographs of eight individuals with scoliosis, with an average age of 14.6 years. Respondents were asked to indicate whether each bending film was optimal (adequate) or suboptimal. In the first survey, they were also asked if they currently assessed the adequacy of bending films. A similar second survey was sent out two weeks later, using the same eight cases but in a different order. In the third survey, a guide for assessing bending film adequacy was attached along with the radiographs to introduce the novel T1-45B method, in which the upper endplate of T1 must tilt ≥ 45° from baseline for the study to be considered optimal. A fourth and final survey was subsequently conducted for confirmation. Results. Overall, 12 (86%) of 14 respondents did not use any criteria to assess the bending film adequacy; the remaining two each described a different invalidated method. In total, 12 (86%) of the respondents felt T1-45B was easy to learn and apply. There was fair to substantial intra-rater reliability (k = 0.25 to 0.88) which improved to fair to almost perfect (k = 0.38 to 0.88) post-introduction of the guide. Inter-rater reliability varied considerably among the rater groups but similarly increased following introduction of the guide (k_S1 = 0.19 to 0.34, k_S2 = 0.33 to 0.43 vs k_S3 = 0.49 to 0.5, k_S4 = 0.35 to 0.43). Conclusion. Many surgeons and radiographers do not assess spinal bending films for adequacy. We propose that the change in the plane of the upper endplate of T1 on side-bending can be used in this evaluation. In the T1-45B method, a change of ≥ 45° on side bending qualifies as an adequate bend effort.
Cite this article: Bone Jt Open 2023;4(9):689–695
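The T1-45B criterion described in this abstract reduces to a single comparison. The sketch below is illustrative only and assumes the T1 upper endplate tilt is supplied in degrees, measured with the same sign convention on the erect (baseline) and bending films; the function name `is_adequate_bend` and its parameters are hypothetical and do not come from the article.

```python
# Threshold stated in the abstract: the upper endplate of T1 must tilt
# at least 45 degrees from baseline for the bending film to be optimal.
T1_45B_THRESHOLD_DEG = 45.0


def is_adequate_bend(baseline_tilt_deg: float, bending_tilt_deg: float) -> bool:
    """Return True if the change in T1 endplate tilt is >= 45 degrees.

    Assumption: both angles use the same sign convention, so the
    absolute difference is the change in tilt on side bending.
    """
    change = abs(bending_tilt_deg - baseline_tilt_deg)
    return change >= T1_45B_THRESHOLD_DEG


if __name__ == "__main__":
    print(is_adequate_bend(5.0, 52.0))   # change of 47 degrees
    print(is_adequate_bend(5.0, 40.0))   # change of 35 degrees
```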


The Bone & Joint Journal
Vol. 106-B, Issue 9 | Pages 964 - 969
1 Sep 2024
Wang YC Song JJ Li TT Yang D Lv ZB Wang ZY Zhang ZM Luo Y

Aims. To propose a new method for evaluating paediatric radial neck fractures and improve the accuracy of fracture angulation measurement, particularly in younger children, and thereby facilitate planning treatment in this population. Methods. Clinical data of 117 children with radial neck fractures in our hospital from August 2014 to March 2023 were collected. A total of 50 children (26 males, 24 females, mean age 7.6 years (2 to 13)) met the inclusion criteria and were analyzed. Cases were excluded for the following reasons: Judet grade I and Judet grade IVb (> 85° angulation) classification; poor radiograph image quality; incomplete clinical information; sagittal plane angulation; severe displacement of the ulna fracture; and Monteggia fractures. For each patient, standard elbow anteroposterior (AP) view radiographs and corresponding CT images were acquired. On radiographs, Angle P (complementary to the angle between the long axis of the radial head and the line perpendicular to the physis), Angle S (complementary to the angle between the long axis of the radial head and the midline through the proximal radial shaft), and Angle U (between the long axis of the radial head and the straight line from the distal tip of the capitellum to the coronoid process) were identified as candidates approximating the true coronal plane angulation of radial neck fractures. On the coronal plane of the CT scan, the angulation of radial neck fractures (CTa) was measured and served as the reference standard for measurement. Inter- and intraobserver reliabilities were assessed by Kappa statistics and intraclass correlation coefficient (ICC). Results. Angle U showed the strongest correlation with CTa (p < 0.001). In the analysis of inter- and intraobserver reliability, Kappa values were significantly higher for Angles S and U compared with Angle P. ICC values were excellent among the three groups. Conclusion. 
Angle U on AP view was the best substitute for CTa when evaluating radial neck fractures in children. Further studies are required to validate this method. Cite this article: Bone Joint J 2024;106-B(9):964–969


The Bone & Joint Journal
Vol. 106-B, Issue 3 | Pages 293 - 302
1 Mar 2024
Vogt B Lueckingsmeier M Gosheger G Laufer A Toporowski G Antfang C Roedl R Frommer A

Aims. As an alternative to external fixators, intramedullary lengthening nails (ILNs) can be employed for distraction osteogenesis. While previous studies have demonstrated that typical complications of external devices, such as soft-tissue tethering, and pin site infection can be avoided with ILNs, there is a lack of studies that exclusively investigated tibial distraction osteogenesis with motorized ILNs inserted via an antegrade approach. Methods. A total of 58 patients (median age 17 years (interquartile range (IQR) 15 to 21)) treated by unilateral tibial distraction osteogenesis for a median leg length discrepancy of 41 mm (IQR 34 to 53), and nine patients with disproportionate short stature treated by bilateral simultaneous tibial distraction osteogenesis, with magnetically controlled motorized ILNs inserted via an antegrade approach, were retrospectively analyzed. The median follow-up was 37 months (IQR 30 to 51). Outcome measurements were accuracy, precision, reliability, bone healing, complications, and patient-reported outcome assessed by the Limb Deformity-Scoliosis Research Society Score (LD-SRS-30). Results. A median tibial distraction of 44 mm (IQR 31 to 49) was achieved with a mean distraction index of 0.5 mm/day (standard deviation 0.13) and median consolidation index of 41.2 days/cm (IQR 34 to 51). Accuracy, precision, and reliability were 91%, 92%, and 97%, respectively. New temporary range of motion limitations occurred in 51% of segments (34/67). Distraction-related equinus deformity treated by Achilles tendon lengthening was the most common major complication recorded in 16% of segments (11/67). In 95% of patients (55/58) the distraction goal was achieved with 42% unplanned additional interventions per segment (28/67). The median postoperative LD-SRS-30 score was 4.0 (IQR 3.6 to 4.3). Conclusion. Tibial distraction osteogenesis using motorized ILNs inserted via an antegrade approach appears to be a reliable and precise procedure. 
Temporary joint stiffness of the knee or ankle should be expected in up to every second patient. A high rate and wide range of complications of variable severity should be anticipated. Cite this article: Bone Joint J 2024;106-B(3):293–302


Bone & Joint Research
Vol. 13, Issue 8 | Pages 392 - 400
5 Aug 2024
Barakat A Evans J Gibbons C Singh HP

Aims. The Oxford Shoulder Score (OSS) is a 12-item measure commonly used for the assessment of shoulder surgeries. This study explores whether computerized adaptive testing (CAT) provides a shortened, individually tailored questionnaire while maintaining test accuracy. Methods. A total of 16,238 preoperative OSS were available in the National Joint Registry (NJR) for England, Wales, Northern Ireland, the Isle of Man, and the States of Guernsey dataset (April 2012 to April 2022). Prior to CAT, the foundational item response theory (IRT) assumptions of unidimensionality, monotonicity, and local independence were established. CAT compared sequential item selection with stopping criteria set at standard error (SE) < 0.32 and SE < 0.45 (equivalent to reliability coefficients of 0.90 and 0.80) to full-length patient-reported outcome measure (PROM) precision. Results. Confirmatory factor analysis (CFA) for unidimensionality exhibited satisfactory fit with root mean square standardized residual (RSMSR) of 0.06 (cut-off ≤ 0.08) but not with comparative fit index (CFI) of 0.85 or Tucker-Lewis index (TLI) of 0.82 (cut-off > 0.90). Monotonicity, measured by H value, yielded 0.482, signifying good monotonic trends. Local independence was generally met, with Yen’s Q3 statistic > 0.2 for most items. The median item count for completing the CAT simulation with a SE of 0.32 was 3 (IQR 3 to 12), while for a SE of 0.45 it was 2 (IQR 2 to 6). This constituted only 25% and 16%, respectively, when compared to the 12-item full-length questionnaire. Conclusion. Calibrating IRT for the OSS has resulted in the development of an efficient and shortened CAT while maintaining accuracy and reliability. Through the reduction of redundant items and implementation of a standardized measurement scale, our study highlights a promising approach to alleviate time burden and potentially enhance compliance with these widely used outcome measures. Cite this article: Bone Joint Res 2024;13(8):392–400
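The stated equivalence between the standard error stopping criteria and the reliability coefficients, and the item-count percentages, can be checked with a short calculation. This sketch assumes the common item response theory convention that the latent trait is scaled to unit variance, so that reliability is approximately 1 − SE²; the abstract does not state which formula the authors used, and the function names are hypothetical.

```python
def reliability_from_se(se: float) -> float:
    """Approximate reliability for a given standard error of measurement,
    assuming the latent trait has variance 1 (an assumption, not stated
    in the abstract)."""
    return 1.0 - se ** 2


def fraction_of_full_length(items_used: int, full_length: int = 12) -> float:
    """Share of the 12-item OSS administered by the adaptive test."""
    return items_used / full_length


if __name__ == "__main__":
    for se in (0.32, 0.45):
        print(f"SE {se}: reliability ~ {reliability_from_se(se):.2f}")
    for items in (3, 2):
        print(f"{items} items: {fraction_of_full_length(items):.1%} of 12")
```

Under this assumption, SE 0.32 gives about 0.90 and SE 0.45 gives about 0.80, matching the coefficients quoted in the abstract; three of 12 items is 25%, and two of 12 is about 16.7%, which the abstract reports as 16%.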


Bone & Joint Open
Vol. 3, Issue 11 | Pages 877 - 884
14 Nov 2022
Archer H Reine S Alshaikhsalama A Wells J Kohli A Vazquez L Hummer A DiFranco MD Ljuhar R Xi Y Chhabra A

Aims. Hip dysplasia (HD) leads to premature osteoarthritis. Timely detection and correction of HD has been shown to improve pain, functional status, and hip longevity. Several time-consuming radiological measurements are currently used to confirm HD. An artificial intelligence (AI) software named HIPPO automatically locates anatomical landmarks on anteroposterior pelvis radiographs and performs the needed measurements. The primary aim of this study was to assess the reliability of this tool as compared to multi-reader evaluation in clinically proven cases of adult HD. The secondary aims were to assess the time savings achieved and evaluate inter-reader assessment. Methods. A consecutive preoperative sample of 130 HD patients (256 hips) was used. This cohort included 82.3% females (n = 107) and 17.7% males (n = 23) with median patient age of 28.6 years (interquartile range (IQR) 22.5 to 37.2). Three trained readers’ measurements were compared to AI outputs of lateral centre-edge angle (LCEA), caput-collum-diaphyseal (CCD) angle, pelvic obliquity, Tönnis angle, Sharp’s angle, and femoral head coverage. Intraclass correlation coefficients (ICC) and Bland-Altman analyses were obtained. Results. Among 256 hips with AI outputs, all six hip AI measurements were successfully obtained. The AI-reader correlations were generally good (ICC 0.60 to 0.74) to excellent (ICC > 0.75). There was lower agreement for CCD angle measurement. Most widely used measurements for HD diagnosis (LCEA and Tönnis angle) demonstrated good to excellent inter-method reliability (ICC 0.71 to 0.86 and 0.82 to 0.90, respectively). The median reading time for the three readers and AI was 212 (IQR 197 to 230), 131 (IQR 126 to 147), 734 (IQR 690 to 786), and 41 (IQR 38 to 44) seconds, respectively. Conclusion. This study showed that AI-based software demonstrated reliable radiological assessment of patients with HD with significant interpretation-related time savings. 
Cite this article: Bone Jt Open 2022;3(11):877–884