The Bone & Joint Journal
Vol. 104-B, Issue 9 | Pages 1011 - 1016
1 Sep 2022
Acem I van de Sande MAJ

Prediction tools are instruments which are commonly used to estimate the prognosis in oncology and facilitate clinical decision-making in a more personalized manner. Their popularity is shown by the increasing numbers of prediction tools, which have been described in the medical literature. Many of these tools have been shown to be useful in the field of soft-tissue sarcoma of the extremities (eSTS). In this annotation, we aim to provide an overview of the available prediction tools for eSTS, provide an approach for clinicians to evaluate the performance and usefulness of the available tools for their own patients, and discuss their possible applications in the management of patients with an eSTS.

Cite this article: Bone Joint J 2022;104-B(9):1011–1016.


Orthopaedic Proceedings
Vol. 106-B, Issue SUPP_18 | Pages 55 - 55
14 Nov 2024
Vinco G Ley C Dixon P Grimm B
Full Access

Introduction. The ability to walk over various surfaces such as cobblestones, slopes or stairs is a very patient centric and clinically meaningful mobility outcome. Current wearable sensors only measure step counts or walking speed regardless of such context relevant for assessing gait function. This study aims to improve deep learning (DL) models to classify surfaces of walking by altering and comparing model features and sensor configurations. Method. Using a public dataset, signals from 6 IMUs (Movella DOT) worn on various body locations (trunk, wrist, right/left thigh, right/left shank) of 30 subjects walking on 9 surfaces were analyzed (flat ground, ramps (up/down), stairs (up/down), cobblestones (irregular), grass (soft), banked (left/right)). Two variations of a CNN Bi-directional LSTM model, with different Batch Normalization layer placement (beginning vs end) as well as data reduction to individual sensors (versus combined) were explored and model performance compared in-between and with previous models using F1 scores. Result. The Bi-LSTM architecture improved performance over previous models, especially for subject-wise data splitting and when combining the 6 sensor locations (e.g. F1=0.94 versus 0.77). Placement of the Batch Normalization layer at the beginning, prior to the convolutional layer, enhanced model understanding of participant gait variations across surfaces. Single sensor performance was best on the right shank (F1=0.88). Conclusion. Walking surface detection using wearable IMUs and DL models shows promise for clinically relevant real-world applications, achieving high F1 levels (>0.9) even for subject-wise data splitting enhancing the model applicability in real-world scenarios. Normalization techniques, such as Batch Normalization, seem crucial for optimizing model performance across diverse participant data. 
Single-sensor set-ups can also give acceptable performance, in particular for specific surface types of potentially high clinical relevance (e.g. stairs, ramps), offering practical and cost-effective solutions with high usability. Future research will focus on collecting ground-truth labeled data to investigate system performance in real-world settings.
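The distinction between subject-wise and record-wise data splitting matters because windows recorded from the same person are strongly correlated, so a record-wise split can leak a participant's gait into the test set. The following is a minimal sketch of a subject-wise split; the function name, the tuple layout, and the toy data are assumptions made for illustration and are not taken from the study.

```python
import random

def subject_wise_split(windows, test_fraction=0.2, seed=0):
    """Split IMU windows so that no subject appears in both train and test.

    `windows` is a list of (subject_id, features, surface_label) tuples;
    this layout is a hypothetical one chosen for the example.
    """
    subjects = sorted({w[0] for w in windows})
    rng = random.Random(seed)
    rng.shuffle(subjects)
    n_test = max(1, round(len(subjects) * test_fraction))
    test_subjects = set(subjects[:n_test])
    train = [w for w in windows if w[0] not in test_subjects]
    test = [w for w in windows if w[0] in test_subjects]
    return train, test

# Hypothetical toy data: 30 subjects, 9 surfaces, one window per pair.
windows = [(s, [0.0], surface) for s in range(30) for surface in range(9)]
train, test = subject_wise_split(windows)
```

Holding out whole subjects gives a more conservative estimate of how a model behaves on people it has never seen, which is the setting the abstract reports F1 scores for.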


Orthopaedic Proceedings
Vol. 104-B, Issue SUPP_13 | Pages 60 - 60
1 Dec 2022
Martin RK Wastvedt S Pareek A Persson A Visnes H Fenstad AM Moatshe G Wolfson J Lind M Engebretsen L
Full Access

External validation of machine learning predictive models is achieved through evaluation of model performance on different groups of patients than were used for algorithm development. This important step is uncommonly performed, inhibiting clinical translation of newly developed models. Recently, machine learning was used to develop a tool that can quantify revision risk for a patient undergoing primary anterior cruciate ligament (ACL) reconstruction (https://swastvedt.shinyapps.io/calculator_rev/). The source of data included nearly 25,000 patients with primary ACL reconstruction recorded in the Norwegian Knee Ligament Register (NKLR). The result was a well-calibrated tool capable of predicting revision risk one, two, and five years after primary ACL reconstruction with moderate accuracy. The purpose of this study was to determine the external validity of the NKLR model by assessing algorithm performance when applied to patients from the Danish Knee Ligament Registry (DKLR). The primary outcome measure of the NKLR model was probability of revision ACL reconstruction within 1, 2, and/or 5 years. For the index study, 24 total predictor variables in the NKLR were included and the models eliminated variables which did not significantly improve prediction ability - without sacrificing accuracy. The result was a well calibrated algorithm developed using the Cox Lasso model that only required five variables (out of the original 24) for outcome prediction. For this external validation study, all DKLR patients with complete data for the five variables required for NKLR prediction were included. The five variables were: graft choice, femur fixation device, Knee Injury and Osteoarthritis Outcome Score (KOOS) Quality of Life subscale score at surgery, years from injury to surgery, and age at surgery. Predicted revision probabilities were calculated for all DKLR patients. The model performance was assessed using the same metrics as the NKLR study: concordance and calibration. 
In total, 10,922 DKLR patients were included for analysis. Average follow-up time or time-to-revision was 8.4 (±4.3) years and overall revision rate was 6.9%. Surgical technique trends (i.e., graft choice and fixation devices) and injury characteristics (i.e., concomitant meniscus and cartilage pathology) were dissimilar between registries. The model produced similar concordance when applied to the DKLR population compared to the original NKLR test data (DKLR: 0.68; NKLR: 0.68-0.69). Calibration was poorer for the DKLR population at one and five years post primary surgery but similar to the NKLR at two years. The NKLR machine learning algorithm demonstrated similar performance when applied to patients from the DKLR, suggesting that it is valid for application outside of the initial patient population. This represents the first machine learning model for predicting revision ACL reconstruction that has been externally validated. Clinicians can use this in-clinic calculator to estimate revision risk at a patient specific level when discussing outcome expectations pre-operatively. While encouraging, it should be noted that the performance of the model on patients undergoing ACL reconstruction outside of Scandinavia remains unknown
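Concordance, the discrimination metric reported above, can be read as the share of comparable patient pairs in which the patient revised earlier was also given the higher predicted risk. The sketch below implements Harrell's C-index in plain Python; it is an illustration under assumed inputs and toy data, not the registries' own analysis code.

```python
def concordance_index(times, events, risks):
    """Harrell's C: fraction of comparable pairs ranked correctly.

    times  - follow-up time for each patient
    events - 1 if revision was observed, 0 if censored
    risks  - predicted risk (higher means earlier revision expected)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an observed event
            # strictly before patient j's follow-up time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # ties in predicted risk count half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Hypothetical toy data, not registry values.
times = [2.0, 5.0, 3.0, 8.0]
events = [1, 0, 1, 0]
risks = [0.30, 0.10, 0.05, 0.20]
c = concordance_index(times, events, risks)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which places the reported 0.68 in the moderate range.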


The Bone & Joint Journal
Vol. 106-B, Issue 7 | Pages 688 - 695
1 Jul 2024
Farrow L Zhong M Anderson L

Aims. To examine whether natural language processing (NLP) using a clinically based large language model (LLM) could be used to predict patient selection for total hip or total knee arthroplasty (THA/TKA) from routinely available free-text radiology reports. Methods. Data pre-processing and analyses were conducted according to the Artificial intelligence to Revolutionize the patient Care pathway in Hip and knEe aRthroplastY (ARCHERY) project protocol. This included use of de-identified Scottish regional clinical data of patients referred for consideration of THA/TKA, held in a secure data environment designed for artificial intelligence (AI) inference. Only preoperative radiology reports were included. NLP algorithms were based on the freely available GatorTron model, an LLM trained on over 82 billion words of de-identified clinical text. Two inference tasks were performed: assessment after model fine-tuning (50 epochs and three cycles of k-fold cross validation), and external validation. Results. For THA, there were 5,558 patient radiology reports included, of which 4,137 were used for model training and testing, and 1,421 for external validation. Following training, model performance demonstrated average (mean across three folds) accuracy, F1 score, and area under the receiver operating characteristic curve (AUROC) values of 0.850 (95% confidence interval (CI) 0.833 to 0.867), 0.813 (95% CI 0.785 to 0.841), and 0.847 (95% CI 0.822 to 0.872), respectively. For TKA, 7,457 patient radiology reports were included, with 3,478 used for model training and testing, and 3,152 for external validation. Performance metrics included accuracy, F1 score, and AUROC values of 0.757 (95% CI 0.702 to 0.811), 0.543 (95% CI 0.479 to 0.607), and 0.717 (95% CI 0.657 to 0.778) respectively. There was a notable deterioration in performance on external validation in both cohorts. Conclusion. 
The use of routinely available preoperative radiology reports provides promising potential to help screen suitable candidates for THA, but not for TKA. The external validation results demonstrate the importance of further model testing and training when confronted with new clinical cohorts. Cite this article: Bone Joint J 2024;106-B(7):688–695


The Bone & Joint Journal
Vol. 106-B, Issue 2 | Pages 203 - 211
1 Feb 2024
Park JH Won J Kim H Kim Y Kim S Han I

Aims. This study aimed to compare the performance of survival prediction models for bone metastases of the extremities (BM-E) with pathological fractures in an Asian cohort, and investigate patient characteristics associated with survival. Methods. This retrospective cohort study included 469 patients, who underwent surgery for BM-E between January 2009 and March 2022 at a tertiary hospital in South Korea. Postoperative survival was calculated using the PATHFx3.0, SPRING13, OPTIModel, SORG, and IOR models. Model performance was assessed with area under the curve (AUC), calibration curve, Brier score, and decision curve analysis. Cox regression analyses were performed to evaluate the factors contributing to survival. Results. The SORG model demonstrated the highest discriminatory accuracy with AUC (0.80 (95% confidence interval (CI) 0.76 to 0.85)) at 12 months. In calibration analysis, the PATHFx3.0 and OPTIModel models underestimated survival, while the SPRING13 and IOR models overestimated survival. The SORG model exhibited excellent calibration with intercepts of 0.10 (95% CI -0.13 to 0.33) at 12 months. The SORG model also had lower Brier scores than the null score at three and 12 months, indicating good overall performance. Decision curve analysis showed that all five survival prediction models provided greater net benefit than the default strategy of operating on either all or no patients. Rapid growth cancer and low serum albumin levels were associated with three-, six-, and 12-month survival. Conclusion. State-of-the-art survival prediction models for BM-E (PATHFx3.0, SPRING13, OPTIModel, SORG, and IOR models) are useful clinical tools for orthopaedic surgeons in the decision-making process for treating Asian patients, with SORG models offering the best predictive performance. Rapid growth cancer and serum albumin level are independent, statistically significant factors contributing to survival following surgery of BM-E. 
Further refinement of survival prediction models will bring about informed and patient-specific treatment of BM-E. Cite this article: Bone Joint J 2024;106-B(2):203–211
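Two of the metrics named above, the Brier score (compared against a null score) and the net benefit used in decision curve analysis, have short closed-form definitions. The sketch below shows those standard formulas on hypothetical toy data; the function names and values are illustrative assumptions, not the study's code or results.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

def null_brier_score(y_true):
    """Brier score of a model that predicts the observed event rate for everyone."""
    rate = sum(y_true) / len(y_true)
    return rate * (1 - rate)

def net_benefit(y_true, y_prob, threshold):
    """Net benefit of acting on patients whose predicted risk is >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)

def net_benefit_treat_all(y_true, threshold):
    """Net benefit of the default strategy of acting on every patient."""
    rate = sum(y_true) / len(y_true)
    return rate - (1 - rate) * threshold / (1 - threshold)

# Hypothetical outcomes (1 = event) and predicted probabilities.
y = [1, 1, 0, 0, 1, 0, 0, 0]
p = [0.9, 0.7, 0.2, 0.1, 0.6, 0.4, 0.3, 0.1]

model_brier = brier_score(y, p)
null_brier = null_brier_score(y)
nb_model = net_benefit(y, p, threshold=0.5)
nb_all = net_benefit_treat_all(y, threshold=0.5)
nb_none = 0.0  # acting on no one has zero net benefit by definition
```

A model is informative in this framing when its Brier score falls below the null score and its net benefit exceeds both default strategies at clinically plausible thresholds.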


Orthopaedic Proceedings
Vol. 106-B, Issue SUPP_19 | Pages 31 - 31
22 Nov 2024
Yoon S Jutte P Soriano A Sousa R Zijlstra W Wouthuyzen-Bakker M
Full Access

Aim. This study aimed to externally validate promising preoperative PJI prediction models in a recent, multinational European cohort. Method. Three preoperative PJI prediction models (by Tan et al., Del Toro et al., and Bülow et al.) which previously demonstrated high levels of accuracy were selected for validation. A multicenter retrospective observational analysis was performed of patients undergoing total hip arthroplasty (THA) and total knee arthroplasty (TKA) between January 2020 and December 2021 and treated at centers in the Netherlands, Portugal, and Spain. Patient characteristics were compared between our cohort and those used to develop the prediction models. Model performance was assessed through discrimination and calibration. Results. A total of 2684 patients were included of whom 60 developed a PJI (2.2%). Our patient cohort differed from the models’ original cohorts in terms of demographic variables, procedural variables, and the prevalence of comorbidities. The c-statistics for the Tan, Del Toro, and Bülow models were 0.72, 0.69, and 0.72 respectively. Calibration was reasonable, but precise percentage estimates for PJI risk were most accurate for predicted risks up to 3-4%; the Tan model overestimated risks above 4%, while the Del Toro model underestimated risks above 3%. Conclusions. In this multinational cohort study, the Tan, Del Toro, and Bülow PJI prediction models were found to be externally valid for classifying high risk patients for developing a PJI. These models hold promise for clinical application to enhance preoperative patient counseling and targeted prevention strategies. Keywords. Periprosthetic Joint Infection (PJI), High Risk Groups, Prediction Models, Validation, Infection Prevention
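Calibration, as assessed above, asks whether predicted risks match observed event rates within bands of predicted risk. The following sketch tabulates mean predicted risk against observed rate per bin; the bin edges and data are hypothetical and chosen only to illustrate how over- or underestimation above a given risk level would show up.

```python
def calibration_table(y_true, y_prob, edges):
    """Compare mean predicted risk with observed event rate per risk bin.

    `edges` are ascending bin boundaries; bin k covers [edges[k], edges[k+1]).
    Returns rows of (low, high, n, mean_predicted, observed_rate).
    """
    rows = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        members = [(y, p) for y, p in zip(y_true, y_prob) if lo <= p < hi]
        if not members:
            continue  # skip empty bins
        mean_pred = sum(p for _, p in members) / len(members)
        observed = sum(y for y, _ in members) / len(members)
        rows.append((lo, hi, len(members), mean_pred, observed))
    return rows

# Hypothetical predicted infection risks and outcomes (1 = infection).
y_prob = [0.01, 0.01, 0.02, 0.02, 0.05, 0.05, 0.06, 0.06]
y_true = [0, 0, 0, 0, 0, 0, 0, 1]
table = calibration_table(y_true, y_prob, edges=[0.0, 0.03, 0.10])
```

In a bin where the mean predicted risk is higher than the observed rate the model overestimates risk, and where it is lower the model underestimates it.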


Orthopaedic Proceedings
Vol. 106-B, Issue SUPP_18 | Pages 57 - 57
14 Nov 2024
Birkholtz F Eken M Boyes A Engelbrecht A
Full Access

Introduction. With advances in artificial intelligence, the use of computer-aided detection and diagnosis in clinical imaging is gaining traction. Typically, very large datasets are required to train machine-learning models, potentially limiting use of this technology when only small datasets are available. This study investigated whether pretraining of fracture detection models on large, existing datasets could improve the performance of the model when locating and classifying wrist fractures in a small X-ray image dataset. This concept is termed “transfer learning”. Method. Firstly, three detection models, namely, the faster region-based convolutional neural network (faster R-CNN), you only look once version eight (YOLOv8), and RetinaNet, were pretrained using the large, freely available dataset, common objects in context (COCO) (330000 images). Secondly, these models were pretrained using an open-source wrist X-ray dataset called “Graz Paediatric Wrist Digital X-rays” (GRAZPEDWRI-DX) on a (1) fracture detection dataset (20327 images) and (2) fracture location and classification dataset (14390 images). An orthopaedic surgeon classified the small available dataset of 776 distal radius X-rays (Arbeitsgemeinschaft für Osteosynthesefragen Foundation / Orthopaedic Trauma Association; AO/OTA), on which the models were tested. Result. Detection models without pre-training on the large datasets were the least precise when tested on the small distal radius dataset. The model with the best accuracy to detect and classify wrist fractures was the YOLOv8 model pretrained on the GRAZPEDWRI-DX fracture detection dataset (mean average precision at an intersection over union threshold of 50% = 59.7%). This model showed up to 33.6% improved detection precision compared to the same models with no pre-training. Conclusion. Optimisation of machine-learning models can be challenging when only relatively small datasets are available. 
The findings of this study support the potential of transfer learning from large datasets to improve model performance in smaller datasets. This is encouraging for wider application of machine-learning technology in medical imaging evaluation, including less common orthopaedic pathologies
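The detection result above is reported as mean average precision at an intersection over union (IoU) threshold of 50%, meaning a predicted box counts as a hit only if it overlaps the annotated box by at least half of their combined area. A minimal sketch of that overlap test follows; the box format and coordinates are assumptions for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    # Width and height of the overlap are clamped at zero for disjoint boxes.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def is_true_positive(pred_box, truth_box, threshold=0.5):
    """A prediction is a hit when its IoU with the annotation meets the threshold."""
    return iou(pred_box, truth_box) >= threshold

# Hypothetical annotated fracture box and two candidate predictions.
truth = (10, 10, 50, 50)
loose_prediction = (20, 20, 60, 60)
tight_prediction = (12, 12, 52, 52)
loose_iou = iou(loose_prediction, truth)
tight_iou = iou(tight_prediction, truth)
```

Average precision is then computed from the ranked list of hits and misses per class, and the mean over classes gives the reported figure.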


Orthopaedic Proceedings
Vol. 106-B, Issue SUPP_2 | Pages 19 - 19
2 Jan 2024
Castagno S Birch M van der Schaar M McCaskie A
Full Access

Precision health aims to develop personalised and proactive strategies for predicting, preventing, and treating complex diseases such as osteoarthritis (OA). Due to OA heterogeneity, which makes developing effective treatments challenging, identifying patients at risk for accelerated disease progression is essential for efficient clinical trial design and new treatment target discovery and development. This study aimed to create a reliable and interpretable precision health tool that predicts rapid knee OA progression over a 2-year period from baseline patient characteristics using an advanced automated machine learning (autoML) framework, “Autoprognosis 2.0”. All available 2-year follow-up periods of 600 patients from the FNIH OA Biomarker Consortium were analysed using “Autoprognosis 2.0” in two separate approaches, with distinct definitions of clinical outcomes: multi-class predictions (categorising disease progression into pain and/or radiographic progression) and binary predictions. Models were developed using a training set of 1352 instances and all available variables (including clinical, X-ray, MRI, and biochemical features), and validated through both stratified 10-fold cross-validation and hold-out validation on a testing set of 339 instances. Model performance was assessed using multiple evaluation metrics. Interpretability analyses were carried out to identify important predictors of progression. Our final models yielded higher accuracy scores for multi-class predictions (AUC-ROC: 0.858, 95% CI: 0.856-0.860) compared to binary predictions (AUC-ROC: 0.717, 95% CI: 0.712-0.722). Important predictors of rapid disease progression included WOMAC scores and MRI features. Additionally, accurate ML models were developed for predicting OA progression in a subgroup of patients aged 65 or younger. This study presents a reliable and interpretable precision health tool for predicting rapid knee OA progression. 
Our models provide accurate predictions and, importantly, allow specific predictors of rapid disease progression to be identified. Furthermore, the transparency and explainability of our methods may facilitate their acceptance by clinicians and patients, enabling effective translation to clinical practice


Orthopaedic Proceedings
Vol. 106-B, Issue SUPP_18 | Pages 105 - 105
14 Nov 2024
Spoo S Garcia F Braun B Cabri J Grimm B
Full Access

Introduction. The objective assessment of shoulder function is important for personalized diagnosis, therapies and evidence-based practice but has been limited by specialized equipment and dedicated movement laboratories. Advances in AI-driven computer vision (CV) using consumer RGB cameras (red-green-blue) and open-source CV models offer the potential for routine clinical use. However, key concepts, evidence, and research gaps have not yet been synthesized to drive clinical translation. This scoping review aims to map related literature. Method. Following the JBI Manual for Evidence Synthesis, a scoping review was conducted on PubMed and Scholar using search terms including “shoulder,” “pose estimation,” “camera”, and others. From 146 initial results, 27 papers focusing on clinical applicability and using consumer cameras were included. Analysis employed a Grounded Theory approach with iterative refinement. Result. Studies primarily used Microsoft Kinect (infrared-based depth sensing, RGB camera; discontinued) or monocular consumer cameras with open-source CV-models, sometimes supplemented by LiDAR (laser-based depth sensing), wearables or markers. Technical validation studies against gold standards were scarce and too inconsistent for comparison. Larger range of motion (RoM) movements were accurately recorded, but smaller movements, rotations and scapula tracking remained challenging. For instance, one larger validation study comparing shoulder angles during arm raises to a marker-based gold-standard reported Pearson's R = 0.98 and a standard error of 2.4°. OpenPose and Mediapipe were the most used CV-models. Recent efforts try to improve model performance by training with shoulder specific movements. Conclusion. Low-cost, routine clinical movement analysis to assess shoulder function using consumer cameras and CV seems feasible. It can provide acceptable accuracy for certain movement tasks and larger RoM. 
Capturing small, hidden or the entirety of shoulder movement requires improvements such as via training models with shoulder specific data or using dual cameras. Technical validation studies require methodological standardization, and clinical validation against established constructs is needed for translation into practice


To examine whether Natural Language Processing (NLP) using a state-of-the-art clinically based Large Language Model (LLM) could predict patient selection for Total Hip Arthroplasty (THA), across a range of routinely available clinical text sources. Data pre-processing and analyses were conducted according to the Ai to Revolutionise the patient Care pathway in Hip and Knee arthroplasty (ARCHERY) project protocol (https://www.researchprotocols.org/2022/5/e37092/). Three types of deidentified Scottish regional clinical free text data were assessed: referral letters, radiology reports and clinic letters. NLP algorithms were based on the GatorTron model, a Bidirectional Encoder Representations from Transformers (BERT) based LLM trained on 82 billion words of de-identified clinical text. Three specific inference tasks were performed: assessment of the base GatorTron model, assessment after model fine-tuning, and external validation. There were 3911, 1621 and 1503 patient text documents included from the sources of referral letters, radiology reports and clinic letters respectively. All letter sources displayed significant class imbalance, with only 15.8%, 24.9%, and 5.9% of patients linked to the respective text source documentation having undergone surgery. Untrained model performance was poor, with F1 scores (harmonic mean of precision and recall) of 0.02, 0.38 and 0.09 respectively. This did however improve with model training, with mean scores (range) of 0.39 (0.31–0.47), 0.57 (0.48–0.63) and 0.32 (0.28–0.39) across the 5 folds of cross-validation. Performance deteriorated on external validation across all three groups but remained highest for the radiology report cohort. Even with further training on a large cohort of routinely collected free-text data, a clinical LLM fails to adequately perform clinical inference in NLP tasks regarding identification of those selected to undergo THA. 
This likely relates to the complexity and heterogeneity of free-text information and the way that patients are determined to be surgical candidates
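The F1 score used above is the harmonic mean of precision and recall, which makes it far less forgiving than accuracy when few patients belong to the positive class. The sketch below contrasts the two on a hypothetical cohort of 100 patients with 6 surgical cases; the data are invented for illustration and only loosely echo the class imbalance described.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

def accuracy(y_true, y_pred):
    """Share of predictions that match the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

# Hypothetical cohort: 6 patients had surgery, 94 did not.
y_true = [1] * 6 + [0] * 94

# Classifier A never predicts surgery.
pred_a = [0] * 100
# Classifier B finds 3 of the 6 surgical patients and raises 3 false alarms.
pred_b = [1, 1, 1, 0, 0, 0] + [1, 1, 1] + [0] * 91

acc_a, f1_a = accuracy(y_true, pred_a), precision_recall_f1(y_true, pred_a)[2]
acc_b, f1_b = accuracy(y_true, pred_b), precision_recall_f1(y_true, pred_b)[2]
```

Both hypothetical classifiers reach the same accuracy, yet only the second has a non-zero F1 score, which is why F1 is the more informative summary for imbalanced cohorts like these.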


Orthopaedic Proceedings
Vol. 105-B, Issue SUPP_16 | Pages 23 - 23
17 Nov 2023
Castagno S Birch M van der Schaar M McCaskie A
Full Access

Abstract. Introduction. Precision health aims to develop personalised and proactive strategies for predicting, preventing, and treating complex diseases such as osteoarthritis (OA), a degenerative joint disease affecting over 300 million people worldwide. Due to OA heterogeneity, which makes developing effective treatments challenging, identifying patients at risk for accelerated disease progression is essential for efficient clinical trial design and new treatment target discovery and development. Objectives. This study aims to create a trustworthy and interpretable precision health tool that predicts rapid knee OA progression based on baseline patient characteristics using an advanced automated machine learning (autoML) framework, “Autoprognosis 2.0”. Methods. All available 2-year follow-up periods of 600 patients from the FNIH OA Biomarker Consortium were analysed using “Autoprognosis 2.0” in two separate approaches, with distinct definitions of clinical outcomes: multi-class predictions (categorising patients into non-progressors, pain-only progressors, radiographic-only progressors, and both pain and radiographic progressors) and binary predictions (categorising patients into non-progressors and progressors). Models were developed using a training set of 1352 instances and all available variables (including clinical, X-ray, MRI, and biochemical features), and validated through both stratified 10-fold cross-validation and hold-out validation on a testing set of 339 instances. Model performance was assessed using multiple evaluation metrics, such as AUC-ROC, AUC-PRC, F1-score, precision, and recall. Additionally, interpretability analyses were carried out to identify important predictors of rapid disease progression. Results. 
Our final models yielded high accuracy scores for both multi-class predictions (AUC-ROC: 0.858, 95% CI: 0.856–0.860; AUC-PRC: 0.675, 95% CI: 0.671–0.679; F1-score: 0.560, 95% CI: 0.554–0.566) and binary predictions (AUC-ROC: 0.717, 95% CI: 0.712–0.722; AUC-PRC: 0.620, 95% CI: 0.616–0.624; F1-score: 0.676, 95% CI: 0.673–0.679). Important predictors of rapid disease progression included the Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) scores and MRI features. Our models were further successfully validated using a hold-out dataset, which was previously omitted from model development and training (AUC-ROC: 0.877 for multi-class predictions; AUC-ROC: 0.746 for binary predictions). Additionally, accurate ML models were developed for predicting OA progression in a subgroup of patients aged 65 or younger (AUC-ROC: 0.862, 95% CI: 0.861–0.863 for multi-class predictions; AUC-ROC: 0.736, 95% CI: 0.734–0.738 for binary predictions). Conclusions. This study presents a reliable and interpretable precision health tool for predicting rapid knee OA progression using “Autoprognosis 2.0”. Our models provide accurate predictions and offer insights into important predictors of rapid disease progression. Furthermore, the transparency and interpretability of our methods may facilitate their acceptance by clinicians and patients, enabling effective utilisation in clinical practice. Future work should focus on refining these models by increasing the sample size, integrating additional features, and using independent datasets for external validation. Declaration of Interest. I declare that there is no conflict of interest that could be perceived as prejudicing the impartiality of the research project.


Orthopaedic Proceedings
Vol. 104-B, Issue SUPP_4 | Pages 29 - 29
1 Apr 2022
Pettit MH Hickman S Malviya A Khanduja V
Full Access

Identification of patients at risk of not achieving minimally clinically important differences (MCID) in patient reported outcome measures (PROMs) is important to ensure principled and informed pre-operative decision making. Machine learning techniques may enable the generation of a predictive model for attainment of MCID in hip arthroscopy. Aims: 1) to determine whether machine learning techniques could predict which patients will achieve MCID in the iHOT-12 PROM 6 months after arthroscopic management of femoroacetabular impingement (FAI), 2) to determine which factors contribute to their predictive power. Data from the UK Non-Arthroplasty Hip Registry database was utilised. We identified 1917 patients who had undergone hip arthroscopy for FAI with both baseline and 6 month follow up iHOT-12 and baseline EQ-5D scores. We trained three established machine learning algorithms on our dataset to predict an outcome of iHOT-12 MCID improvement at 6 months given baseline characteristics including demographic factors, disease characteristics and PROMs. Performance was assessed using area under the receiver operating characteristic (AUROC) statistics with 5-fold cross validation. The three machine learning algorithms showed quite different performance. The linear logistic regression model achieved AUROC = 0.59, the deep neural network achieved AUROC = 0.82, while a random forest model had the best predictive performance with AUROC 0.87. Of demographic factors, we found that BMI and age were key predictors for this model. We also found that removing all features except baseline responses to the iHOT-12 questionnaire had little effect on performance for the random forest model (AUROC = 0.85). Disease characteristics had little effect on model performance. Machine learning models are able to predict with good accuracy 6-month post-operative MCID attainment in patients undergoing arthroscopic management for FAI. 
Baseline scores from the iHOT-12 questionnaire are sufficient to predict with good accuracy whether a patient is likely to reach MCID in post-operative PROMs


Orthopaedic Proceedings
Vol. 102-B, Issue SUPP_10 | Pages 8 - 8
1 Oct 2020
Wyles CC Maradit-Kremers H Rouzrokh P Barman P Larson DR Polley EC Lewallen DG Berry DJ Pagnano MW Taunton MJ Trousdale RT Sierra RJ
Full Access

Introduction. Instability remains a common complication following total hip arthroplasty (THA) and continues to account for the highest percentage of revisions in numerous registries. Many risk factors have been described, yet a patient-specific risk assessment tool remains elusive. The purpose of this study was to apply a machine learning algorithm to develop a patient-specific risk score capable of dynamic adjustment based on operative decisions. Methods. 22,086 THA performed between 1998–2018 were evaluated. 632 THA sustained a postoperative dislocation (2.9%). Patients were robustly characterized based on non-modifiable factors: demographics, THA indication, spinal disease, spine surgery, neurologic disease, connective tissue disease; and modifiable operative decisions: surgical approach, femoral head size, acetabular liner (standard/elevated/constrained/dual-mobility). Models were built with a binary outcome (event/no event) at 1-year and 5-year postoperatively. Inverse Probability Censoring Weighting accounted for censoring bias. An ensemble algorithm was created that included Generalized Linear Model, Generalized Additive Model, Lasso Penalized Regression, Kernel-Based Support Vector Machines, Random Forest and Optimized Gradient Boosting Machine. Convex combination of weights minimized the negative binomial log-likelihood loss function. Ten-fold cross-validation accounted for the rarity of dislocation events. Results. The 1-year model achieved an area under the curve (AUC)=0.63, sensitivity=70%, specificity=50%, positive predictive value (PPV)=3% and negative predictive value (NPV)=99%. The 5-year model achieved an AUC=0.62, sensitivity=69%, specificity=51%, PPV=7% and NPV=97%. All cohort-level accuracy metrics performed better than chance. The two most influential predictors in the model were surgical approach and acetabular liner. Conclusions. This machine learning algorithm demonstrates high sensitivity and NPV, suggesting screening tool utility. 
The model is strengthened by a multivariable dataset portending differential dislocation risk. Two modifiable variables (approach and acetabular liner) were the most influential in dislocation risk. Calculator utilization in “app” form could enable individualized risk prognostication. Furthermore, algorithm development through machine learning facilitates perpetual model performance enhancement with future data input
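The combination of a low PPV and a high NPV reported above follows largely from how rare dislocation is, and can be reproduced approximately from sensitivity, specificity and prevalence alone. The sketch below applies Bayes' rule; it uses the overall cohort dislocation rate (632 of 22,086) as a stand-in for prevalence, which is an assumption, since the abstract does not state the 1-year event rate on which its own PPV and NPV are based.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given event prevalence."""
    tp = sensitivity * prevalence              # true positives per patient
    fn = (1 - sensitivity) * prevalence        # missed events
    tn = specificity * (1 - prevalence)        # correctly reassured
    fp = (1 - specificity) * (1 - prevalence)  # false alarms
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return ppv, npv

# Sensitivity and specificity of the 1-year model as reported in the abstract.
# Prevalence is the overall cohort rate, used here only as an approximation.
ppv, npv = predictive_values(
    sensitivity=0.70,
    specificity=0.50,
    prevalence=632 / 22086,
)
```

With these inputs the PPV comes out at roughly 4% and the NPV at roughly 98%, close to the reported figures and consistent with the suggested use as a screening tool: a negative result is reassuring, while a positive result mainly flags patients for closer consideration.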


Orthopaedic Proceedings
Vol. 106-B, Issue SUPP_6 | Pages 49 - 49
2 May 2024
Green J Khanduja V Malviya A
Full Access

Femoroacetabular Impingement (FAI) syndrome, characterised by abnormal hip contact causing symptoms and osteoarthritis, is measured using the International Hip Outcome Tool (iHOT). This study uses machine learning to predict patient outcomes post-treatment for FAI, focusing on achieving a minimally clinically important difference (MCID) at 52 weeks. A retrospective analysis of 6133 patients from the NAHR who underwent hip arthroscopic treatment for FAI between November 2013 and March 2022 was conducted. MCID was defined as half a standard deviation (13.61) from the mean change in iHOT score at 12 months. SKLearn Maximum Absolute Scaler and Logistic Regression were applied to predict achieving MCID, using baseline and 6-month follow-up data. The model's performance was evaluated by accuracy, area under the curve, and recall, using pre-operative and up to 6-month postoperative variables. A total of 23.1% (1422) of patients completed both baseline and 1-year follow-up iHOT surveys. The best results were obtained using both pre and postoperative variables. The machine learning model achieved 88.1% balanced accuracy, 89.6% recall, and 92.3% AUC. Sensitivity was 83.7% and specificity 93.5%. Key variables determining outcomes included MCID achievement at 6 months, baseline iHOT score, 6-month iHOT scores for pain, and difficulty in walking or using stairs. The study confirmed the utility of machine learning in predicting long-term outcomes following arthroscopic treatment for FAI. MCID, based on the iHOT-12 tool, indicates meaningful clinical changes. Machine learning demonstrated high accuracy and recall in distinguishing between patients achieving MCID and those who did not. This approach could help early identification of patients at risk of not meeting the MCID threshold one year after treatment.
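The half-standard-deviation definition of MCID mentioned above is a distribution-based threshold. One common reading, assumed here, is half the sample standard deviation of the change scores, with a patient counted as achieving MCID when their improvement meets that value. The sketch uses invented change scores and is not the study's calculation.

```python
import statistics

def half_sd_mcid(changes):
    """Distribution-based MCID: half the sample standard deviation of change scores."""
    return 0.5 * statistics.stdev(changes)

def achieved_mcid(baseline, follow_up, mcid):
    """True when the improvement from baseline meets or exceeds the MCID."""
    return (follow_up - baseline) >= mcid

# Hypothetical iHOT change scores (follow-up minus baseline) for five patients.
changes = [10, 20, 30, 40, 50]
mcid = half_sd_mcid(changes)

# Hypothetical patient improving from 40 to 55 points.
reached = achieved_mcid(baseline=40, follow_up=55, mcid=mcid)
```

The resulting binary label is what a classifier such as logistic regression would then be trained to predict from baseline and early follow-up variables.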


The Bone & Joint Journal
Vol. 106-B, Issue 11 | Pages 1348 - 1360
1 Nov 2024
Spek RWA Smith WJ Sverdlov M Broos S Zhao Y Liao Z Verjans JW Prijs J To M Åberg H Chiri W IJpma FFA Jadav B White J Bain GI Jutte PC van den Bekerom MPJ Jaarsma RL Doornberg JN

Aims. The purpose of this study was to develop a convolutional neural network (CNN) for fracture detection, classification, and identification of greater tuberosity displacement ≥ 1 cm, neck-shaft angle (NSA) ≤ 100°, shaft translation, and articular fracture involvement, on plain radiographs. Methods. The CNN was trained and tested on radiographs sourced from 11 hospitals in Australia and externally validated on radiographs from the Netherlands. Each radiograph was paired with corresponding CT scans to serve as the reference standard based on dual independent evaluation by trained researchers and attending orthopaedic surgeons. Presence of a fracture, classification (non- to minimally displaced; two-part, multipart, and glenohumeral dislocation), and four characteristics were determined on 2D and 3D CT scans and subsequently allocated to each series of radiographs. Fracture characteristics included greater tuberosity displacement ≥ 1 cm, NSA ≤ 100°, shaft translation (0% to < 75%, 75% to 95%, > 95%), and the extent of articular involvement (0% to < 15%, 15% to 35%, or > 35%). Results. For detection and classification, the algorithm was trained on 1,709 radiographs (n = 803), tested on 567 radiographs (n = 244), and subsequently externally validated on 535 radiographs (n = 227). For characterization, healthy shoulders and glenohumeral dislocation were excluded. The overall accuracy for fracture detection was 94% (area under the receiver operating characteristic curve (AUC) = 0.98) and for classification 78% (AUC 0.68 to 0.93). Accuracy to detect greater tuberosity fracture displacement ≥ 1 cm was 35.0% (AUC 0.57). The CNN did not recognize NSAs ≤ 100° (AUC 0.42), nor fractures with ≥ 75% shaft translation (AUC 0.51 to 0.53), or with ≥ 15% articular involvement (AUC 0.48 to 0.49). For all objectives, the model’s performance on the external dataset showed similar accuracy levels. Conclusion. CNNs proficiently rule out proximal humerus fractures on plain radiographs. 
Despite a rigorous training methodology based on CT imaging with multi-rater consensus to serve as the reference standard, artificial intelligence-driven classification is insufficient for clinical implementation. The CNN exhibited poor diagnostic ability to detect greater tuberosity displacement ≥ 1 cm and failed to identify NSAs ≤ 100°, shaft translations, or articular fractures. Cite this article: Bone Joint J 2024;106-B(11):1348–1360.


Orthopaedic Proceedings
Vol. 102-B, Issue SUPP_8 | Pages 79 - 79
1 Aug 2020
Bozzo A Ghert M Reilly J

Advances in cancer therapy have prolonged patient survival even in the presence of disseminated disease, and an increasing number of cancer patients are living with metastatic bone disease (MBD). The proximal femur is the most common long bone involved in MBD, and pathologic fractures of the femur are associated with significant morbidity, mortality and loss of quality of life (QoL). Successful prophylactic surgery for an impending fracture of the proximal femur has been shown in multiple cohort studies to result in longer survival, preserved mobility, lower transfusion rates and shorter post-operative hospital stays. However, there is currently no optimal method to predict a pathologic fracture. The most well-known tool is Mirels' criteria, established in 1989, but its poor specificity and sensitivity limit its use in guiding clinical practice. The ideal clinical decision support tool will be of the highest sensitivity and specificity, non-invasive, generalizable to all patients, and not a burden on hospital resources or the patient's time. Our research uses novel machine learning techniques to develop a model to fill this considerable gap in the treatment pathway of MBD of the femur. The goal of our study is to train a convolutional neural network (CNN) to predict fracture risk when metastatic bone disease is present in the proximal femur. Our fracture risk prediction tool was developed by analysis of prospectively collected data of consecutive MBD patients presenting between 2009 and 2016. Patients with primary bone tumors, pathologic fractures at initial presentation, and hematologic malignancies were excluded. A total of 546 patients comprising 114 pathologic fractures were included. Every patient had at least one anteroposterior (AP) X-ray and clinical data including patient demographics, Mirels' criteria, tumor biology, all previous radiation and chemotherapy received, multiple pain and function scores, medications and time to fracture or time to death. 
We have trained a convolutional neural network (CNN) with AP X-ray images of 546 patients with metastatic bone disease of the proximal femur. The digital X-ray data are converted into a matrix representing the color information at each pixel. Our CNN contains five convolutional layers, a fully connected layer of 512 units and a final output layer. As the information passes through successive levels of the network, higher-level features are abstracted from the data. The model converges on two fully connected deep neural network layers that output the risk of fracture. This prediction is compared to the true outcome, and any errors are back-propagated through the network to adjust the weights between connections accordingly, until overall prediction accuracy is optimized. Methods to improve learning included using stochastic gradient descent with a learning rate of 0.01 and a momentum of 0.9. We used average classification accuracy and the average F1 score across five test sets to measure model performance. We compute F1 = 2 × (precision × recall) / (precision + recall). F1 is a measure of a model's accuracy in binary classification; in our case, whether a lesion would result in pathologic fracture or not. Our model achieved 88.2% accuracy in predicting fracture risk across five-fold cross-validation testing. The F1 statistic is 0.87. This is the first reported application of convolutional neural networks, a machine learning algorithm, to this important orthopaedic problem. Our neural network model was able to achieve reasonable accuracy in classifying fracture risk of metastatic proximal femur lesions from analysis of X-rays and clinical information. Our future work will aim to externally validate this algorithm on an international cohort.
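
The F1 formula quoted in this abstract can be sketched in a few lines. The example below computes F1 from confusion-matrix counts for each of five test folds and averages the results; the counts are invented for illustration and the function names are hypothetical, so the output says nothing about the study's data.

```python
def f1_score(tp, fp, fn):
    """F1 = 2 * (precision * recall) / (precision + recall)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def mean_f1(folds):
    """Average F1 over a list of (tp, fp, fn) tuples, one per test fold."""
    return sum(f1_score(*fold) for fold in folds) / len(folds)


# Invented (true positive, false positive, false negative) counts per fold.
example_folds = [(8, 2, 2), (9, 1, 3), (7, 3, 1), (8, 2, 2), (10, 2, 2)]
average_f1 = mean_f1(example_folds)
```

Because F1 ignores true negatives, it is usually reported alongside accuracy when one class (here, fractures) is much rarer than the other.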


Orthopaedic Proceedings
Vol. 102-B, Issue SUPP_6 | Pages 125 - 125
1 Jul 2020
Chen T Camp M Tchoukanov A Narayanan U Lee J

Technology within medicine has great potential to bring about more accessible, efficient, and higher-quality delivery of care. Supracondylar fractures are the most common elbow fracture in children, and at our institution they often have high rates of unnecessary long-term clinical follow-up, leading to an inefficient use of healthcare and patient resources. This study aims to evaluate patient and clinical factors that significantly predict the necessity for further clinical visits following closed reduction and percutaneous pinning. A total of 246 children who underwent closed reduction and percutaneous pinning following supracondylar humerus fractures were prospectively enrolled over a two-year period. Patient demographics, perioperative course, goniometric measurements, functional outcome measures, clinical assessment and decision making for further follow-up were assessed. Categorical and continuous variables were analyzed and screened for significance via bivariate regression. Significant covariates were used to develop a predictive model through multivariable logistic regression. A probability cut-off was determined on the receiver operating characteristic (ROC) curve using the Youden index to maximize sensitivity and specificity. The regression model performance was then prospectively tested against 22 patients in a blind comparison to evaluate accuracy. Of the 246 paediatric patients enrolled, 29 required further follow-up beyond the three-month visit. Significant predictive factors for follow-up were residual nerve palsy (p < 0.001) and maximum active flexion angle of the injured elbow (p < 0.001). Non-significant factors included other goniometric measures, subjective evaluations, and functional outcome scores. The probability of requiring further clinical follow-up at the three-month postoperative point can be estimated with the equation: logit(follow-up) = 11.319 + 5.518(nerve palsy) − 0.108(maximum active flexion). 
Goodness of fit of the model was verified with Nagelkerke R² = 0.574 and the Hosmer–Lemeshow chi-square test (p = 0.739). The area under the ROC curve was C = 0.919 (SE = 0.035, 95% CI 0.850 to 0.988). Using Youden's index, a cut-off for probability of follow-up was set at 0.094, with the overall sensitivity and specificity maximized at 86.2% and 88% respectively. Using this model and cohort, 194 three-month clinic visits would have been deemed medically unnecessary. Preliminary blind prospective testing against the 22-patient cohort demonstrated a model sensitivity and specificity of 100% and 75% respectively, correctly deeming 15 visits unnecessary. Virtual clinics and automated clinical decision making can reduce healthcare inefficiencies, shorten clinic wait times, and ultimately enhance quality of care delivery. Our regression model is highly accurate in determining medical necessity for physician examination at the three-month visit following supracondylar fracture closed reduction and percutaneous pinning. When applied correctly, there is potential for significant reductions in health care expenditures and in the economic burden on patient families by removing unnecessary visits. In light of positive patient and family receptiveness toward technology, our promising findings and predictive model may pave the way for remote health care delivery, virtual clinics, and automated clinical decision making.
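
To make the reported equation concrete, the sketch below converts the logit into a probability with the logistic function and compares it with the 0.094 cut-off. The coefficients and cut-off are taken from the abstract; coding nerve palsy as 0 or 1 and measuring flexion in degrees are assumptions, and the function names are hypothetical. It is an illustration only, not a validated clinical tool.

```python
import math

# Coefficients and cut-off as reported in the abstract.
INTERCEPT = 11.319
COEF_NERVE_PALSY = 5.518
COEF_MAX_ACTIVE_FLEXION = -0.108
PROBABILITY_CUTOFF = 0.094


def follow_up_probability(nerve_palsy, max_active_flexion):
    """Estimated probability of needing follow-up beyond three months.

    Assumptions: nerve_palsy is coded 1 if residual palsy is present and
    0 otherwise; max_active_flexion is in degrees.
    """
    logit = (INTERCEPT
             + COEF_NERVE_PALSY * (1 if nerve_palsy else 0)
             + COEF_MAX_ACTIVE_FLEXION * max_active_flexion)
    return 1.0 / (1.0 + math.exp(-logit))


def needs_follow_up(nerve_palsy, max_active_flexion):
    """Apply the Youden-derived cut-off to the estimated probability."""
    return follow_up_probability(nerve_palsy, max_active_flexion) >= PROBABILITY_CUTOFF


# Hypothetical patients, chosen only to show both sides of the cut-off.
print(needs_follow_up(False, 130.0))
print(needs_follow_up(False, 120.0))
print(needs_follow_up(True, 130.0))
```

The low cut-off reflects the choice to favour sensitivity: a visit is only deemed unnecessary when the estimated probability of needing it is small.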


Orthopaedic Proceedings
Vol. 102-B, Issue SUPP_7 | Pages 96 - 96
1 Jul 2020
Bozzo A Ghert M

Advances in cancer therapy have prolonged patient survival even in the presence of disseminated disease, and an increasing number of cancer patients are living with metastatic bone disease (MBD). The proximal femur is the most common long bone involved in MBD, and pathologic fractures of the femur are associated with significant morbidity, mortality and loss of quality of life (QoL). Successful prophylactic surgery for an impending fracture of the proximal femur has been shown in multiple cohort studies to result in a greater likelihood of walking after surgery, longer survival, lower transfusion rates and shorter post-operative hospital stays. However, there is currently no optimal method to predict a pathologic fracture. The most well-known tool is Mirels' criteria, established in 1989, but its poor specificity and sensitivity limit its use in guiding clinical practice. The goal of our study is to train a convolutional neural network (CNN) to predict fracture risk when metastatic bone disease is present in the proximal femur. Our fracture risk prediction tool was developed by analysis of prospectively collected data for MBD patients (2009–2016) in order to determine which features are most commonly associated with fracture. Patients with primary bone tumors, pathologic fractures at initial presentation, and hematologic malignancies were excluded. A total of 1146 patients comprising 224 pathologic fractures were included. Every patient had at least one anteroposterior (AP) X-ray. The clinical data include patient demographics, tumor biology, all previous radiation and chemotherapy received, multiple pain and function scores, medications and time to fracture or time to death. Each of Mirels' criteria has been further subdivided and recorded for each lesion. We have trained a convolutional neural network (CNN) with X-ray images of 1146 patients with metastatic bone disease of the proximal femur. 
The digital X-ray data are converted into a matrix representing the color information at each pixel. Our CNN contains five convolutional layers, a fully connected layer of 512 units and a final output layer. As the information passes through successive levels of the network, higher-level features are abstracted from the data. This model converges on two fully connected deep neural network layers that output the fracture risk. This prediction is compared to the true outcome, and any errors are back-propagated through the network to adjust the weights between connections accordingly. Methods to improve learning included using stochastic gradient descent with a learning rate of 0.01 and a momentum of 0.9. We used average classification accuracy and the average F1 score across test sets to measure model performance. We compute F1 = 2 × (precision × recall) / (precision + recall). F1 is a measure of a test's accuracy in binary classification; in our case, whether a lesion would result in pathologic fracture or not. Five-fold cross-validation testing of our fully trained model revealed accurate classification for 88.2% of patients with metastatic bone disease of the proximal femur. The F1 statistic is 0.87. This represents a 24% error reduction from using Mirels' criteria alone to classify the risk of fracture in this cohort. This is the first reported application of convolutional neural networks, a machine learning algorithm, to an important orthopaedic problem. Our neural network model was able to achieve impressive accuracy in classifying fracture risk of metastatic proximal femur lesions from analysis of X-rays and clinical information. Our future work will aim to validate this algorithm on an external cohort.
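
The optimiser settings quoted in this abstract (learning rate 0.01, momentum 0.9) can be illustrated on a toy problem. The sketch below applies the classical momentum update to a one-parameter quadratic rather than to a CNN; the objective, the step count, and the exact update form are assumptions, since deep learning frameworks implement momentum in slightly different but related ways.

```python
def sgd_momentum(grad, w0, lr=0.01, momentum=0.9, steps=500):
    """Minimise a 1-D function with gradient descent plus classical momentum.

    velocity accumulates a decaying sum of past gradients, which damps
    oscillation and speeds progress along consistent directions.
    """
    w, velocity = w0, 0.0
    for _ in range(steps):
        velocity = momentum * velocity - lr * grad(w)
        w += velocity
    return w


def toy_gradient(w):
    """Gradient of the toy objective f(w) = (w - 3)^2, minimised at w = 3."""
    return 2.0 * (w - 3.0)


w_final = sgd_momentum(toy_gradient, w0=0.0)
```

In real training the gradient comes from back-propagation over a mini-batch of images, but the weight update follows the same pattern.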


Orthopaedic Proceedings
Vol. 98-B, Issue SUPP_20 | Pages 39 - 39
1 Nov 2016
Vallières M Freeman C Zaki A Turcotte R Hickeson M Skamene S Jeyaseelan K Hathout L Serban M Xing S Powell T Goulding K Seuntjens J Levesque I El Naqa I

This is quite an innovative study that should lead to a multicentre validation trial. We have developed an FDG-PET/MRI texture-based model for the prediction of lung metastases (LM) in newly diagnosed patients with soft-tissue sarcomas (STSs) using retrospective analysis. In this work, we assess the model performance using a new prospective STS cohort. We also investigate whether incorporating hypoxia and perfusion biomarkers derived from FMISO-PET and DCE-MRI scans can further enhance the predictive power of the model. A total of 66 patients with histologically confirmed STSs were used in this study and divided into two groups: a retrospective cohort of 51 patients (19 LM) used for training the model, and a prospective cohort of 15 patients (two patients with LM, one patient with bone metastases and suspicious lung nodules) for testing the model. In the training phase, a model of four texture features characterising tumour sub-region size and intensity heterogeneities was developed for LM prediction from pre-treatment FDG-PET and MRI scans (T1-weighted, T2-weighted with fat saturation) of the retrospective cohort, using imbalance-adjusted bootstrap statistical resampling and logistic regression multivariable modeling. In the testing phase, this multivariable model was applied to predict the distant metastasis status of the prospective cohort. The predictive power of the obtained model response was assessed using the area under the receiver-operating characteristic curve (AUC). In the exploratory phase of the study, we extracted two heterogeneity metrics from the prospective cohort: the area under the intensity-volume histogram of pre-treatment DCE-MRI volume transfer constant parametric maps and FMISO-PET hypoxia maps (AU-IVH-Ktrans, AU-IVH-FMISO). 
The impact of the addition of these two individual metrics to the texture-based model response obtained in the testing phase was first investigated using Spearman's correlation (rs), and then using logistic regression and leave-one-out cross-validation (LOO-CV) to account for overfitting bias. First, the texture-based model reached an AUC of 0.94, a sensitivity of 1, a specificity of 0.83 and an accuracy of 0.87 when tested in the prospective cohort. In the exploratory phase, the addition of AU-IVH-FMISO did not improve predictive power, yielding a correlation of rs = −0.42 (p = 0.12) with lung metastases, and a relative change in validation AUC of 0% in comparison with the texture-based model response alone in LOO-CV experiments. In contrast, the addition of AU-IVH-Ktrans improved predictive power, yielding a correlation of rs = −0.54 (p = 0.04) with lung metastases, and a change in validation AUC of +10%. Our results demonstrate that texture-based models extracted from pre-treatment FDG-PET and MRI anatomical scans could be successfully used to predict distant metastases in patients with STSs. Our results also suggest that the addition of perfusion heterogeneity metrics may contribute to improving model prediction performance.
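
The AUC used to assess the model response can be read as the probability that a randomly chosen patient with metastases receives a higher score than a randomly chosen patient without. The sketch below computes it directly from that pairwise definition; the scores and labels are invented and the function name is hypothetical, so this is a minimal illustration rather than the study's analysis code.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Invented model responses and outcomes (1 = metastases, 0 = none).
example_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
example_labels = [1, 1, 0, 1, 0, 0]
example_auc = roc_auc(example_scores, example_labels)
```

With small test cohorts such as the 15 patients described here, a single re-ranked pair shifts the AUC noticeably, which is one reason external validation matters.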


Background. The advent of value-based conscientiousness and rapid-recovery discharge pathways presents surgeons, hospitals, and payers with the challenge of providing the same total hip arthroplasty episode of care in the safest and most economic fashion for the same fee, despite patient differences. Various predictive analytic techniques have been applied to medical risk models, such as sepsis risk scores, but none have been applied to or validated in the elective primary total hip arthroplasty (THA) setting for key payment-based metrics. The objective of this study was to develop and validate a predictive machine learning model using preoperative patient demographics for length of stay (LOS) after primary THA as the first step in identifying a patient-specific payment model (PSPM). Methods. Using 229,945 patients undergoing primary THA for osteoarthritis from an administrative database between 2009 and 2016, we created a naïve Bayesian model to forecast LOS after primary THA using a 3:2 split in which 60% of the available patient data “built” the algorithm and the remaining 40% of patients were used for “testing.” This process was iterated five times for algorithm refinement, and model performance was determined using the area under the receiver operating characteristic curve (AUC), percent accuracy, and positive predictive value. LOS was grouped as either 1–5 days or greater than 5 days. Results. The machine learning model required age, race, gender, and two comorbidity scores (“risk of illness” and “risk of morbidity”) to demonstrate excellent validity, reliability, and responsiveness with an AUC of 0.87 after five iterations. Hospital stays of greater than 5 days for THA were most associated with increased risk of illness and risk of comorbidity scores during admission compared to 1–5 days of stay. Conclusions. 
Our machine learning model derived from administrative big data demonstrated excellent validity, reliability, and responsiveness after primary THA while accurately predicting LOS and identifying two comorbidity scores as key value-based metrics. Predictive data has the potential to engender a risk-based PSPM prior to primary THA and other elective orthopaedic procedures.
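
The general approach described in the Methods (a naïve Bayesian classifier built on 60% of records and tested on the remaining 40%) can be sketched as follows. The records, feature categories, and function names are invented for illustration, and Laplace smoothing is an assumption; the abstract does not describe the authors' implementation at this level of detail.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels, alpha=1.0):
    """Count class and per-feature value frequencies for categorical data."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)  # (feature index, label) -> value counts
    feature_values = defaultdict(set)      # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[(i, label)][value] += 1
            feature_values[i].add(value)
    return class_counts, feature_counts, feature_values, alpha


def predict(model, row):
    """Return the label with the highest log posterior under the naive assumption."""
    class_counts, feature_counts, feature_values, alpha = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)
        for i, value in enumerate(row):
            seen = feature_counts[(i, label)][value]
            n_values = len(feature_values[i])
            # Laplace smoothing (assumed) keeps unseen values from zeroing the product.
            score += math.log((seen + alpha) / (count + alpha * n_values))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented records: (age band, risk-of-illness band) -> length-of-stay group.
rows = [("<65", "low"), ("<65", "low"), ("<65", "high"), (">=65", "high"),
        (">=65", "high"), (">=65", "low"), ("<65", "low"), (">=65", "high"),
        ("<65", "high"), (">=65", "low")]
labels = ["1-5 days", "1-5 days", "1-5 days", ">5 days", ">5 days",
          "1-5 days", "1-5 days", ">5 days", "1-5 days", "1-5 days"]

split = len(rows) * 3 // 5  # 3:2 split: first 60% to build, last 40% to test
model = train_naive_bayes(rows[:split], labels[:split])
predictions = [predict(model, row) for row in rows[split:]]
accuracy = sum(p == y for p, y in zip(predictions, labels[split:])) / len(predictions)
```

A real analysis would shuffle the records before splitting and repeat the split several times, as the abstract's five iterations suggest; the fixed split here only keeps the example deterministic.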