The Bone & Joint Journal
Vol. 103-B, Issue 9 | Pages 1442 - 1448
1 Sep 2021
McDonnell JM Evans SR McCarthy L Temperley H Waters C Ahern D Cunniffe G Morris S Synnott K Birch N Butler JS

In recent years, machine learning (ML) and artificial neural networks (ANNs), a particular subset of ML, have been adopted by various areas of healthcare. A number of diagnostic and prognostic algorithms have been designed and implemented across a range of orthopaedic sub-specialties to date, with many positive results. However, the methodology of many of these studies is flawed, and few compare the use of ML with the current approach in clinical practice. Spinal surgery has advanced rapidly over the past three decades, particularly in the areas of implant technology, advanced surgical techniques, biologics, and enhanced recovery protocols. It is therefore regarded as an innovative field. Inevitably, spinal surgeons will wish to incorporate ML into their practice should models prove effective in diagnostic or prognostic terms. The purpose of this article is to review published studies that describe the application of neural networks to spinal surgery and that actively compare ANN models to contemporary clinical standards, allowing evaluation of their efficacy, accuracy, and relatability. It also explores some of the limitations of the technology, which act to constrain the widespread adoption of neural networks for diagnostic and prognostic use in spinal care. Finally, it describes the necessary considerations should institutions wish to incorporate ANNs into their practices. In doing so, the aim of this review is to provide a practical approach for spinal surgeons to understand the relevant aspects of neural networks. Cite this article: Bone Joint J 2021;103-B(9):1442–1448


Orthopaedic Proceedings
Vol. 102-B, Issue SUPP_8 | Pages 79 - 79
1 Aug 2020
Bozzo A Ghert M Reilly J
Full Access

Advances in cancer therapy have prolonged patient survival even in the presence of disseminated disease and an increasing number of cancer patients are living with metastatic bone disease (MBD). The proximal femur is the most common long bone involved in MBD and pathologic fractures of the femur are associated with significant morbidity, mortality and loss of quality of life (QoL). Successful prophylactic surgery for an impending fracture of the proximal femur has been shown in multiple cohort studies to result in longer survival, preserved mobility, lower transfusion rates and shorter post-operative hospital stays. However, there is currently no optimal method to predict a pathologic fracture. The most well-known tool is Mirel's criteria, established in 1989, but its use in guiding clinical practice is limited by poor specificity and sensitivity. The ideal clinical decision support tool will be of the highest sensitivity and specificity, non-invasive, generalizable to all patients, and not a burden on hospital resources or the patient's time. Our research uses novel machine learning techniques to develop a model to fill this considerable gap in the treatment pathway of MBD of the femur. The goal of our study is to train a convolutional neural network (CNN) to predict fracture risk when metastatic bone disease is present in the proximal femur. Our fracture risk prediction tool was developed by analysis of prospectively collected data of consecutive MBD patients presenting from 2009–2016. Patients with primary bone tumors, pathologic fractures at initial presentation, and hematologic malignancies were excluded. A total of 546 patients comprising 114 pathologic fractures were included. Every patient had at least one anteroposterior (AP) X-ray and clinical data including patient demographics, Mirel's criteria, tumor biology, all previous radiation and chemotherapy received, multiple pain and function scores, medications and time to fracture or time to death.
We have trained a convolutional neural network (CNN) with AP X-ray images of 546 patients with metastatic bone disease of the proximal femur. The digital X-ray data is converted into a matrix representing the color information at each pixel. Our CNN contains five convolutional layers, a fully connected layer of 512 units and a final output layer. As the information passes through successive levels of the network, higher level features are abstracted from the data. The model converges on two fully connected deep neural network layers that output the risk of fracture. This prediction is compared to the true outcome, and any errors are back-propagated through the network to adjust the weights between connections accordingly, until overall prediction accuracy is optimized. Methods to improve learning included using stochastic gradient descent with a learning rate of 0.01 and a momentum rate of 0.9. We used average classification accuracy and the average F1 score across five test sets to measure model performance. We compute F1 = 2 × (precision × recall) / (precision + recall). F1 is a measure of a model's accuracy in binary classification; in our case, whether a lesion would result in pathologic fracture or not. Our model achieved 88.2% accuracy in predicting fracture risk across five-fold cross validation testing. The F1 statistic is 0.87. This is the first reported application of convolutional neural networks, a machine learning algorithm, to this important orthopaedic problem. Our neural network model was able to achieve reasonable accuracy in classifying fracture risk of metastatic proximal femur lesions from analysis of X-rays and clinical information. Our future work will aim to externally validate this algorithm on an international cohort.
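The F1 formula quoted in this abstract can be sketched in a few lines of Python. This is only an illustration of the stated formula, not the authors' code: the function names and the per-fold counts in the usage note are hypothetical, and averaging F1 over folds is assumed from the phrase "average F1 score across five test sets".

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2 * (precision * recall) / (precision + recall)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0  # no true positives at all: define F1 as 0
    return 2 * precision * recall / (precision + recall)


def mean_f1(folds) -> float:
    """Average F1 over cross-validation folds given as (tp, fp, fn) tuples."""
    return sum(f1_score(tp, fp, fn) for tp, fp, fn in folds) / len(folds)
```

For example, a fold with 8 true positives, 2 false positives and 2 false negatives (made-up counts) has precision 0.8, recall 0.8 and therefore F1 of 0.8.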


Orthopaedic Proceedings
Vol. 102-B, Issue SUPP_7 | Pages 96 - 96
1 Jul 2020
Bozzo A Ghert M
Full Access

Advances in cancer therapy have prolonged cancer patient survival even in the presence of disseminated disease and an increasing number of cancer patients are living with metastatic bone disease (MBD). The proximal femur is the most common long bone involved in MBD and pathologic fractures of the femur are associated with significant morbidity, mortality and loss of quality of life (QoL). Successful prophylactic surgery for an impending fracture of the proximal femur has been shown in multiple cohort studies to result in a greater likelihood of walking after surgery, longer survival, lower transfusion rates and shorter post-operative hospital stays. However, there is currently no optimal method to predict a pathologic fracture. The most well-known tool is Mirel's criteria, established in 1989, but its use in guiding clinical practice is limited by poor specificity and sensitivity. The goal of our study is to train a convolutional neural network (CNN) to predict fracture risk when metastatic bone disease is present in the proximal femur. Our fracture risk prediction tool was developed by analysis of prospectively collected data for MBD patients (2009–2016) in order to determine which features are most commonly associated with fracture. Patients with primary bone tumors, pathologic fractures at initial presentation, and hematologic malignancies were excluded. A total of 1146 patients comprising 224 pathologic fractures were included. Every patient had at least one anteroposterior X-ray. The clinical data includes patient demographics, tumor biology, all previous radiation and chemotherapy received, multiple pain and function scores, medications and time to fracture or time to death. Each of Mirel's criteria has been further subdivided and recorded for each lesion. We have trained a convolutional neural network (CNN) with X-ray images of 1146 patients with metastatic bone disease of the proximal femur.
The digital X-ray data is converted into a matrix representing the color information at each pixel. Our CNN contains five convolutional layers, a fully connected layer of 512 units and a final output layer. As the information passes through successive levels of the network, higher level features are abstracted from the data. This model converges on two fully connected deep neural network layers that output the fracture risk. This prediction is compared to the true outcome, and any errors are back-propagated through the network to adjust the weights between connections accordingly. Methods to improve learning included using stochastic gradient descent with a learning rate of 0.01 and a momentum rate of 0.9. We used average classification accuracy and the average F1 score across test sets to measure model performance. We compute F1 = 2 × (precision × recall) / (precision + recall). F1 is a measure of a test's accuracy in binary classification; in our case, whether a lesion would result in pathologic fracture or not. Five-fold cross validation testing of our fully trained model revealed accurate classification for 88.2% of patients with metastatic bone disease of the proximal femur. The F1 statistic is 0.87. This represents a 24% error reduction compared with using Mirel's criteria alone to classify the risk of fracture in this cohort. This is the first reported application of convolutional neural networks, a machine learning algorithm, to an important orthopaedic problem. Our neural network model was able to achieve impressive accuracy in classifying fracture risk of metastatic proximal femur lesions from analysis of X-rays and clinical information. Our future work will aim to validate this algorithm on an external cohort.
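The optimizer settings mentioned in this abstract (stochastic gradient descent, learning rate 0.01, momentum 0.9) can be illustrated with a minimal update rule. The sketch below uses one common "classical momentum" formulation; deep learning libraries differ in the exact form, and the abstract does not say which was used, so the function name and formulation are assumptions.

```python
def sgd_momentum_step(weights, grads, velocity, lr=0.01, momentum=0.9):
    """One classical-momentum SGD update on plain Python lists.

    velocity <- momentum * velocity - lr * gradient
    weights  <- weights + velocity
    """
    new_velocity = [momentum * v - lr * g for v, g in zip(velocity, grads)]
    new_weights = [w + v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity


def minimize_quadratic(steps=1000):
    """Toy usage: minimize f(w) = w**2, whose gradient is 2 * w."""
    w, v = [1.0], [0.0]
    for _ in range(steps):
        w, v = sgd_momentum_step(w, [2.0 * w[0]], v)
    return w[0]
```

On the toy quadratic, the first step moves the weight from 1.0 to 0.98, and the iterates then spiral in toward the minimum at 0.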


Bone & Joint Open
Vol. 2, Issue 10 | Pages 879 - 885
20 Oct 2021
Oliveira e Carmo L van den Merkhof A Olczak J Gordon M Jutte PC Jaarsma RL IJpma FFA Doornberg JN Prijs J

Aims. The number of convolutional neural networks (CNN) available for fracture detection and classification is rapidly increasing. External validation of a CNN on a temporally separate (separated by time) or geographically separate (separated by location) dataset is crucial to assess generalizability of the CNN before application to clinical practice in other institutions. We aimed to answer the following questions: are current CNNs for fracture recognition externally valid; which methods are applied for external validation (EV); and what are the reported performances on the EV sets compared to the internal validation (IV) sets of these CNNs? Methods. The PubMed and Embase databases were systematically searched from January 2010 to October 2020 according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement. The type of EV, characteristics of the external dataset, and diagnostic performance characteristics on the IV and EV datasets were collected and compared. Quality assessment was conducted using a seven-item checklist based on a modified Methodologic Index for NOn-Randomized Studies instrument (MINORS). Results. Out of 1,349 studies, 36 reported development of a CNN for fracture detection and/or classification. Of these, only four (11%) reported a form of EV. One study used temporal EV, one conducted both temporal and geographical EV, and two used geographical EV. When comparing the CNN’s performance on the IV set versus the EV set, the following were found: AUCs of 0.967 (IV) versus 0.975 (EV), 0.976 (IV) versus 0.985 to 0.992 (EV), 0.93 to 0.96 (IV) versus 0.80 to 0.89 (EV), and F1-scores of 0.856 to 0.863 (IV) versus 0.757 to 0.840 (EV). Conclusion. Externally validated CNNs for fracture recognition in orthopaedic trauma are still scarce. This greatly limits the potential for transfer of these CNNs from the developing institute to another hospital to achieve similar diagnostic performance.
We recommend the use of geographical EV and statements such as the Consolidated Standards of Reporting Trials–Artificial Intelligence (CONSORT-AI), the Standard Protocol Items: Recommendations for Interventional Trials–Artificial Intelligence (SPIRIT-AI) and the Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis–Machine Learning (TRIPOD-ML) to critically appraise the performance of CNNs, improve the methodological rigor and quality of future models, and facilitate eventual implementation in clinical practice. Cite this article: Bone Jt Open 2021;2(10):879–885
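The distinction between temporal and geographical external validation described in this abstract can be made concrete with a small data-splitting sketch. The `Study` record, its field names and both helper functions are hypothetical, introduced only to illustrate the two kinds of hold-out; none of the reviewed papers is known to use this code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Study:
    image_id: str
    site: str   # hospital where the radiograph was acquired (hypothetical field)
    year: int   # acquisition year (hypothetical field)


def geographical_split(records, development_sites):
    """Development data come from the listed sites; all other sites are external."""
    dev = [r for r in records if r.site in development_sites]
    ext = [r for r in records if r.site not in development_sites]
    return dev, ext


def temporal_split(records, cutoff_year):
    """Development data were acquired up to cutoff_year; later data are external."""
    dev = [r for r in records if r.year <= cutoff_year]
    ext = [r for r in records if r.year > cutoff_year]
    return dev, ext
```

The model would be trained and internally validated on the development portion only, and the external portion would be scored once, untouched by training, to estimate generalizability.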


The Bone & Joint Journal
Vol. 103-B, Issue 8 | Pages 1358 - 1366
2 Aug 2021
Wei C Quan T Wang KY Gu A Fassihi SC Kahlenberg CA Malahias M Liu J Thakkar S Gonzalez Della Valle A Sculco PK

Aims. This study used an artificial neural network (ANN) model to determine the most important pre- and perioperative variables to predict same-day discharge in patients undergoing total knee arthroplasty (TKA). Methods. Data for this study were collected from the National Surgical Quality Improvement Program (NSQIP) database from the year 2018. Patients who received a primary, elective, unilateral TKA with a diagnosis of primary osteoarthritis were included. Demographic, preoperative, and intraoperative variables were analyzed. The ANN model was compared to a logistic regression model, which is a conventional machine-learning algorithm. Variables collected from 28,742 patients were analyzed based on their contribution to hospital length of stay. Results. The predictability of the ANN model, area under the curve (AUC) = 0.801, was similar to the logistic regression model (AUC = 0.796) and identified certain variables as important factors to predict same-day discharge. The ten most important factors favouring same-day discharge in the ANN model include preoperative sodium, preoperative international normalized ratio, BMI, age, anaesthesia type, operating time, dyspnoea status, functional status, race, anaemia status, and chronic obstructive pulmonary disease (COPD). Six of these variables were also found to be significant on logistic regression analysis. Conclusion. Both ANN modelling and logistic regression analysis revealed clinically important factors in predicting patients who can safely undergo same-day discharge from an outpatient TKA. The ANN model provides a beneficial approach to help determine which perioperative factors can predict same-day discharge as of 2018 perioperative recovery protocols. Cite this article: Bone Joint J 2021;103-B(8):1358–1366
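The AUC values compared in this abstract (0.801 versus 0.796) measure how often a model ranks a same-day-discharge patient above a patient who stayed longer. A minimal sketch of that rank-based definition follows; it is not the study's code, and the function name and toy data are assumptions for illustration.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    labels: 1 for the positive class (e.g. same-day discharge), 0 otherwise.
    scores: model outputs, higher meaning "more likely positive".
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

This quadratic-time version is fine for small examples; for a cohort the size of the one in the abstract, a sort-based implementation would be preferable.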


Orthopaedic Proceedings
Vol. 103-B, Issue SUPP_16 | Pages 76 - 76
1 Dec 2021
de Mello FL Kadirkamanathan V Wilkinson JM
Full Access

Abstract. Objectives. Conventional approaches (including Tobit) do not accurately account for ceiling effects in PROMs nor give uncertainty estimates. Here, a classifier neural network was used to estimate postoperative PROMs prior to surgery and compared with conventional methods. The Oxford Knee Score (OKS) and the Oxford Hip Score (OHS) were estimated with separate models. Methods. English NJR data from 2009 to 2018 were used, with 278,655 knee and 249,634 hip replacements. For both OKS and OHS estimations, the input variables included age, BMI, surgery date, sex, ASA, thromboprophylaxis, anaesthetic and preoperative PROMs responses. Bearing, fixation, head size and approach were also included for OHS and knee type for OKS estimation. A classifier neural network (NN) was compared with linear or Tobit regression, XGB and regression NN. The performance metrics were the root mean square error (RMSE), mean absolute error (MAE) and area under curve (AUC). 95% confidence intervals were computed using 5-fold cross-validation. Results. The classifier NN and regression NN had the best RMSE, both with the same scores of 8.59±0.04 for knee and 7.88±0.04 for hip. The classifier NN had the best MAE, with 6.73±0.03 for knee and 5.73±0.03 for hip. The Tobit model was second, with 6.86±0.03 for knee and 6.00±0.01 for hip. The classifier NN had the best AUC, with (68.7±0.4)% for knee and (73.9±0.3)% for hip. The regression NN was second, with (67.1±0.3)% for knee and (71.1±0.4)% for hip. The Tobit model had the best AUC among conventional approaches, with (66.8±0.3)% for knee and (71.0±0.4)% for hip. Conclusions. The proposed model resulted in an improvement over the current state of the art. Additionally, it estimates the full probability distribution of the postoperative PROMs, making it possible to know not only the estimated value but also its uncertainty.
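The idea that a classifier network yields a full probability distribution over the postoperative score, rather than a single number, can be sketched as follows. The example assumes one output class per integer score from 0 to 48 (the usual Oxford score range); the abstract does not describe the output layer, so this layout and both function names are assumptions.

```python
import math


def summarize_distribution(probs):
    """Return (expected score, standard deviation) for class probabilities.

    probs[k] is the predicted probability that the postoperative score equals k.
    """
    if not math.isclose(sum(probs), 1.0, abs_tol=1e-6):
        raise ValueError("class probabilities must sum to 1")
    mean = sum(k * p for k, p in enumerate(probs))
    variance = sum(p * (k - mean) ** 2 for k, p in enumerate(probs))
    return mean, math.sqrt(variance)


def prob_at_least(probs, threshold):
    """Probability that the score is at or above an integer threshold."""
    return sum(probs[threshold:])
```

Because the classes are bounded, probability mass can pile up at the maximum score, which is one way such a model can represent the ceiling effect that the abstract says conventional approaches handle poorly.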


Orthopaedic Proceedings
Vol. 90-B, Issue SUPP_I | Pages 35 - 36
1 Mar 2008
Jaremko J Hill D Moreau M Zernicke R
Full Access

Recent studies have shown that scoliotic deformity can be estimated accurately from deformity of the full three hundred and sixty degrees torso shape. However, acquisition of these data requires an expensive multi-scanner system. If it were possible to estimate scoliosis accurately from the back surface shape alone, a single scanner and simplified analysis methods could be used. Here, we estimated the Cobb angle within ten degrees in 84% of forty-six patients from back surface data, compared to 99% within ten degrees for a previous, larger study using the entire torso shape. These results suggested that both back-surface and full-torso models for Cobb angle estimation should be pursued for their potential merits. The surface deformity of scoliosis, often the primary patient complaint, progresses non-linearly with the underlying spinal deformity. If it were possible to estimate the degree of scoliosis reliably from the surface, adolescent patients with non-progressing scoliosis could be spared harmful X-ray radiation. Some of us have previously estimated the scoliotic Cobb angle from three hundred and sixty degrees torso surface deformity. Here, we tested how accurately the Cobb angle could be estimated from back surface data alone, which are easier and less expensive to obtain than full-torso data. A genetic algorithm selected the clinical parameters to be used by a neural network to estimate scoliosis deformity from back surface deformity. We had forty-six consecutive patients with right-thoracic curves (Cobb angles eleven to ninety-seven degrees), in whom fifteen indices were available including age, height, bracing status, scoliometer reading, back surface rotation, and cosmetic score of landmark asymmetry.
Those data were used by a neural network to estimate the Cobb angle within ten degrees in 84% of patients, a 30% improvement over regression-model accuracy, though less accurate than use of the three hundred and sixty degrees torso shape, which estimated up to 99% of curves within ten degrees in a previous study. Neural network predictive accuracy was better when using the full three hundred and sixty degrees torso shape, but the simpler and more economical acquisition of back surface data alone also gave promising results. This pilot comparison study suggested that both models (using back surface data alone vs. using three hundred and sixty degrees torso data) should continue to be developed in attempts to optimize surface estimation of scoliosis.
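The headline metric in this abstract, the share of patients whose estimated Cobb angle falls within ten degrees of the radiographic value, is simple to compute. The sketch below shows one plausible reading of that metric; the function name and the angles in the usage note are made up for illustration.

```python
def fraction_within(estimated, measured, tolerance_deg=10.0):
    """Fraction of cases where |estimated - measured| <= tolerance_deg."""
    if len(estimated) != len(measured) or not estimated:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(
        1 for e, m in zip(estimated, measured) if abs(e - m) <= tolerance_deg
    )
    return hits / len(estimated)
```

With hypothetical estimates of 20, 45, 60 and 30 degrees against measured angles of 25, 58, 55 and 30 degrees, three of the four cases fall inside the tolerance, giving 0.75.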


Orthopaedic Proceedings
Vol. 102-B, Issue SUPP_1 | Pages 133 - 133
1 Feb 2020
Borjali A Chen A Muratoglu O Varadarajan K
Full Access

INTRODUCTION. Mechanical loosening of total hip replacement (THR) is primarily diagnosed using radiographs, which are diagnostically challenging and require review by experienced radiologists and orthopaedic surgeons. Automated tools that assist less-experienced clinicians and mitigate human error can reduce the risk of missed or delayed diagnosis. Thus the purposes of this study were to: 1) develop an automated tool to detect mechanical loosening of THR by training a deep convolutional neural network (CNN) using THR x-rays, and 2) visualize the CNN training process to interpret how it functions. METHODS. A retrospective study was conducted using previously collected imaging data at a single institution with IRB approval. Twenty-three patients with cementless primary THR who underwent revision surgery due to mechanical loosening (either with a loose stem and/or a loose acetabular component) had their hip x-rays evaluated immediately prior to their revision surgery (32 “loose” x-rays). A comparison group comprised 23 patients who underwent primary cementless THR surgery with x-rays immediately after their primary surgery (31 “not loose” x-rays). Fig. 1 shows examples of “not loose” and “loose” THR x-rays. DenseNet201-CNN was utilized by swapping the top layer with a binary classifier using 90:10 split-validation [1]. Pre-trained CNN on ImageNet [2] and not pre-trained CNN (initial zero weights) were implemented to compare the results. Saliency maps were implemented to indicate the importance of each pixel of a given x-ray on the CNN's performance [3]. RESULTS. Fig. 2 shows the saliency maps for an example x-ray and the corresponding accuracy of the CNN on the entire validation dataset at different stages of the training for both pre-trained (Fig. 2a) and not pre-trained (Fig. 2b) CNNs. Colored regions in the saliency maps, where red denotes higher relative influence than blue, indicate the most influential regions on the CNN's performance.
The pre-trained CNN achieved higher accuracy (87%) on the validation set x-rays than the not pre-trained CNN (62%) after 10 epochs. The pre-trained CNN's saliency map at 10 epochs identified significant influence of bone-implant interaction regions on the CNN's performance. This indicates that the CNN is ‘looking’ at the clinically relevant features in the x-rays. The saliency maps also demonstrated that the pre-trained CNN quickly learned where to ‘look’, while the not pre-trained CNN struggled. DISCUSSION. An automated tool to detect mechanical loosening of THR was developed that can potentially assist clinicians with accurate diagnosis. By visualizing the influential regions of the x-ray on the CNN performance, this study shed light on the CNN learning process and demonstrated that the CNN is ‘looking’ at the clinically relevant features to classify the x-rays. This visualization is crucial to build trust in the automated system by interpreting how it functions, and to increase confidence in the application of artificial intelligence to the field of orthopaedics. This study also demonstrated that pre-training a CNN can accelerate the learning process and achieve high accuracy even on a small dataset. For any figures or tables, please contact the authors directly.
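A saliency map of the kind described in this abstract assigns each pixel the magnitude of the gradient of the model's output with respect to that pixel. The sketch below approximates that gradient by central finite differences on a toy scoring function, purely to show the idea; the study presumably used backpropagated gradients through DenseNet201, and `score_fn`, `toy_score` and the weights are hypothetical stand-ins.

```python
def saliency_map(score_fn, image, eps=1e-4):
    """Approximate |d score / d pixel| for every pixel of a 2D image.

    score_fn: callable mapping a 2D list of floats to a scalar class score.
    image:    2D list of floats (rows of pixel intensities).
    """
    height, width = len(image), len(image[0])
    saliency = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            up = [row[:] for row in image]
            down = [row[:] for row in image]
            up[i][j] += eps
            down[i][j] -= eps
            saliency[i][j] = abs((score_fn(up) - score_fn(down)) / (2 * eps))
    return saliency


# Toy stand-in for a trained network: a fixed linear score over a 2x2 image.
TOY_WEIGHTS = [[-1.0, 0.0], [0.0, 3.0]]


def toy_score(image):
    return sum(
        w * p
        for w_row, p_row in zip(TOY_WEIGHTS, image)
        for w, p in zip(w_row, p_row)
    )
```

For the toy model the saliency simply recovers the absolute weights, so the bottom-right pixel is the most influential; for a real CNN the same quantity highlights regions such as the bone-implant interface mentioned in the results.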


Orthopaedic Proceedings
Vol. 102-B, Issue SUPP_1 | Pages 129 - 129
1 Feb 2020
Maag C Langhorn J Rullkoetter P
Full Access

INTRODUCTION. While computational models have been used for many years to contribute to pre-clinical, design phase iterations of total knee replacement implants, the analysis time required has limited their real-time use in other applications, such as patient-specific surgical alignment in the operating room. In this environment, the impact of variation in ligament balance and implant alignment on estimated joint mechanics must be available instantaneously. As neural networks (NN) have shown the ability to appropriately represent dynamic systems, the objective of this preliminary study was to evaluate deep learning to represent the joint level kinetic and kinematic results from a validated finite element lower limb model with varied surgical alignment. METHODS. External hip and ankle boundary conditions were created for a previously-developed finite element lower limb model [1] for step down (SD), deep knee bend (DKB) and gait to best reproduce in-vivo loading conditions as measured on patients with the Innex knee (orthoload.com) (Figure 1). These boundary conditions were subsequently used as inputs for the model with a current fixed-bearing total knee replacement to estimate implant-specific kinetics and kinematics during activities of daily living. Implant alignments were varied, including variation of the hip-knee-ankle angle ±3°, the frontal plane joint line −7° to +5°, internal-external femoral rotation ±3°, and the tibial posterior slope 5° and 0°. Through varying these parameters a total of 2464 simulations were completed. A NN was created utilizing the NN toolbox in MATLAB. Sequence data inputs were produced from the alignment and the external boundary conditions for each activity cycle. Sequence outputs for the model were the 6 degree of freedom kinetics and kinematics, totaling 12 outputs. All data were normalized across the entire data set.
Ten percent of the simulation runs were removed at random from the training set to be used for validation, leaving 2220 simulations for training and 244 for validation. A nine-layer bidirectional long short-term memory (bi-LSTM) NN was created to take advantage of the ability of bi-LSTM layers to learn from past and future data. Training of the network was undertaken using an RMSprop solver until the root mean square error (RMSE) stopped reducing. NN quality was evaluated by the RMSE of the validation set. RESULTS. The trained NN was able to effectively estimate the validation data. Average RMSE over the kinetics of the validation data set was 140.7 N/N·m while the average RMSE over the kinematics of the validation data set was 4.47 mm/deg (Figures 2 and 3; DKB and gait shown). It is noted that the error may be skewed by the larger magnitude kinetics and kinematics in the DKB activity, as the average RMSE for just SD and gait was 85.9 N/N·m and 2.8 mm/deg for the kinetics and kinematics, respectively. DISCUSSION. The accuracy of the generated NN indicates its potential for use in real-time modeling, and further work will explore additional changes in post-operative soft-tissue balance as well as scaling to patient-specific geometry.
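The hold-out procedure and the error metric reported in this abstract can be sketched in Python, although the study itself used MATLAB. The function names and the fixed seed are assumptions; only the counts (2464 simulations, 244 held out, 2220 for training) come from the abstract.

```python
import math
import random


def holdout_split(n_items, n_validation, seed=0):
    """Randomly hold out n_validation item indices; return (train, validation)."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    validation = sorted(indices[:n_validation])
    train = sorted(indices[n_validation:])
    return train, validation


def rmse(predicted, target):
    """Root mean square error between two equal-length sequences."""
    if len(predicted) != len(target) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((p - t) ** 2 for p, t in zip(predicted, target))
    return math.sqrt(squared / len(predicted))
```

Calling `holdout_split(2464, 244)` reproduces the 2220/244 partition sizes; `rmse` would then be evaluated per output channel on the 244 held-out simulations only.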


Bone & Joint Research
Vol. 12, Issue 7 | Pages 447 - 454
10 Jul 2023
Lisacek-Kiosoglous AB Powling AS Fontalis A Gabr A Mazomenos E Haddad FS

The use of artificial intelligence (AI) is rapidly growing across many domains, and the medical field is no exception. AI is an umbrella term defining the practical application of algorithms to generate useful output, without the need for human cognition. Owing to the expanding volume of patient information collected, known as ‘big data’, AI is showing promise as a useful tool in healthcare research and across all aspects of patient care pathways. Practical applications in orthopaedic surgery include: diagnostics, such as fracture recognition and tumour detection; predictive models of clinical and patient-reported outcome measures, such as calculating mortality rates and length of hospital stay; and real-time rehabilitation monitoring and surgical training. However, clinicians should remain cognizant of AI’s limitations, as the development of robust reporting and validation frameworks is of paramount importance to prevent avoidable errors and biases. The aim of this review article is to provide a comprehensive understanding of AI and its subfields, as well as to delineate its existing clinical applications in trauma and orthopaedic surgery. Furthermore, this narrative review expands upon the limitations of AI and future direction.

Cite this article: Bone Joint Res 2023;12(7):447–454.


The Bone & Joint Journal
Vol. 104-B, Issue 8 | Pages 911 - 914
1 Aug 2022
Prijs J Liao Z Ashkani-Esfahani S Olczak J Gordon M Jayakumar P Jutte PC Jaarsma RL IJpma FFA Doornberg JN

Artificial intelligence (AI) is, in essence, the concept of ‘computer thinking’, encompassing methods that train computers to perform and learn from executing certain tasks, called machine learning, and methods to build intricate computer models that both learn and adapt, called complex neural networks. Computer vision is a function of AI by which machine learning and complex neural networks can be applied to enable computers to capture, analyze, and interpret information from clinical images and visual inputs. This annotation summarizes key considerations and future perspectives concerning computer vision, questioning the need for this technology (the ‘why’), the current applications (the ‘what’), and the approach to unlocking its full potential (the ‘how’). Cite this article: Bone Joint J 2022;104-B(8):911–914


Orthopaedic Proceedings
Vol. 105-B, Issue SUPP_12 | Pages 85 - 85
23 Jun 2023
de Mello F Kadirkamanathan V Wilkinson JM
Full Access

Successful estimation of postoperative PROMs prior to a joint replacement surgery is important in deciding the best treatment option for a patient. However, estimation of the outcome is associated with substantial noise around individual prediction. Here, we test whether a classifier neural network can be used to simultaneously estimate postoperative PROMs and uncertainty better than current methods. We perform Oxford hip score (OHS) estimation using data collected by the NJR from 249,634 hip replacement surgeries performed from 2009 to 2018. The root mean square error (RMSE) of each method is compared to the standard deviation of the outcome change distribution to measure the proportion of the total outcome variability that the model can capture. The area under the curve (AUC) for the probability of the change score being above a certain threshold was also plotted. The proposed classifier NN had an RMSE better than or equivalent to that of all other currently used models. The threshold AUC shows similar results for all methods close to a change score of 20 but demonstrates better accuracy of the classifier neural network close to 0 change and greater than 30 change, showing that the full probability distribution estimated by the classifier neural network resulted in a significant improvement in estimating the upper and lower quantiles of the change score probability distribution. Consequently, probabilistic estimation as performed by the classifier NN is the most appropriate approach to this problem, since the final score has an important component of uncertainty. This study shows the importance of uncertainty estimation to accompany postoperative PROMs prediction and presents a clinically-meaningful method for personalised outcome prediction that includes such uncertainty estimation.


Orthopaedic Proceedings
Vol. 104-B, Issue SUPP_4 | Pages 5 - 5
1 Apr 2022
de Mello F Kadirkamanathan V Wilkinson M
Full Access

Successful estimation of postoperative PROMs prior to joint replacement surgery is important in deciding the best treatment option for a patient. However, estimation of the outcome is associated with substantial noise around individual predictions. Here, we test whether a classifier neural network can be used to simultaneously estimate postoperative PROMs and uncertainty better than current methods. We perform Oxford hip score (OHS) estimation using data collected by the NJR from 249,634 hip replacement surgeries performed from 2009 to 2018. The root mean square error (RMSE) of each method is compared to the standard deviation of the outcome change distribution to measure the proportion of the total outcome variability that the model can capture. The area under the curve (AUC) for the probability of the change score being above a certain threshold was also plotted. The proposed classifier NN had an RMSE better than or equivalent to that of all other currently used models. The standard deviation of the change score for the entire population was 9.93, which can be interpreted as the RMSE that would be achieved by a model that gives the same estimation for all patients regardless of the covariates. However, most of the variation in the postoperative OHS/OKS change score is not captured by the models, confirming the importance of accurate uncertainty estimation. The threshold AUC shows similar results for all methods close to a change score of 20 but demonstrates better accuracy of the classifier neural network close to 0 change and greater than 30 change, showing that the estimation of the full probability distribution performed by the classifier neural network resulted in a significant improvement in estimating the upper and lower quantiles of the change score probability distribution. Consequently, probabilistic estimation as performed by the classifier NN is the most adequate approach to this problem, since the final score has an important component of uncertainty. 
This study shows the importance of uncertainty estimation to accompany postoperative PROMs prediction and presents a clinically meaningful method for personalised outcome estimation that includes such uncertainty estimation.
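
The comparison of RMSE against the standard deviation of the change score can be illustrated with a minimal Python sketch. It is not the authors' code: the data are hypothetical, the function names are invented for illustration, and expressing "variability captured" as 1 - (RMSE / SD)^2 is an assumption about how the proportion might be computed.

```python
import math
import statistics


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def variability_captured(y_true, y_pred):
    """Assumed definition: 1 - (RMSE / SD)^2, i.e. the usual R-squared."""
    sd = statistics.pstdev(y_true)  # population SD of the observed outcome
    return 1.0 - (rmse(y_true, y_pred) / sd) ** 2


# Hypothetical change scores for six patients (illustrative only, not NJR data).
observed = [22.0, 5.0, 30.0, 18.0, 12.0, 27.0]

# A model that gives every patient the same estimate (the population mean).
mean_only = [statistics.fmean(observed)] * len(observed)

# Its RMSE equals the population standard deviation of the outcome,
# so it captures none of the variability.
baseline_rmse = rmse(observed, mean_only)
baseline_sd = statistics.pstdev(observed)
```

Under this assumed definition, any model whose RMSE is only slightly below the population standard deviation captures only a small share of the outcome variance.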


The Bone & Joint Journal
Vol. 102-B, Issue 6 Supple A | Pages 101 - 106
1 Jun 2020
Shah RF Bini SA Martinez AM Pedoia V Vail TP

Aims. The aim of this study was to evaluate the ability of a machine-learning algorithm to diagnose prosthetic loosening from preoperative radiographs and to investigate the inputs that might improve its performance. Methods. A group of 697 patients underwent a first-time revision of a total hip (THA) or total knee arthroplasty (TKA) at our institution between 2012 and 2018. Preoperative anteroposterior (AP) and lateral radiographs, and historical and comorbidity information were collected from their electronic records. Each patient was defined as having loose or fixed components based on the operation notes. We trained a series of convolutional neural network (CNN) models to predict a diagnosis of loosening at the time of surgery from the preoperative radiographs. We then added historical data about the patients to the best performing model to create a final model and tested it on an independent dataset. Results. The convolutional neural network we built performed well when detecting loosening from radiographs alone. The first model built de novo with only the radiological image as input had an accuracy of 70%. The final model, which was built by fine-tuning a publicly available model named DenseNet, combining the AP and lateral radiographs, and incorporating information from the patient’s history, had an accuracy, sensitivity, and specificity of 88.3%, 70.2%, and 95.6%, respectively, on the independent test dataset. It performed better for cases of revision THA, with an accuracy of 90.1%, than for cases of revision TKA, with an accuracy of 85.8%. Conclusion. This study showed that machine learning can detect prosthetic loosening from radiographs. Its accuracy is enhanced when using highly trained public algorithms, and when adding clinical data to the algorithm. While this algorithm may not be sufficient in its present state of development as a standalone metric of loosening, it is currently a useful adjunct to clinical decision making. 
Cite this article: Bone Joint J 2020;102-B(6 Supple A):101–106
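
As a reading aid for the accuracy, sensitivity, and specificity reported above, the following Python sketch shows how the three figures relate. The labels are hypothetical (1 = loose, 0 = fixed) and the function names are invented for illustration. The prevalence calculation rests on the identity accuracy = prevalence × sensitivity + (1 − prevalence) × specificity; the resulting value is implied by the three reported figures and is not stated in the abstract.

```python
def diagnostic_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity for binary labels (1 = loose)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # loose components correctly flagged
        "specificity": tn / (tn + fp),  # fixed components correctly cleared
    }


def implied_prevalence(accuracy, sensitivity, specificity):
    """Solve accuracy = prev * sens + (1 - prev) * spec for prev."""
    return (specificity - accuracy) / (specificity - sensitivity)


# Hypothetical ground truth and predictions for ten patients.
truth = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
preds = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
metrics = diagnostic_metrics(truth, preds)

# Proportion of loose cases in the test set implied by the reported figures.
prevalence = implied_prevalence(0.883, 0.702, 0.956)
```

With the reported values, the implied proportion of loose cases in the test set is roughly 29%, which would explain how overall accuracy can sit well above sensitivity.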


Orthopaedic Proceedings
Vol. 104-B, Issue SUPP_12 | Pages 90 - 90
1 Dec 2022
Abbas A Toor J Du JT Versteeg A Yee N Finkelstein J Abouali J Nousiainen M Kreder H Hall J Whyne C Larouche J
Full Access

Excessive resident duty hours (RDH) are a recognized issue with implications for physician well-being and patient safety. A major component of the RDH concern is on-call duty. While considerable work has been done to reduce resident call workload, there is a paucity of research in optimizing resident call scheduling. Call coverage is scheduled manually rather than on the basis of demand, which generally leads to over-scheduling to prevent a service gap. Machine learning (ML) has been widely applied in other industries to prevent such issues of a supply-demand mismatch. However, the healthcare field has been slow to adopt these innovations. As such, the aim of this study was to use ML models to 1) predict demand on orthopaedic surgery residents at a level I trauma centre and 2) identify variables key to demand prediction. Daily surgical handover emails over an eight-year (2012-2019) period at a level I trauma centre were collected. The following data were used to calculate demand: spine call coverage, date, and number of operating rooms (ORs), traumas, admissions and consults completed. Various ML models (linear, tree-based and neural networks) were trained to predict the workload, with their results compared to the current scheduling approach. Quality of models was determined by using the area under the receiver operating characteristic curve (AUC) and accuracy of the predictions. The top ten most important variables were extracted from the most successful model. During training, the model with the highest AUC and accuracy was the multivariate adaptive regression splines (MARS) model, with an AUC of 0.78±0.03 and accuracy of 71.7%±3.1%. During testing, the model with the highest AUC and accuracy was the neural network model, with an AUC of 0.81 and accuracy of 73.7%. All models were better than the current approach, which had an AUC of 0.50 and accuracy of 50.1%. 
Key variables used by the neural network model were (in descending order of importance): spine call duty, year, weekday/weekend, month, and day of the week. This was the first study attempting to use ML to predict the service demand on orthopaedic surgery residents at a major level I trauma centre. Multiple ML models were shown to be more appropriate and accurate at predicting the demand on surgical residents as compared to the current scheduling approach. Future work should look to incorporate predictive models with optimization strategies to match scheduling with demand in order to improve resident well-being and patient care.
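
To make the AUC figures above easier to interpret, here is a small Python sketch of the rank-based definition of AUC: the probability that a randomly chosen high-demand day receives a higher score than a randomly chosen low-demand day, with ties counted as one half. The labels, scores, and function name are hypothetical, and the abstract does not describe how the current scheduling approach was scored.

```python
def auc(labels, scores):
    """Rank-based AUC: share of positive/negative pairs ordered correctly."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # pair ranked correctly
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Hypothetical days: 1 = high demand, 0 = low demand.
labels = [0, 0, 1, 1]

# A rule that scores every day identically carries no ranking information.
uninformative = auc(labels, [0.5, 0.5, 0.5, 0.5])

# A partly informative model misorders one of the four pairs.
partial = auc(labels, [0.1, 0.8, 0.2, 0.9])
```

Under this definition, a schedule that treats every day the same has an AUC of 0.5, which matches the chance-level figure reported for the current approach.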


Bone & Joint Research
Vol. 13, Issue 10 | Pages 588 - 595
17 Oct 2024
Breu R Avelar C Bertalan Z Grillari J Redl H Ljuhar R Quadlbauer S Hausner T

Aims. The aim of this study was to create artificial intelligence (AI) software with the purpose of providing a second opinion to physicians to support distal radius fracture (DRF) detection, and to compare the accuracy of fracture detection of physicians with and without software support. Methods. The dataset consisted of 26,121 anonymized anterior-posterior (AP) and lateral standard view radiographs of the wrist, with and without DRF. The convolutional neural network (CNN) model was trained to detect the presence of a DRF by comparing the radiographs containing a fracture to the unremarkable ones. A total of 11 physicians (six surgeons in training and five hand surgeons) assessed 200 pairs of randomly selected digital radiographs of the wrist (AP and lateral) for the presence of a DRF. The same images were first evaluated without, and then with, the support of the CNN model, and the diagnostic accuracy of the two methods was compared. Results. At the time of the study, the CNN model showed an area under the receiver operating characteristic curve of 0.97. AI assistance improved the physicians’ sensitivity (correct fracture detection) from 80% to 87%, and their specificity (correct fracture exclusion) from 91% to 95%. The overall error rate (combined false positive and false negative) was reduced from 14% without AI to 9% with AI. Conclusion. The use of a CNN model as a second opinion can improve the diagnostic accuracy of DRF detection in the study setting. Cite this article: Bone Joint Res 2024;13(10):588–595


Bone & Joint Open
Vol. 5, Issue 8 | Pages 671 - 680
14 Aug 2024
Fontalis A Zhao B Putzeys P Mancino F Zhang S Vanspauwen T Glod F Plastow R Mazomenos E Haddad FS

Aims. Precise implant positioning, tailored to individual spinopelvic biomechanics and phenotype, is paramount for stability in total hip arthroplasty (THA). Despite a few studies on instability prediction, there is a notable gap in research utilizing artificial intelligence (AI). The objective of our pilot study was to evaluate the feasibility of developing an AI algorithm tailored to individual spinopelvic mechanics and patient phenotype for predicting impingement. Methods. This international, multicentre prospective cohort study across two centres encompassed 157 adults undergoing primary robotic arm-assisted THA. Impingement during specific flexion and extension stances was identified using the virtual range of motion (ROM) tool of the robotic software. The primary AI model, the Light Gradient-Boosting Machine (LGBM), used tabular data to predict impingement presence, direction (flexion or extension), and type. A secondary model integrating tabular data with plain anteroposterior pelvis radiographs was evaluated to assess for any potential enhancement in prediction accuracy. Results. We identified nine predictors from an analysis of baseline spinopelvic characteristics and surgical planning parameters. Using fivefold cross-validation, the LGBM achieved 70.2% impingement prediction accuracy. With impingement data, the LGBM estimated direction with 85% accuracy, while the support vector machine (SVM) determined impingement type with 72.9% accuracy. After integrating imaging data with a multilayer perceptron (tabular) and a convolutional neural network (radiograph), the combined model’s impingement prediction accuracy was 68.1%. Both the combined and the LGBM-only models had similar impingement direction prediction rates (around 84.5%). Conclusion. This study is a pioneering effort in leveraging AI for impingement prediction in THA, utilizing a comprehensive, real-world clinical dataset. Our machine-learning algorithm demonstrated promising accuracy in predicting impingement, its type, and direction. 
While the addition of imaging data to our deep-learning algorithm did not boost accuracy, the potential for refined annotations, such as landmark markings, offers avenues for future enhancement. Prior to clinical integration, external validation and larger-scale testing of this algorithm are essential. Cite this article: Bone Jt Open 2024;5(8):671–680


Bone & Joint Research
Vol. 12, Issue 9 | Pages 512 - 521
1 Sep 2023
Langenberger B Schrednitzki D Halder AM Busse R Pross CM

Aims. A substantial fraction of patients undergoing knee arthroplasty (KA) or hip arthroplasty (HA) do not achieve an improvement as high as the minimal clinically important difference (MCID), i.e. do not achieve a meaningful improvement. Using three patient-reported outcome measures (PROMs), our aim was: 1) to assess machine learning (ML), the simple pre-surgery PROM score, and logistic-regression (LR)-derived performance in their prediction of whether patients undergoing HA or KA achieve an improvement as high as or higher than a calculated MCID; and 2) to test whether ML is able to outperform LR or pre-surgery PROM scores in predictive performance. Methods. MCIDs were derived using the change difference method in a sample of 1,843 HA and 1,546 KA patients. An artificial neural network, a gradient boosting machine, least absolute shrinkage and selection operator (LASSO) regression, ridge regression, elastic net, random forest, LR, and pre-surgery PROM scores were applied to predict MCID for the following PROMs: EuroQol five-dimension, five-level questionnaire (EQ-5D-5L), EQ visual analogue scale (EQ-VAS), Hip disability and Osteoarthritis Outcome Score-Physical Function Short-form (HOOS-PS), and Knee injury and Osteoarthritis Outcome Score-Physical Function Short-form (KOOS-PS). Results. Predictive performance of the best models per outcome ranged from 0.71 for HOOS-PS to 0.84 for EQ-VAS (HA sample). ML statistically significantly outperformed LR and pre-surgery PROM scores in two out of six cases. Conclusion. MCIDs can be predicted with reasonable performance. ML was able to outperform traditional methods, although only in a minority of cases. Cite this article: Bone Joint Res 2023;12(9):512–521


Orthopaedic Proceedings
Vol. 102-B, Issue SUPP_2 | Pages 5 - 5
1 Feb 2020
Burton W Myers C Rullkoetter P
Full Access

Introduction. Gait laboratory measurement of whole-body kinematics and ground reaction forces during a wide range of activities is frequently performed in joint replacement patient diagnosis, monitoring, and rehabilitation programs. These data are commonly processed in musculoskeletal modeling platforms such as OpenSim and AnyBody to estimate muscle and joint reaction forces during activity. However, the processing required to obtain musculoskeletal estimates can be time consuming and requires significant expertise, and thus seriously limits the patient populations studied. Accordingly, the purpose of this study was to evaluate the potential of deep learning methods for estimating muscle and joint reaction forces over time given kinematic data, height, weight, and ground reaction forces for total knee replacement (TKR) patients performing activities of daily living (ADLs). Methods. 70 TKR patients were fitted with 32 reflective markers used to define anatomical landmarks for 3D motion capture. Patients were instructed to perform a range of tasks including gait, step-down and sit-to-stand. Gait was performed at a self-selected pace, step-down from an 8” step height, and sit-to-stand using a chair height of 17”. Tasks were performed over a force platform while force data were collected at 2000 Hz and a 14-camera motion capture system collected at 100 Hz. The resulting data were processed in OpenSim to estimate joint reaction and muscle forces in the hip and knee using static optimization. The full set of data consisted of 135 instances from 70 patients with 63 sit-to-stands, 15 right-sided step-downs, 14 left-sided step-downs, and 43 gait sequences. Two classes of neural networks (NNs), a recurrent neural network (RNN) and a temporal convolutional neural network (TCN), were trained to classify the activity from joint angle, ground reaction force, and anthropometrics. 
The NNs were also trained to predict muscle and joint reaction forces over time from the same input metrics. The 135 instances were split into 100 instances for training, 15 for validation, and 20 for testing. Results. The RNN and TCN yielded classification accuracies of 90% and 100%, respectively, on the test set. Correlation coefficients between ground truth and predictions from the test set ranged from 0.81 to 0.95 for the RNN, depending on the activity. Predictions from both NNs were qualitatively assessed. Both NNs were able to effectively learn relationships between the input and output variables. Discussion. The objective of the study was to develop and evaluate deep learning methods for predicting patient mechanics from standard gait lab data. The resulting models classified activities with excellent performance, and showed promise for predicting exact values for loading metrics for a range of different activities. These results indicate potential for real-time prediction of musculoskeletal metrics with application in patient diagnostics and rehabilitation. For any figures or tables, please contact the authors directly.
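
The 100/15/20 split and the correlation between ground truth and predicted force curves can be sketched in Python as follows. This is illustrative only: the function names, the random seed, and the use of the Pearson coefficient are assumptions, and the abstract does not state whether instances from the same patient were kept within a single partition.

```python
import math
import random


def split_instances(instances, n_train=100, n_val=15, n_test=20, seed=0):
    """Shuffle instances and cut them into train/validation/test partitions."""
    if n_train + n_val + n_test != len(instances):
        raise ValueError("partition sizes must sum to the number of instances")
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)  # seed is a hypothetical choice
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


def pearson_r(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# 135 instance identifiers, matching the counts given in the abstract.
train, val, test = split_instances(list(range(135)))

# Hypothetical force curve and a prediction that is a scaled copy of it.
truth_curve = [0.0, 1.0, 2.5, 3.0, 2.0, 0.5]
scaled_prediction = [2.0 * v + 1.0 for v in truth_curve]
r = pearson_r(truth_curve, scaled_prediction)
```

Note that a correlation coefficient rewards matching shape rather than magnitude: the scaled prediction above correlates perfectly with the ground truth even though its values differ, so a high coefficient alone does not show that exact loading values are recovered.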


Orthopaedic Proceedings
Vol. 105-B, Issue SUPP_7 | Pages 71 - 71
4 Apr 2023
Arrowsmith C Burns D Mak T Hardisty M Whyne C
Full Access

Access to health care, including physiotherapy, is increasingly occurring through virtual formats. At-home adherence to physical therapy programs is often poor and few tools exist to objectively measure low back physiotherapy exercise participation without the direct supervision of a medical professional. The aim of this study was to develop and evaluate the potential for performing automatic, unsupervised video-based monitoring of at-home low back physiotherapy exercises using a single mobile phone camera. Twenty-four healthy adult subjects performed seven exercises based on the McKenzie low back physiotherapy program while being filmed with two smartphone cameras. Joint locations were automatically extracted using an open-source pose estimation framework. Engineered features were extracted from the joint location time series and used to train a support vector machine classifier (SVC). A convolutional neural network (CNN) was trained directly on the joint location time series data to classify exercises based on a recording from a single camera. The models were evaluated using a five-fold cross-validation approach, stratified by subject, with the class-balanced accuracy used as the performance metric. Optimal performance was achieved when using a total of 12 pose estimation landmarks from the upper and lower body, with the SVC model achieving a classification accuracy of 96±4% and the CNN model an accuracy of 97±2%. This study demonstrates the feasibility of using a smartphone camera and a supervised machine learning model to effectively assess at-home low back physiotherapy adherence. This approach could provide a low-cost, scalable method for tracking adherence to physical therapy exercise programs in a variety of settings.
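
The evaluation scheme described above can be sketched in Python under two assumptions that go beyond the abstract: that cross-validation by subject means every recording from a given subject falls in the same fold, and that class-balanced accuracy is the mean of per-class recall. Function names and data are hypothetical.

```python
from collections import defaultdict


def subject_folds(subject_ids, n_folds=5):
    """Give each subject one fold so no subject is in both train and test."""
    unique_subjects = sorted(set(subject_ids))
    fold_of = {s: i % n_folds for i, s in enumerate(unique_subjects)}
    return [fold_of[s] for s in subject_ids]


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every exercise class counts equally."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


# 24 subjects x 7 exercises, matching the counts given in the abstract.
subject_ids = [s for s in range(24) for _ in range(7)]
folds = subject_folds(subject_ids)

# Hypothetical imbalanced example: always predicting the majority class
# yields 75% plain accuracy but only 50% class-balanced accuracy.
score = balanced_accuracy(["a", "a", "a", "b"], ["a", "a", "a", "a"])
```

Keeping each subject within one fold prevents a model from being tested on a person it has already seen during training, which would otherwise inflate the reported accuracy.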