Orthopaedic Proceedings
Vol. 102-B, Issue SUPP_8 | Pages 79 - 79
1 Aug 2020
Bozzo A Ghert M Reilly J

Advances in cancer therapy have prolonged patient survival even in the presence of disseminated disease, and an increasing number of cancer patients are living with metastatic bone disease (MBD). The proximal femur is the most common long bone involved in MBD, and pathologic fractures of the femur are associated with significant morbidity, mortality and loss of quality of life (QoL). Multiple cohort studies have shown that successful prophylactic surgery for an impending fracture of the proximal femur results in longer survival, preserved mobility, lower transfusion rates and shorter post-operative hospital stays. However, there is currently no optimal method to predict a pathologic fracture. The best-known tool, Mirels' criteria, was established in 1989 and is of limited use in guiding clinical practice because of its poor specificity and sensitivity. The ideal clinical decision support tool would have the highest possible sensitivity and specificity, be non-invasive and generalizable to all patients, and place no burden on hospital resources or the patient's time. Our research uses novel machine learning techniques to develop a model that fills this considerable gap in the treatment pathway of MBD of the femur. The goal of our study is to train a convolutional neural network (CNN) to predict fracture risk when metastatic bone disease is present in the proximal femur. Our fracture risk prediction tool was developed by analysis of prospectively collected data from consecutive MBD patients presenting from 2009 to 2016. Patients with primary bone tumors, pathologic fractures at initial presentation, and hematologic malignancies were excluded. A total of 546 patients, comprising 114 pathologic fractures, were included. Every patient had at least one anteroposterior (AP) X-ray and clinical data including patient demographics, Mirels' criteria, tumor biology, all previous radiation and chemotherapy received, multiple pain and function scores, medications, and time to fracture or time to death.
We have trained a convolutional neural network (CNN) with AP X-ray images of 546 patients with metastatic bone disease of the proximal femur. The digital X-ray data are converted into a matrix representing the color information at each pixel. Our CNN contains five convolutional layers, a fully connected layer of 512 units and a final output layer. As the information passes through successive levels of the network, higher-level features are abstracted from the data. The model converges on two fully connected deep neural network layers that output the risk of fracture. This prediction is compared with the true outcome, and any errors are back-propagated through the network to adjust the weights between connections accordingly, until overall prediction accuracy is optimized. To improve learning, we used stochastic gradient descent with a learning rate of 0.01 and a momentum of 0.9. We used average classification accuracy and the average F1 score across five test sets to measure model performance. We compute F1 = 2 × (precision × recall)/(precision + recall). F1 is a measure of a model's accuracy in binary classification; in our case, whether or not a lesion would result in pathologic fracture. Our model achieved 88.2% accuracy in predicting fracture risk across five-fold cross-validation testing, with an F1 score of 0.87. This is the first reported application of convolutional neural networks, a machine learning algorithm, to this important orthopaedic problem. Our neural network model achieved reasonable accuracy in classifying the fracture risk of metastatic proximal femur lesions from analysis of X-rays and clinical information. Our future work will aim to externally validate this algorithm on an international cohort.
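The following is a minimal sketch of the evaluation arithmetic described above. It computes accuracy and F1 for each test fold from confusion-matrix counts, then averages them across five folds. The fold counts are hypothetical placeholders, not the study's data. Averaging per-fold F1, rather than pooling counts across folds, is an assumption about how the "average F1" was obtained.

```python
from statistics import mean


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2 x (precision x recall) / (precision + recall) for one test fold."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0  # no true positives: F1 is defined as 0 here
    return 2 * precision * recall / (precision + recall)


def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Fraction of lesions whose fracture / no-fracture label was predicted correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


# Hypothetical confusion counts (tp, fp, fn, tn) for five test folds;
# placeholders only, not the study's results.
folds = [
    (20, 3, 2, 85),
    (18, 4, 3, 85),
    (21, 2, 3, 84),
    (19, 3, 4, 84),
    (22, 2, 2, 83),
]

mean_accuracy = mean(accuracy(*fold) for fold in folds)
mean_f1 = mean(f1_score(tp, fp, fn) for tp, fp, fn, _ in folds)
print(f"accuracy={mean_accuracy:.3f}  F1={mean_f1:.3f}")
```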


Orthopaedic Proceedings
Vol. 102-B, Issue SUPP_7 | Pages 96 - 96
1 Jul 2020
Bozzo A Ghert M

Advances in cancer therapy have prolonged cancer patient survival even in the presence of disseminated disease, and an increasing number of cancer patients are living with metastatic bone disease (MBD). The proximal femur is the most common long bone involved in MBD, and pathologic fractures of the femur are associated with significant morbidity, mortality and loss of quality of life (QoL). Multiple cohort studies have shown that successful prophylactic surgery for an impending fracture of the proximal femur makes patients more likely to walk after surgery and results in longer survival, lower transfusion rates and shorter post-operative hospital stays. However, there is currently no optimal method to predict a pathologic fracture. The best-known tool, Mirels' criteria, was established in 1989 and is of limited use in guiding clinical practice because of its poor specificity and sensitivity. The goal of our study is to train a convolutional neural network (CNN) to predict fracture risk when metastatic bone disease is present in the proximal femur. Our fracture risk prediction tool was developed by analysis of prospectively collected data for MBD patients (2009–2016) in order to determine which features are most commonly associated with fracture. Patients with primary bone tumors, pathologic fractures at initial presentation, and hematologic malignancies were excluded. A total of 1,146 patients, comprising 224 pathologic fractures, were included. Every patient had at least one anteroposterior (AP) X-ray. The clinical data include patient demographics, tumor biology, all previous radiation and chemotherapy received, multiple pain and function scores, medications, and time to fracture or time to death. Each of Mirels' criteria has been further subdivided and recorded for each lesion. We have trained a convolutional neural network (CNN) with X-ray images of 1,146 patients with metastatic bone disease of the proximal femur.
The digital X-ray data are converted into a matrix representing the color information at each pixel. Our CNN contains five convolutional layers, a fully connected layer of 512 units and a final output layer. As the information passes through successive levels of the network, higher-level features are abstracted from the data. The model converges on two fully connected deep neural network layers that output the fracture risk. This prediction is compared with the true outcome, and any errors are back-propagated through the network to adjust the weights between connections accordingly. To improve learning, we used stochastic gradient descent with a learning rate of 0.01 and a momentum of 0.9. We used average classification accuracy and the average F1 score across test sets to measure model performance. We compute F1 = 2 × (precision × recall)/(precision + recall). F1 is a measure of a test's accuracy in binary classification; in our case, whether or not a lesion would result in pathologic fracture. Five-fold cross-validation testing of our fully trained model revealed accurate classification for 88.2% of patients with metastatic bone disease of the proximal femur, with an F1 score of 0.87. This represents a 24% error reduction compared with using Mirels' criteria alone to classify the risk of fracture in this cohort. This is the first reported application of convolutional neural networks, a machine learning algorithm, to an important orthopaedic problem. Our neural network model achieved impressive accuracy in classifying the fracture risk of metastatic proximal femur lesions from analysis of X-rays and clinical information. Our future work will aim to validate this algorithm on an external cohort.
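The optimizer setting quoted above (stochastic gradient descent, learning rate 0.01, momentum 0.9) can be illustrated on a one-parameter toy loss. This sketch assumes the classical momentum formulation, because the abstract does not say which variant was used. The quadratic loss is only a stand-in for the CNN's training loss.

```python
def sgd_momentum(grad, w0, lr=0.01, momentum=0.9, steps=500):
    """Minimise a 1-D function with gradient descent plus classical momentum.

    velocity <- momentum * velocity - lr * grad(w)
    w        <- w + velocity
    """
    w, velocity = w0, 0.0
    for _ in range(steps):
        velocity = momentum * velocity - lr * grad(w)  # decaying sum of past steps
        w = w + velocity
    return w


# Toy loss L(w) = (w - 3)^2 has gradient 2 * (w - 3) and its minimum at w = 3.
w_final = sgd_momentum(lambda w: 2.0 * (w - 3.0), w0=0.0)
print(round(w_final, 4))
```

The velocity term accumulates past gradients, so consistent gradient directions produce larger effective steps. Some libraries, for example PyTorch's `torch.optim.SGD`, store the velocity without the learning-rate factor. For a constant learning rate this gives the same trajectory.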


Bone & Joint Open
Vol. 2, Issue 10 | Pages 879 - 885
20 Oct 2021
Oliveira e Carmo L van den Merkhof A Olczak J Gordon M Jutte PC Jaarsma RL IJpma FFA Doornberg JN Prijs J

Aims. The number of convolutional neural networks (CNN) available for fracture detection and classification is rapidly increasing. External validation of a CNN on a temporally separate (separated by time) or geographically separate (separated by location) dataset is crucial to assess the generalizability of the CNN before application to clinical practice in other institutions. We aimed to answer the following questions: are current CNNs for fracture recognition externally valid?; which methods are applied for external validation (EV)?; and what are the reported performances on the EV sets compared with the internal validation (IV) sets of these CNNs? Methods. The PubMed and Embase databases were systematically searched from January 2010 to October 2020 according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement. The type of EV, characteristics of the external dataset, and diagnostic performance characteristics on the IV and EV datasets were collected and compared. Quality assessment was conducted using a seven-item checklist based on a modified Methodologic Index for NOn-Randomized Studies instrument (MINORS). Results. Out of 1,349 studies, 36 reported development of a CNN for fracture detection and/or classification. Of these, only four (11%) reported a form of EV. One study used temporal EV, one conducted both temporal and geographical EV, and two used geographical EV. When comparing the CNN’s performance on the IV set versus the EV set, the following were found: AUCs of 0.967 (IV) versus 0.975 (EV), 0.976 (IV) versus 0.985 to 0.992 (EV), 0.93 to 0.96 (IV) versus 0.80 to 0.89 (EV), and F1-scores of 0.856 to 0.863 (IV) versus 0.757 to 0.840 (EV). Conclusion. Externally validated CNNs for fracture recognition in orthopaedic trauma are still scarce. This greatly limits the potential for transfer of these CNNs from the developing institute to another hospital to achieve similar diagnostic performance.
We recommend the use of geographical EV and statements such as the Consolidated Standards of Reporting Trials–Artificial Intelligence (CONSORT-AI), the Standard Protocol Items: Recommendations for Interventional Trials–Artificial Intelligence (SPIRIT-AI) and the Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis–Machine Learning (TRIPOD-ML) to critically appraise the performance of CNNs, improve methodological rigor and the quality of future models, and facilitate eventual implementation in clinical practice. Cite this article: Bone Jt Open 2021;2(10):879–885
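The two forms of external validation defined in the abstract above differ only in how cases are partitioned. The sketch below assumes hypothetical per-image metadata: an acquisition `site` and an acquisition date. It shows a temporal split at a date cutoff and a geographic split by hospital. The field names and records are illustrative only.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Radiograph:
    image_id: str
    site: str        # hospital that acquired the image (hypothetical field)
    acquired: date   # acquisition date (hypothetical field)


def temporal_split(records, cutoff):
    """Develop on images taken before `cutoff`; externally validate on later ones."""
    develop = [r for r in records if r.acquired < cutoff]
    external = [r for r in records if r.acquired >= cutoff]
    return develop, external


def geographic_split(records, development_sites):
    """Develop on some hospitals; externally validate on hospitals never seen in training."""
    develop = [r for r in records if r.site in development_sites]
    external = [r for r in records if r.site not in development_sites]
    return develop, external


records = [
    Radiograph("a", "Hospital A", date(2016, 5, 1)),
    Radiograph("b", "Hospital A", date(2019, 2, 1)),
    Radiograph("c", "Hospital B", date(2017, 8, 1)),
]
dev, ext = temporal_split(records, cutoff=date(2018, 1, 1))
print([r.image_id for r in dev], [r.image_id for r in ext])
dev, ext = geographic_split(records, development_sites={"Hospital A"})
print([r.image_id for r in dev], [r.image_id for r in ext])
```

In both cases, no external-set image should influence training or model selection. Otherwise the "external" performance is no longer an independent estimate.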


Orthopaedic Proceedings
Vol. 90-B, Issue SUPP_I | Pages 35 - 36
1 Mar 2008
Jaremko J Hill D Moreau M Zernicke R

Recent studies have shown that scoliotic deformity can be estimated accurately from deformity of the full three hundred and sixty degrees torso shape. However, acquisition of these data requires an expensive multi-scanner system. If scoliosis could be estimated accurately from the back surface shape alone, a single scanner and simplified analysis methods could be used. Here, we estimated the Cobb angle within ten degrees in 84% of forty-six patients from back surface data, compared with 99% within ten degrees in a previous, larger study using the entire torso shape. These results suggested that both back-surface and full-torso models for Cobb angle estimation should be pursued for their potential merits. The surface deformity of scoliosis, often the primary patient complaint, progresses non-linearly with the underlying spinal deformity. If the degree of scoliosis could be estimated reliably from the surface, adolescent patients with non-progressing scoliosis could be spared harmful X-ray radiation. Some of us have previously estimated the scoliotic Cobb angle from three hundred and sixty degrees torso surface deformity. Here, we tested how accurately the Cobb angle could be estimated from back surface data alone, which are easier and less expensive to obtain than full-torso data. A genetic algorithm selected the clinical parameters to be used by a neural network to estimate scoliosis deformity from back surface deformity. We had forty-six consecutive patients with right-thoracic curves (Cobb angles eleven to ninety-seven degrees), in whom fifteen indices were available, including age, height, bracing status, scoliometer reading, back surface rotation, and cosmetic score of landmark asymmetry.
Those data were used by a neural network to estimate the Cobb angle within ten degrees in 84% of patients, a 30% improvement over regression-model accuracy, though less accurate than use of the three hundred and sixty degrees torso shape which estimated up to 99% of curves within ten degrees in a previous study. Neural network predictive accuracy was better when using the full three hundred and sixty degrees torso shape, but the simpler and more economical acquisition of back surface data alone also gave promising results. This pilot comparison study suggested that both models (using back surface data alone vs. using three hundred and sixty degrees torso data) should continue to be developed in attempts to optimize surface estimation of scoliosis
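The abstract does not describe its genetic algorithm, so the following is only a generic sketch of GA-based feature selection over fifteen candidate indices. Each individual is a boolean mask over the indices. The fitter half of the population survives, and children are produced by single-point crossover and bit-flip mutation. The population size, mutation rate and `toy_fitness` function are hypothetical. In the study's setting, the fitness would presumably reflect the Cobb-angle error of a neural network trained on the selected indices.

```python
import random


def genetic_feature_selection(fitness, n_features, pop_size=20, generations=40,
                              mutation_rate=0.05, seed=0):
    """Search over feature subsets (boolean masks) with a simple genetic algorithm."""
    rng = random.Random(seed)
    population = [[rng.random() < 0.5 for _ in range(n_features)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: the fitter half survives unchanged (elitism).
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)  # single-point crossover
            child = mother[:cut] + father[cut:]
            # Bit-flip mutation: each gene toggles with probability mutation_rate.
            child = [(not gene) if rng.random() < mutation_rate else gene
                     for gene in child]
            children.append(child)
        population = parents + children
    return max(population, key=fitness)


# Hypothetical fitness: pretend indices 0-3 are informative and that every
# selected index costs 0.1 (a stand-in for a downstream model's validation error).
INFORMATIVE = {0, 1, 2, 3}


def toy_fitness(mask):
    hits = sum(1 for i, used in enumerate(mask) if used and i in INFORMATIVE)
    return hits - 0.1 * sum(mask)


best = genetic_feature_selection(toy_fitness, n_features=15)
print([i for i, used in enumerate(best) if used], round(toy_fitness(best), 2))
```

Because the fitter half is carried over unchanged, the best fitness in the population never decreases from one generation to the next.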


Orthopaedic Proceedings
Vol. 103-B, Issue SUPP_16 | Pages 76 - 76
1 Dec 2021
de Mello FL Kadirkamanathan V Wilkinson JM

Objectives. Conventional approaches (including Tobit regression) neither accurately account for ceiling effects in PROMs nor give uncertainty estimates. Here, a classifier neural network was used to estimate postoperative PROMs prior to surgery and compared with conventional methods. The Oxford Knee Score (OKS) and the Oxford Hip Score (OHS) were estimated with separate models. Methods. English NJR data from 2009 to 2018 were used, with 278,655 knee and 249,634 hip replacements. For both OKS and OHS estimations, the input variables included age, BMI, surgery date, sex, ASA grade, thromboprophylaxis, anaesthetic and preoperative PROMs responses. Bearing, fixation, head size and approach were also included for OHS estimation, and knee type for OKS estimation. A classifier neural network (NN) was compared with linear or Tobit regression, XGB and a regression NN. The performance metrics were the root mean square error (RMSE), mean absolute error (MAE) and area under the curve (AUC). 95% confidence intervals were computed using 5-fold cross-validation. Results. The classifier NN and regression NN had the best RMSE, both with the same scores of 8.59±0.04 for knee and 7.88±0.04 for hip. The classifier NN had the best MAE, with 6.73±0.03 for knee and 5.73±0.03 for hip. The Tobit model was second, with 6.86±0.03 for knee and 6.00±0.01 for hip. The classifier NN had the best AUC, with (68.7±0.4)% for knee and (73.9±0.3)% for hip. The regression NN was second, with (67.1±0.3)% for knee and (71.1±0.4)% for hip. The Tobit model had the best AUC among conventional approaches, with (66.8±0.3)% for knee and (71.0±0.4)% for hip. Conclusions. The proposed model improved on the current state of the art. Additionally, it estimates the full probability distribution of the postoperative PROMs, making it possible to know not only the estimated value but also its uncertainty.
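The error metrics above, and a summary across folds, might be computed as follows. MAE is implemented here as the mean absolute error, which can never exceed the RMSE. The ±1.96·SD/√k half-width across the k folds is one common convention and an assumption, since the abstract does not state how its intervals were derived. The fold scores in the usage lines are made up.

```python
import math
from statistics import mean, stdev


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted scores."""
    return math.sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def mean_abs_error(y_true, y_pred):
    """Mean absolute error; always <= RMSE for the same predictions."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def fold_summary(per_fold_scores, z=1.96):
    """Mean and normal-approximation 95% half-width across cross-validation folds."""
    m = mean(per_fold_scores)
    half_width = z * stdev(per_fold_scores) / math.sqrt(len(per_fold_scores))
    return m, half_width


# Hypothetical per-fold RMSE values from 5-fold cross-validation.
fold_rmse = [8.57, 8.60, 8.61, 8.58, 8.59]
m, hw = fold_summary(fold_rmse)
print(f"RMSE {m:.2f} ± {hw:.2f}")
```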


Orthopaedic Proceedings
Vol. 102-B, Issue SUPP_1 | Pages 133 - 133
1 Feb 2020
Borjali A Chen A Muratoglu O Varadarajan K

INTRODUCTION. Mechanical loosening of total hip replacement (THR) is primarily diagnosed using radiographs, which are diagnostically challenging and require review by experienced radiologists and orthopaedic surgeons. Automated tools that assist less-experienced clinicians and mitigate human error can reduce the risk of missed or delayed diagnosis. Thus, the purposes of this study were to: 1) develop an automated tool to detect mechanical loosening of THR by training a deep convolutional neural network (CNN) using THR x-rays, and 2) visualize the CNN training process to interpret how it functions. METHODS. A retrospective study was conducted using previously collected imaging data at a single institution with IRB approval. Twenty-three patients with cementless primary THR who underwent revision surgery due to mechanical loosening (a loose stem and/or a loose acetabular component) had their hip x-rays evaluated immediately prior to their revision surgery (32 “loose” x-rays). A comparison group comprised 23 patients who underwent primary cementless THR surgery, with x-rays taken immediately after their primary surgery (31 “not loose” x-rays). Fig. 1 shows examples of “not loose” and “loose” THR x-rays. A DenseNet201 CNN was used, with its top layer swapped for a binary classifier, and 90:10 split-validation [1]. A CNN pre-trained on ImageNet [2] and a CNN without pre-training (initial zero weights) were implemented to compare the results. Saliency maps were used to indicate the importance of each pixel of a given x-ray to the CNN's performance [3]. RESULTS. Fig. 2 shows the saliency maps for an example x-ray and the corresponding accuracy of the CNN on the entire validation dataset at different stages of the training for both the pre-trained (Fig. 2a) and the not pre-trained (Fig. 2b) CNNs. Colored regions in the saliency maps, where red denotes higher relative influence than blue, indicate the most influential regions on the CNN's performance.
The pre-trained CNN achieved higher accuracy (87%) on the validation set x-rays than the not pre-trained CNN (62%) after 10 epochs. The pre-trained CNN's saliency map at 10 epochs identified significant influence of bone-implant interaction regions on the CNN's performance. This indicates that the CNN is ‘looking’ at the clinically relevant features in the x-rays. The saliency maps also demonstrated that the pre-trained CNN quickly learned where to ‘look’, while the not pre-trained CNN struggled. DISCUSSION. An automated tool to detect mechanical loosening of THR was developed that can potentially assist clinicians with accurate diagnosis. By visualizing the regions of the x-ray that influence CNN performance, this study shed light on the CNN learning process and demonstrated that the CNN is ‘looking’ at the clinically relevant features to classify the x-rays. This visualization is crucial for building trust in the automated system: interpreting how it functions increases confidence in the application of artificial intelligence to the field of orthopaedics. This study also demonstrated that pre-training the CNN can accelerate the learning process and achieve high accuracy even on a small dataset. For any figures or tables, please contact the authors directly.
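Reference [3] is not reproduced here, so the exact saliency method is unknown. A common definition scores each pixel by the magnitude of the gradient of the model output with respect to that pixel. The sketch below approximates that gradient by central finite differences. This is a slow but model-agnostic stand-in for the backpropagation-based computation usually used with CNNs. `toy_score` is a hypothetical model output, not a trained network.

```python
def saliency_map(score, image, eps=1e-3):
    """Approximate |d score / d pixel| for every pixel by central differences.

    `score` maps a 2-D image (list of lists of floats) to a scalar, for example
    a model's "loose" probability.
    """
    work = [row[:] for row in image]  # copy so the caller's image is untouched
    rows, cols = len(work), len(work[0])
    sal = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            original = work[i][j]
            work[i][j] = original + eps
            up = score(work)
            work[i][j] = original - eps
            down = score(work)
            work[i][j] = original  # restore before moving to the next pixel
            sal[i][j] = abs(up - down) / (2 * eps)
    return sal


def toy_score(img):
    # Hypothetical score that depends on only two pixels.
    return 3.0 * img[1][1] + 0.5 * img[0][2]


image = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]]
sal = saliency_map(toy_score, image)
for row in sal:
    print([round(v, 3) for v in row])
```

Pixels that the score ignores get zero saliency. The two pixels it uses light up in proportion to their weights, which is the pattern a colour-mapped saliency overlay displays.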


Orthopaedic Proceedings
Vol. 102-B, Issue SUPP_1 | Pages 129 - 129
1 Feb 2020
Maag C Langhorn J Rullkoetter P

INTRODUCTION. While computational models have been used for many years to contribute to pre-clinical, design-phase iterations of total knee replacement implants, the analysis time required has limited their real-time use in other applications, such as patient-specific surgical alignment in the operating room. In this environment, the impact of variation in ligament balance and implant alignment on estimated joint mechanics must be available instantaneously. As neural networks (NN) have shown the ability to appropriately represent dynamic systems, the objective of this preliminary study was to evaluate whether deep learning can represent the joint-level kinetic and kinematic results from a validated finite element lower limb model with varied surgical alignment. METHODS. External hip and ankle boundary conditions were created for a previously developed finite element lower limb model [1] for step down (SD), deep knee bend (DKB) and gait to best reproduce in-vivo loading conditions as measured on patients with the Innex knee (orthoload.com) (Figure 1). These boundary conditions were subsequently used as inputs for the model with a current fixed-bearing total knee replacement to estimate implant-specific kinetics and kinematics during activities of daily living. Implant alignments were varied, including variation of the hip-knee-ankle angle by ±3°, the frontal plane joint line from −7° to +5°, internal-external femoral rotation by ±3°, and the tibial posterior slope (5° and 0°). Through varying these parameters, a total of 2464 simulations were completed. An NN was created using the NN toolbox in MATLAB. Sequence data inputs were produced from the alignment and the external boundary conditions for each activity cycle. Sequence outputs for the model were the six-degree-of-freedom kinetics and kinematics, totaling 12 outputs. All data were normalized across the entire data set.
Ten percent of the simulation runs were removed at random from the training set to be used for validation, leaving 2220 simulations for training and 244 for validation. A nine-layer bidirectional long short-term memory (bi-LSTM) NN was created to take advantage of the ability of bi-LSTM layers to learn from past and future data. The network was trained using an RMSprop solver until the root mean square error (RMSE) stopped decreasing. NN quality was evaluated by the RMSE of the validation set. RESULTS. The trained NN was able to effectively estimate the validation data. The average RMSE over the kinetics of the validation data set was 140.7 N/N·m, while the average RMSE over the kinematics of the validation data set was 4.47 mm/deg (Figures 2 and 3; DKB and gait shown). The error may be skewed by the larger-magnitude kinetics and kinematics in the DKB activity, as the average RMSE for SD and gait alone was 85.9 N/N·m and 2.8 mm/deg for the kinetics and kinematics, respectively. DISCUSSION. The accuracy of the generated NN indicates its potential for use in real-time modeling, and further work will explore additional changes in post-operative soft-tissue balance as well as scaling to patient-specific geometry.
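RMSprop, the solver named above, scales each update by a running root-mean-square of recent gradients. The one-parameter sketch below shows only the update rule. The learning rate, the decay factor `rho` and the toy quadratic loss are illustrative assumptions, not the MATLAB settings used in the study.

```python
import math


def rmsprop(grad, w0, lr=0.01, rho=0.9, eps=1e-8, steps=2000):
    """RMSprop: divide each gradient step by a running RMS of recent gradients."""
    w, sq_avg = w0, 0.0
    for _ in range(steps):
        g = grad(w)
        sq_avg = rho * sq_avg + (1 - rho) * g * g  # exponential moving average of g^2
        w -= lr * g / (math.sqrt(sq_avg) + eps)    # roughly unit-scale step times lr
    return w


# Toy loss (w - 2)^2 with gradient 2 * (w - 2); minimum at w = 2.
w_final = rmsprop(lambda w: 2.0 * (w - 2.0), w0=5.0)
print(round(w_final, 3))
```

Because the step is normalized by the gradient scale, it stays on the order of `lr` whatever the loss units are. This is convenient when, as here, outputs mix forces, moments, displacements and angles, although that benefit is also helped by the normalization described above.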


Bone & Joint Research
Vol. 12, Issue 7 | Pages 447 - 454
10 Jul 2023
Lisacek-Kiosoglous AB Powling AS Fontalis A Gabr A Mazomenos E Haddad FS

The use of artificial intelligence (AI) is rapidly growing across many domains, and the medical field is no exception. AI is an umbrella term defining the practical application of algorithms to generate useful output without the need for human cognition. Owing to the expanding volume of patient information collected, known as ‘big data’, AI is showing promise as a useful tool in healthcare research and across all aspects of patient care pathways. Practical applications in orthopaedic surgery include: diagnostics, such as fracture recognition and tumour detection; predictive models of clinical and patient-reported outcome measures, such as calculating mortality rates and length of hospital stay; and real-time rehabilitation monitoring and surgical training. However, clinicians should remain cognizant of AI’s limitations, as the development of robust reporting and validation frameworks is of paramount importance to prevent avoidable errors and biases. The aim of this review article is to provide a comprehensive understanding of AI and its subfields, as well as to delineate its existing clinical applications in trauma and orthopaedic surgery. Furthermore, this narrative review expands upon the limitations of AI and future directions.

Cite this article: Bone Joint Res 2023;12(7):447–454.


Orthopaedic Proceedings
Vol. 105-B, Issue SUPP_12 | Pages 85 - 85
23 Jun 2023
de Mello F Kadirkamanathan V Wilkinson JM

Successful estimation of postoperative PROMs prior to a joint replacement surgery is important in deciding the best treatment option for a patient. However, estimation of the outcome is associated with substantial noise around individual predictions. Here, we test whether a classifier neural network can be used to simultaneously estimate postoperative PROMs and their uncertainty better than current methods. We perform Oxford hip score (OHS) estimation using data collected by the NJR from 249,634 hip replacement surgeries performed from 2009 to 2018. The root mean square error (RMSE) of each method is compared with the standard deviation of the outcome change distribution to measure the proportion of the total outcome variability that the model can capture. The area under the curve (AUC) for the probability of the change score being above a certain threshold was also plotted. The proposed classifier NN had an RMSE better than or equivalent to that of all other currently used models. The threshold AUC shows similar results for all methods close to a change score of 20 but demonstrates better accuracy of the classifier neural network close to 0 change and above 30 change, showing that the full probability distribution estimated by the classifier neural network resulted in a significant improvement in estimating the upper and lower quantiles of the change score probability distribution. Consequently, probabilistic estimation as performed by the classifier NN is the most adequate approach to this problem, since the final score has an important component of uncertainty. This study shows the importance of uncertainty estimation to accompany postoperative PROMs prediction and presents a clinically meaningful method for personalised outcome prediction that includes such uncertainty estimation.
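A classifier network's "full probability distribution" can be read as follows. If each output class is a possible change score, then the point estimate, its spread and any threshold probability all follow from the class probabilities. This sketch assumes one class per change-score value. The values and probabilities are hypothetical, not NJR outputs.

```python
import math


def expected_change(probs, values):
    """Point estimate: the expectation of the predicted change-score distribution."""
    return sum(p * v for p, v in zip(probs, values))


def change_sd(probs, values):
    """Uncertainty: the standard deviation of the predicted distribution."""
    mu = expected_change(probs, values)
    return math.sqrt(sum(p * (v - mu) ** 2 for p, v in zip(probs, values)))


def prob_change_above(probs, values, threshold):
    """P(change score > threshold), read directly from the class probabilities."""
    return sum(p for p, v in zip(probs, values) if v > threshold)


# Hypothetical softmax output over coarse change-score classes for one patient.
values = [-10, 0, 10, 20, 30, 40]
probs = [0.05, 0.10, 0.20, 0.35, 0.20, 0.10]
print(round(expected_change(probs, values), 2),
      round(change_sd(probs, values), 2),
      round(prob_change_above(probs, values, 30), 2))
```

A regression network would output only the first of these three numbers. The classifier's probabilities also provide the spread and the tail probabilities that the threshold AUC evaluates.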


Orthopaedic Proceedings
Vol. 104-B, Issue SUPP_4 | Pages 5 - 5
1 Apr 2022
de Mello F Kadirkamanathan V Wilkinson M

Successful estimation of postoperative PROMs prior to a joint replacement surgery is important in deciding the best treatment option for a patient. However, estimation of the outcome is associated with substantial noise around individual predictions. Here, we test whether a classifier neural network can be used to simultaneously estimate postoperative PROMs and their uncertainty better than current methods. We perform Oxford hip score (OHS) estimation using data collected by the NJR from 249,634 hip replacement surgeries performed from 2009 to 2018. The root mean square error (RMSE) of each method is compared with the standard deviation of the outcome change distribution to measure the proportion of the total outcome variability that the model can capture. The area under the curve (AUC) for the probability of the change score being above a certain threshold was also plotted. The proposed classifier NN had an RMSE better than or equivalent to that of all other currently used models. The standard deviation of the change score for the entire population was 9.93, which can be interpreted as the RMSE that would be achieved by a model that gives the same estimate for all patients regardless of the covariates. However, most of the variation in the postoperative OHS/OKS change score is not captured by the models, confirming the importance of accurate uncertainty estimation. The threshold AUC shows similar results for all methods close to a change score of 20 but demonstrates better accuracy of the classifier neural network close to 0 change and above 30 change, showing that the full probability distribution estimated by the classifier neural network resulted in a significant improvement in estimating the upper and lower quantiles of the change score probability distribution. Consequently, probabilistic estimation as performed by the classifier NN is the most adequate approach to this problem, since the final score has an important component of uncertainty.
This study shows the importance of uncertainty estimation to accompany postoperative PROMs prediction and presents a clinically meaningful method for personalised outcome prediction that includes such uncertainty estimation.
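The abstract above interprets the population standard deviation as the RMSE of a model that gives every patient the same estimate. This can be checked numerically. Predicting the mean change for everyone yields an RMSE equal to the population (not sample) standard deviation. The change scores below are hypothetical.

```python
import math
from statistics import mean, pstdev


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted change scores."""
    return math.sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


change_scores = [12, 25, 3, 18, 30, 22, 9, 27]                    # hypothetical OHS changes
constant_guess = [mean(change_scores)] * len(change_scores)       # same estimate for all
print(rmse(change_scores, constant_guess), pstdev(change_scores))  # the two agree
```

Any covariate-based model can therefore be judged against this baseline. An RMSE only slightly below the standard deviation means that most of the outcome variability remains unexplained.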


Orthopaedic Proceedings
Vol. 106-B, Issue SUPP_18 | Pages 17 - 17
14 Nov 2024
Kjærgaard K Ding M Mansourvar M

Introduction. Experimental bone research often generates large amounts of histology and histomorphometry data, and the analysis of these data can be time-consuming and tedious. Machine learning offers a viable alternative to manual analysis for measuring, e.g., bone volume versus total volume. The objective was to develop a neural network for image segmentation, and to assess the accuracy of this network when applied to ectopic bone formation samples compared with a ground truth. Method. Thirteen tissue slides totaling 114 megapixels of ectopic bone formation were selected for model building. Slides were split into training, validation, and test data, with the test data reserved and only used for the final model assessment. We developed a neural network resembling U-Net that takes 512×512 pixel tiles. To improve model robustness, images were augmented online during training. The network was trained for 3 days on an NVIDIA Tesla K80 provided by a free online learning platform against ground truth masks annotated by an experienced researcher. Result. During training, the validation accuracy improved and stabilised at approx. 95%. The test accuracy was 96.1%. Conclusion. Most experiments using ectopic bone formation will yield an inter-observer or inter-method variance of far more than 5%, so the current approach may be a valid and feasible technique for automated image segmentation of large datasets. More data or a consensus-based ground truth may improve training stability and validation accuracy. The code and data of this project are available upon request and will be available online as part of our publication.
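The sketch below covers two steps described above. It computes 512×512 tile offsets that cover an image, letting the last tile overlap so that it ends at the border, and it scores a predicted mask by pixel accuracy. The edge-overlap tiling strategy is an assumption, because the abstract does not say how slide borders were handled.

```python
def tile_origins(size, tile=512):
    """Start offsets along one axis so that tile-sized windows cover [0, size)."""
    if size < tile:
        raise ValueError("image smaller than one tile; pad it first")
    starts = list(range(0, size - tile + 1, tile))
    if starts[-1] + tile < size:
        starts.append(size - tile)  # last tile overlaps so it ends at the edge
    return starts


def pixel_accuracy(pred_mask, true_mask):
    """Fraction of pixels whose predicted label matches the ground-truth label."""
    total = correct = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for p, t in zip(pred_row, true_row):
            correct += int(p == t)
            total += 1
    return correct / total


# Top-left corners of the tiles covering a hypothetical 1000 x 600 pixel region.
tiles = [(y, x) for y in tile_origins(1000) for x in tile_origins(600)]
print(tiles)
print(pixel_accuracy([[1, 0], [1, 1]], [[1, 0], [0, 1]]))
```

Pixel accuracy is easy to interpret but can look high when one class (for example, background) dominates. Overlap-based scores such as the Dice coefficient are often reported alongside it.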


Bone & Joint Research
Vol. 13, Issue 10 | Pages 588 - 595
17 Oct 2024
Breu R Avelar C Bertalan Z Grillari J Redl H Ljuhar R Quadlbauer S Hausner T

Aims. The aim of this study was to create artificial intelligence (AI) software with the purpose of providing a second opinion to physicians to support distal radius fracture (DRF) detection, and to compare the accuracy of fracture detection of physicians with and without software support. Methods. The dataset consisted of 26,121 anonymized anterior-posterior (AP) and lateral standard view radiographs of the wrist, with and without DRF. The convolutional neural network (CNN) model was trained to detect the presence of a DRF by comparing the radiographs containing a fracture to the inconspicuous ones. A total of 11 physicians (six surgeons in training and five hand surgeons) assessed 200 pairs of randomly selected digital radiographs of the wrist (AP and lateral) for the presence of a DRF. The same images were first evaluated without, and then with, the support of the CNN model, and the diagnostic accuracy of the two methods was compared. Results. At the time of the study, the CNN model showed an area under the receiver operating characteristic curve of 0.97. AI assistance improved the physicians’ sensitivity (correct fracture detection) from 80% to 87%, and the specificity (correct fracture exclusion) from 91% to 95%. The overall error rate (combined false positive and false negative) was reduced from 14% without AI to 9% with AI. Conclusion. The use of a CNN model as a second opinion can improve the diagnostic accuracy of DRF detection in the study setting. Cite this article: Bone Joint Res 2024;13(10):588–595
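The three physician-level metrics above follow from a 2×2 table of fracture versus no fracture. The overall error rate depends on the mix of fracture and non-fracture cases, not on sensitivity and specificity alone. The counts in this sketch are hypothetical.

```python
def diagnostic_summary(tp, fn, tn, fp):
    """Sensitivity, specificity and overall error rate from a 2x2 table."""
    sensitivity = tp / (tp + fn)                    # fractures correctly detected
    specificity = tn / (tn + fp)                    # non-fractures correctly excluded
    error_rate = (fp + fn) / (tp + fn + tn + fp)    # all misclassified cases
    return sensitivity, specificity, error_rate


# Hypothetical reads: 50 fractured and 150 intact wrists.
sens, spec, err = diagnostic_summary(tp=45, fn=5, tn=140, fp=10)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} error={err:.3f}")
```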


Orthopaedic Proceedings
Vol. 104-B, Issue SUPP_12 | Pages 90 - 90
1 Dec 2022
Abbas A Toor J Du JT Versteeg A Yee N Finkelstein J Abouali J Nousiainen M Kreder H Hall J Whyne C Larouche J

Excessive resident duty hours (RDH) are a recognized issue with implications for physician well-being and patient safety. A major component of the RDH concern is on-call duty. While considerable work has been done to reduce resident call workload, there is a paucity of research in optimizing resident call scheduling. Call coverage is scheduled manually rather than according to demand, which generally leads to over-scheduling to prevent a service gap. Machine learning (ML) has been widely applied in other industries to prevent such supply-demand mismatches. However, the healthcare field has been slow to adopt these innovations. As such, the aim of this study was to use ML models to 1) predict demand on orthopaedic surgery residents at a level I trauma centre and 2) identify variables key to demand prediction. Daily surgical handover emails over an eight-year period (2012–2019) at a level I trauma centre were collected. The following data were used to calculate demand: spine call coverage, date, and number of operating rooms (ORs), traumas, admissions and consults completed. Various ML models (linear, tree-based and neural networks) were trained to predict the workload, with their results compared to the current scheduling approach. Quality of models was determined using the area under the receiver operating characteristic curve (AUC) and the accuracy of the predictions. The top ten most important variables were extracted from the most successful model. During training, the model with the highest AUC and accuracy was the multivariate adaptive regression splines (MARS) model, with an AUC of 0.78±0.03 and accuracy of 71.7%±3.1%. During testing, the model with the highest AUC and accuracy was the neural network model, with an AUC of 0.81 and accuracy of 73.7%. All models were better than the current approach, which had an AUC of 0.50 and accuracy of 50.1%.
Key variables used by the neural network model were (descending order): spine call duty, year, weekday/weekend, month, and day of the week. This was the first study attempting to use ML to predict the service demand on orthopaedic surgery residents at a major level I trauma centre. Multiple ML models were shown to be more appropriate and accurate at predicting the demand on surgical residents as compared to the current scheduling approach. Future work should look to incorporate predictive models with optimization strategies to match scheduling with demand in order to improve resident well being and patient care
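The abstract reports AUC and accuracy but does not say how they were computed. The following is a minimal sketch, not the authors' code. It assumes demand is framed as a binary label per day (for example 1 = high-demand day). Under that assumption, AUC is the probability that a randomly chosen positive day receives a higher predicted score than a randomly chosen negative day, with ties counting as half. The labels and scores in the example are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive case outscores a negative case.

    Ties count as half a win. This pairwise form is O(n_pos * n_neg),
    which is adequate for illustration on small datasets.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels, predictions):
    """Fraction of days whose predicted class matches the observed class."""
    return sum(y == p for y, p in zip(labels, predictions)) / len(labels)


# Hypothetical example: 1 = high-demand day, 0 = normal day.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
# An uninformative predictor that scores every day identically sits at chance.
print(roc_auc([0, 1, 0, 1], [0.5, 0.5, 0.5, 0.5]))   # 0.5
```

An AUC of 0.50 therefore corresponds to chance-level ranking, which is the baseline against which the ML models in this abstract were compared.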


Bone & Joint Open
Vol. 5, Issue 8 | Pages 671 - 680
14 Aug 2024
Fontalis A Zhao B Putzeys P Mancino F Zhang S Vanspauwen T Glod F Plastow R Mazomenos E Haddad FS

Aims. Precise implant positioning, tailored to individual spinopelvic biomechanics and phenotype, is paramount for stability in total hip arthroplasty (THA). Despite a few studies on instability prediction, there is a notable gap in research utilizing artificial intelligence (AI). The objective of our pilot study was to evaluate the feasibility of developing an AI algorithm tailored to individual spinopelvic mechanics and patient phenotype for predicting impingement. Methods. This international, prospective cohort study across two centres included 157 adults undergoing primary robotic arm-assisted THA. Impingement during specific flexion and extension stances was identified using the virtual range of motion (ROM) tool of the robotic software. The primary AI model, the Light Gradient-Boosting Machine (LGBM), used tabular data to predict impingement presence, direction (flexion or extension), and type. A secondary model integrating tabular data with plain anteroposterior pelvis radiographs was evaluated to assess any potential improvement in prediction accuracy. Results. We identified nine predictors from an analysis of baseline spinopelvic characteristics and surgical planning parameters. Using fivefold cross-validation, the LGBM achieved 70.2% impingement prediction accuracy. With impingement data, the LGBM estimated direction with 85% accuracy, while the support vector machine (SVM) determined impingement type with 72.9% accuracy. After imaging data were integrated using a multilayer perceptron (tabular) and a convolutional neural network (radiograph), impingement prediction accuracy was 68.1%. The combined and LGBM-only models had similar impingement direction prediction accuracy (around 84.5%). Conclusion. This study is a pioneering effort in leveraging AI for impingement prediction in THA, utilizing a comprehensive, real-world clinical dataset. Our machine-learning algorithm demonstrated promising accuracy in predicting impingement, its type, and direction. 
While the addition of imaging data to our deep-learning algorithm did not boost accuracy, the potential for refined annotations, such as landmark markings, offers avenues for future enhancement. Prior to clinical integration, external validation and larger-scale testing of this algorithm are essential. Cite this article: Bone Jt Open 2024;5(8):671–680


Bone & Joint Research
Vol. 12, Issue 9 | Pages 512 - 521
1 Sep 2023
Langenberger B Schrednitzki D Halder AM Busse R Pross CM

Aims. A substantial fraction of patients undergoing knee arthroplasty (KA) or hip arthroplasty (HA) do not achieve an improvement as high as the minimal clinically important difference (MCID), i.e. they do not achieve a meaningful improvement. Using three patient-reported outcome measures (PROMs), our aims were: 1) to assess the performance of machine learning (ML), the simple pre-surgery PROM score, and logistic regression (LR) in predicting whether patients undergoing HA or KA achieve an improvement at least as high as a calculated MCID; and 2) to test whether ML is able to outperform LR or pre-surgery PROM scores in predictive performance. Methods. MCIDs were derived using the change difference method in a sample of 1,843 HA and 1,546 KA patients. An artificial neural network, a gradient boosting machine, least absolute shrinkage and selection operator (LASSO) regression, ridge regression, elastic net, random forest, LR, and pre-surgery PROM scores were applied to predict MCID achievement for the following PROMs: EuroQol five-dimension, five-level questionnaire (EQ-5D-5L), EQ visual analogue scale (EQ-VAS), Hip disability and Osteoarthritis Outcome Score-Physical Function Short-form (HOOS-PS), and Knee injury and Osteoarthritis Outcome Score-Physical Function Short-form (KOOS-PS). Results. Predictive performance of the best models per outcome ranged from 0.71 for HOOS-PS to 0.84 for EQ-VAS (HA sample). ML significantly outperformed LR and pre-surgery PROM scores in two of six cases. Conclusion. MCIDs can be predicted with reasonable performance. ML was able to outperform traditional methods, although only in a minority of cases. Cite this article: Bone Joint Res 2023;12(9):512–521
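The abstract names the change difference method but does not give its formula. The sketch below assumes one common anchor-based formulation: the MCID is the mean PROM change among patients who rate themselves "slightly better" on a transition anchor question, minus the mean change among those reporting "about the same". The helper names, anchor wording and scores are hypothetical, and the published procedure may differ. The second helper turns a patient's pre- and post-surgery scores into the binary target that the models above predict. Its `higher_is_better` flag covers instruments on which a lower score means better function.

```python
from statistics import mean


def mcid_change_difference(changes, anchor):
    """Hypothetical helper: anchor-based MCID (an assumed formulation).

    Mean PROM change of the 'slightly better' group minus the mean change
    of the 'about the same' group. Raises statistics.StatisticsError if
    either group is empty.
    """
    slightly_better = [c for c, a in zip(changes, anchor) if a == "slightly better"]
    unchanged = [c for c, a in zip(changes, anchor) if a == "about the same"]
    return mean(slightly_better) - mean(unchanged)


def achieved_mcid(pre, post, mcid, higher_is_better=True):
    """Binary prediction target: 1 if the improvement reaches the MCID, else 0."""
    change = post - pre if higher_is_better else pre - post
    return int(change >= mcid)


# Hypothetical EQ-VAS changes (0-100 scale) with anchor responses.
changes = [2, 0, 10, 14, 30]
anchor = ["about the same", "about the same",
          "slightly better", "slightly better", "much better"]
mcid = mcid_change_difference(changes, anchor)  # 12 - 1 = 11
print(achieved_mcid(60, 75, mcid))  # 1: a 15-point gain meets the MCID
```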


Orthopaedic Proceedings
Vol. 102-B, Issue SUPP_2 | Pages 5 - 5
1 Feb 2020
Burton W Myers C Rullkoetter P
Full Access

Introduction. Gait laboratory measurement of whole-body kinematics and ground reaction forces during a wide range of activities is frequently performed in joint replacement patient diagnosis, monitoring, and rehabilitation programs. These data are commonly processed in musculoskeletal modeling platforms such as OpenSim and AnyBody to estimate muscle and joint reaction forces during activity. However, the processing required to obtain musculoskeletal estimates can be time-consuming and requires significant expertise, which seriously limits the patient populations studied. Accordingly, the purpose of this study was to evaluate the potential of deep learning methods for estimating muscle and joint reaction forces over time, given kinematic data, height, weight, and ground reaction forces, for total knee replacement (TKR) patients performing activities of daily living (ADLs). Methods. Seventy TKR patients were fitted with 32 reflective markers used to define anatomical landmarks for 3D motion capture. Patients were instructed to perform a range of tasks including gait, step-down and sit-to-stand. Gait was performed at a self-selected pace, step-down from an 8-inch step, and sit-to-stand from a 17-inch chair. Tasks were performed over a force platform; force data were collected at 2000 Hz, and motion data were collected at 100 Hz with a 14-camera motion capture system. The resulting data were processed in OpenSim to estimate joint reaction and muscle forces in the hip and knee using static optimization. The full dataset consisted of 135 instances from 70 patients: 63 sit-to-stands, 15 right-sided step-downs, 14 left-sided step-downs, and 43 gait sequences. Two classes of neural networks (NNs), a recurrent neural network (RNN) and a temporal convolutional neural network (TCN), were trained to classify activity from joint angles, ground reaction forces, and anthropometrics. 
The NNs were also trained to predict muscle and joint reaction forces over time from the same inputs. The 135 instances were split into 100 for training, 15 for validation, and 20 for testing. Results. The RNN and TCN yielded test-set classification accuracies of 90% and 100%, respectively. Correlation coefficients between ground truth and test-set predictions ranged from 0.81 to 0.95 for the RNN, depending on the activity. Predictions from both NNs were also assessed qualitatively, and both NNs were able to effectively learn relationships between the input and output variables. Discussion. The objective of the study was to develop and evaluate deep learning methods for predicting patient mechanics from standard gait lab data. The resulting models classified activities with excellent performance and showed promise for predicting exact values of loading metrics across a range of activities. These results indicate potential for real-time prediction of musculoskeletal metrics, with application in patient diagnostics and rehabilitation. For any figures or tables, please contact the authors directly.
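The abstract reports correlation coefficients between ground-truth and predicted force curves but does not name the coefficient. Assuming it is the Pearson correlation, a minimal sketch for two equally long time series is shown below. The function name and example values are illustrative only.

```python
import math


def pearson_r(truth, pred):
    """Pearson correlation between two equally long time series."""
    if len(truth) != len(pred) or len(truth) < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_t = sum(truth) / len(truth)
    mean_p = sum(pred) / len(pred)
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(truth, pred))
    var_t = sum((t - mean_t) ** 2 for t in truth)
    var_p = sum((p - mean_p) ** 2 for p in pred)
    if var_t == 0 or var_p == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_t * var_p)


# Hypothetical force curves: a prediction at twice the true magnitude
# still correlates perfectly, because r ignores scale and offset.
truth = [1.0, 2.0, 3.0]
print(pearson_r(truth, [2 * x for x in truth]))  # 1.0
```

Because r is invariant to scale and offset, a high correlation shows that a predicted loading curve has the right shape. It does not by itself show that the predicted magnitudes are exact, so an error metric such as RMSE would complement it.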


Orthopaedic Proceedings
Vol. 105-B, Issue SUPP_7 | Pages 71 - 71
4 Apr 2023
Arrowsmith C Burns D Mak T Hardisty M Whyne C
Full Access

Access to health care, including physiotherapy, is increasingly occurring through virtual formats. At-home adherence to physical therapy programs is often poor, and few tools exist to objectively measure participation in low back physiotherapy exercises without the direct supervision of a medical professional. The aim of this study was to develop and evaluate the potential for automatic, unsupervised, video-based monitoring of at-home low back physiotherapy exercises using a single mobile phone camera. Twenty-four healthy adult subjects performed seven exercises based on the McKenzie low back physiotherapy program while being filmed with two smartphone cameras. Joint locations were automatically extracted using an open-source pose estimation framework. Engineered features were extracted from the joint location time series and used to train a support vector machine classifier (SVC). A convolutional neural network (CNN) was trained directly on the joint location time series data to classify exercises from a single-camera recording. The models were evaluated using 5-fold cross-validation, stratified by subject, with class-balanced accuracy as the performance metric. Optimal performance was achieved using a total of 12 pose estimation landmarks from the upper and lower body, with the SVC model achieving a classification accuracy of 96±4% and the CNN model an accuracy of 97±2%. This study demonstrates the feasibility of using a smartphone camera and a supervised machine learning model to effectively assess at-home low back physiotherapy adherence. This approach could provide a low-cost, scalable method for tracking adherence to physical therapy exercise programs in a variety of settings.
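The evaluation above combines subject-wise cross-validation with class-balanced accuracy. The sketch below reads "stratified by subject" as keeping all of a subject's recordings inside a single fold, so that no subject appears in both training and test data. That reading is an assumption, and the round-robin fold assignment and function names are hypothetical. Class-balanced accuracy is taken here as the mean of per-class recall.

```python
from collections import defaultdict


def subject_folds(subject_ids, k=5):
    """Assign whole subjects to k folds (round-robin over sorted subject IDs).

    Returns a list of k lists of sample indices; every sample of a given
    subject lands in the same fold, which prevents subject leakage.
    """
    subjects = sorted(set(subject_ids))
    fold_of = {s: i % k for i, s in enumerate(subjects)}
    folds = [[] for _ in range(k)]
    for idx, s in enumerate(subject_ids):
        folds[fold_of[s]].append(idx)
    return folds


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall, so frequent exercises do not dominate the score."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    return sum(correct[c] / total[c] for c in total) / len(total)


# Hypothetical recordings from three subjects, split into two folds.
folds = subject_folds(["s1", "s1", "s2", "s3", "s2"], k=2)
print(folds)  # [[0, 1, 3], [2, 4]]
```

In each round, one fold serves as the test set and the remaining folds form the training set. The per-fold balanced accuracies are then summarized as mean ± spread.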


Orthopaedic Proceedings
Vol. 106-B, Issue SUPP_2 | Pages 2 - 2
2 Jan 2024
Ditmer S Dwenger N Jensen L Ghaffari A Rahbek O
Full Access

The most important outcome predictor of Legg-Calvé-Perthes disease (LCPD) is the shape of the healed femoral head. However, femoral head deformity is currently evaluated using non-reproducible, categorical, and qualitative classifications. Recent advances in computer vision might provide the opportunity to automatically detect and delineate bone outlines in radiographic images and so calculate a continuous measure of femoral head deformity. This study aimed to construct a pipeline, built from existing algorithms, for accurately detecting and delineating the proximal femur in radiographs of LCPD patients. To detect the proximal femur, the pretrained state-of-the-art object detection model YOLOv5 was trained on 1580 manually annotated radiographs, validated on 338 radiographs, and tested on 338 radiographs. Additionally, 200 radiographs of shoulders and chests were added to the dataset to make the model more robust to false positives and to increase generalizability. The convolutional neural network architecture U-Net was then employed to segment the detected proximal femur. The network was trained on 80 manually annotated radiographs, using real-time data augmentation to increase the number of training images and enhance the generalizability of the segmentation model. The network was validated on 60 radiographs and tested on 60 radiographs. The object detection model achieved a mean Average Precision (mAP) of 0.998 at an Intersection over Union (IoU) threshold of 0.5, and a mAP of 0.712 over IoU thresholds of 0.5 to 0.95, on the test set. The segmentation model achieved an accuracy score of 0.912, a Dice coefficient of 0.937, and a binary IoU score of 0.854 on the test set. The proposed fully automatic proximal femur detection and segmentation system provides a promising method for accurately detecting and delineating the proximal femoral bone contour in radiographic images, which is necessary for further image analysis.
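The segmentation results above are reported as the Dice coefficient and IoU. For a predicted mask A and a reference mask B, Dice = 2|A ∩ B| / (|A| + |B|) and IoU = |A ∩ B| / |A ∪ B|. The sketch below computes both for binary masks stored as nested lists of 0/1 pixels. It is illustrative only; the abstract does not say how scores were aggregated across the test set.

```python
def dice_and_iou(mask_a, mask_b):
    """Dice coefficient and IoU (Jaccard index) for two equally sized binary masks."""
    inter = a_sum = b_sum = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            inter += 1 if (a and b) else 0
            a_sum += a
            b_sum += b
    union = a_sum + b_sum - inter
    if union == 0:
        # Both masks empty: treat as perfect agreement by convention.
        return 1.0, 1.0
    dice = 2 * inter / (a_sum + b_sum)
    iou = inter / union
    return dice, iou


# Hypothetical 2x3 masks: 2 overlapping pixels, 3 foreground pixels in each.
pred = [[1, 1, 0], [0, 1, 0]]
truth = [[1, 0, 0], [0, 1, 1]]
print(dice_and_iou(pred, truth))  # (0.666..., 0.5)
```

For a single mask pair, Dice = 2·IoU / (1 + IoU), so the two metrics rank individual predictions identically. IoU penalizes partial overlap more heavily, which is why it is the lower of the two numbers.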


Orthopaedic Proceedings
Vol. 105-B, Issue SUPP_7 | Pages 134 - 134
4 Apr 2023
Arrowsmith C Alfakir A Burns D Razmjou H Hardisty M Whyne C
Full Access

Physiotherapy is a critical element in successful conservative management of low back pain (LBP). The aim of this study was to develop and evaluate a system using wearable inertial sensors to objectively detect sitting postures and performance of unsupervised exercises involving movement in multiple planes (flexion, extension, rotation). A set of eight inertial sensors was placed on 19 healthy adult subjects. Data were acquired as they performed seven McKenzie low back exercises and three sitting postures. These data were used to train two models, Random Forest (RF) and XGBoost (XGB), using engineered time series features. In addition, a convolutional neural network (CNN) was trained directly on the time series data. A feature importance analysis was performed to identify the sensor locations and channels that contributed most to the models. Finally, a subset of sensor locations and channels was included in a hyperparameter grid search to identify the optimal sensor configuration and the best-performing algorithm(s) for exercise classification. Models were evaluated using the F1-score in 10-fold cross-validation. The optimal hardware configuration was a 3-sensor setup using lower back, left thigh, and right ankle sensors with accelerometer, gyroscope, and magnetometer channels. The XGB model achieved the highest exercise (F1=0.94±0.03) and posture (F1=0.90±0.11) classification scores. The CNN achieved similar results with the same sensor locations, using only the accelerometer and gyroscope channels for exercise classification (F1=0.94±0.02) and the accelerometer channel alone for posture classification (F1=0.91±0.03). This study demonstrates the potential of a 3-sensor lower body wearable solution (e.g. smart pants) that can identify proper sitting postures and exercises in multiple planes, suitable for LBP rehabilitation. 
This technology has the potential to improve the effectiveness of LBP rehabilitation by facilitating quantitative feedback, early problem diagnosis, and possibly remote monitoring.
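The models above were compared by F1-score over 10 cross-validation folds, but the abstract does not say how F1 was averaged across classes. The sketch below assumes macro-averaging, meaning an unweighted mean of per-class F1 over the classes present in the ground truth. It also assumes the reported "±" is the standard deviation across folds. Both are assumptions, and the exercise labels are hypothetical.

```python
from statistics import mean, stdev


def f1_for_class(y_true, y_pred, label):
    """F1 = 2PR / (P + R) for one class, treating it as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over classes present in y_true."""
    labels = sorted(set(y_true))
    return sum(f1_for_class(y_true, y_pred, c) for c in labels) / len(labels)


def summarize_folds(fold_scores):
    """Mean and sample standard deviation of per-fold scores (needs >= 2 folds)."""
    return mean(fold_scores), stdev(fold_scores)


# Hypothetical predictions for one fold.
y_true = ["flexion", "flexion", "extension", "extension"]
y_pred = ["flexion", "extension", "extension", "extension"]
print(round(macro_f1(y_true, y_pred), 4))  # 0.7333 = (0.6667 + 0.8) / 2
```

Macro-averaging weights every exercise or posture equally. That matters when some classes have fewer recorded repetitions than others.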


Bone & Joint Research
Vol. 12, Issue 3 | Pages 165 - 177
1 Mar 2023
Boyer P Burns D Whyne C

Aims. An objective technological solution for tracking adherence to at-home shoulder physiotherapy is important for improving patient engagement and rehabilitation outcomes, but remains a significant challenge. The aim of this research was to evaluate the performance of machine-learning (ML) methodologies for detecting and classifying inertial data collected during in-clinic and at-home shoulder physiotherapy exercise. Methods. A smartwatch was used to collect inertial data from 42 patients performing shoulder physiotherapy exercises for rotator cuff injuries in both in-clinic and at-home settings. A two-stage ML approach was used: out-of-distribution (OOD) detection first removed non-exercise data, and the remaining data were then used to classify exercises. We evaluated the performance impact of grouping exercises by motion type, of including non-exercise data in algorithm training, and of a patient-specific approach to exercise classification. Algorithm performance was evaluated using both in-clinic and at-home data. Results. The patient-specific approach with engineered features achieved the highest in-clinic performance for differentiating physiotherapy exercise from non-exercise activity (area under the receiver operating characteristic curve (AUROC) = 0.924). Including non-exercise data in algorithm training further improved classifier performance (random forest, AUROC = 0.985). The highest accuracy achieved for classifying individual in-clinic exercises was 0.903, using a patient-specific method with features extracted by a deep neural network model. Grouping exercises by motion type improved exercise classification. For at-home data, OOD detection with non-exercise data included in algorithm training yielded similar performance (fully convolutional network AUROC = 0.919). Conclusion. Including non-exercise data in algorithm training improves detection of exercises. 
A patient-specific approach leveraging data from earlier patient-supervised sessions should be considered but is highly dependent on per-patient data quality. Cite this article: Bone Joint Res 2023;12(3):165–177