Bone & Joint Open
Vol. 1, Issue 6 | Pages 236 - 244
11 Jun 2020
Verstraete MA Moore RE Roche M Conditt MA

Aims. The use of technology to assess balance and alignment during total knee surgery can present the surgeon with an overload of numerical data. At the same time, this quantification holds the potential to clarify and guide the surgeon through the surgical decision process when selecting the appropriate bone recut or soft tissue adjustment while balancing a total knee. This paper therefore evaluates the potential of deploying supervised machine learning (ML) models to select a surgical correction based on patient-specific intra-operative assessments. Methods. Based on a clinical series of 479 primary total knees and 1,305 associated surgical decisions, various ML models were developed. These models identified the indicated surgical decision based on the available intra-operative alignment and tibiofemoral load data. Results. With an associated area under the receiver operating characteristic curve ranging between 0.75 and 0.98, the optimized ML models yielded good to excellent predictions. The best performing model used a random forest approach considering both alignment and intra-articular load readings. Conclusion. The presented model has the potential to make experience available to surgeons adopting new technology, bringing expert opinion into their operating theatre, while also providing insight into the surgical decision process. More specifically, these promising outcomes indicate the relevance of considering the overall limb alignment in the coronal and sagittal planes when identifying the appropriate surgical decision.
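For orientation, a minimal sketch of the kind of pipeline this abstract describes: a random forest trained on intra-operative alignment and load features, scored by area under the ROC curve. The feature names, synthetic data, and hyperparameters are illustrative assumptions, not the authors' actual pipeline.

```python
# Hedged sketch: random forest for surgical-decision prediction, scored by AUC.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 1305  # number of surgical decisions reported in the abstract
X = rng.normal(size=(n, 6))      # e.g. coronal/sagittal alignment + medial/lateral loads (assumed)
y = rng.integers(0, 2, size=n)   # placeholder binary decision label

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"AUC: {auc:.2f}")
```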


Orthopaedic Proceedings
Vol. 102-B, Issue SUPP_8 | Pages 20 - 20
1 Aug 2020
Maher A Phan P Hoda M
Full Access

Degenerative lumbar spondylolisthesis (DLS) is a common condition with many available treatment options. The Degenerative Spondylolisthesis Instability Classification (DSIC) scheme, based on a systematic review of the best available evidence, was proposed by Simmonds et al. in 2015. This classification scheme proposes that the stability of the patient's pathology be determined by a surgeon based on quantitative and qualitative clinical and radiographic parameters. The purpose of this study was to utilise machine learning to classify DLS patients according to the DSIC scheme, offering a novel approach in which an objectively consistent system is employed. The patient data were collected by CSORN between 2015 and 2018 and included 224 DLS surgery cases. The data were cleaned by two methods: first, by deleting all patient entries with missing data, and second, by imputing the missing data using a maximum likelihood function. Five machine learning algorithms were used: logistic regression, boosted trees, random forests, support vector machines, and decision trees. The models were built in Python and trained and tested using the sklearn and pandas libraries, with the algorithms trained and tested on both data sets (deletion and imputation cleaning methods). The matplotlib library was used to graph the ROC curves, including the area under the curve. The machine learning models were all able to predict the DSIC grade. Of all the models, the support vector machine model performed best, achieving an area under the curve score of 0.82. This model achieved an accuracy of 63% and an F1 score of 0.58. Between the two data cleaning methods, the imputation method was better, achieving higher areas under the curve than the deletion method. The accuracy, recall, precision, and F1 scores were similar for both data cleaning methods. The machine learning models were able to effectively predict physician decision making and score patients based on the DSIC scheme. The support vector machine model achieved an area under the curve of 0.82 in comparison to physician classification. Since the data set was relatively small, the results could be improved with training on a larger data set. The use of machine learning models in DLS classification could prove to be an efficient approach to reducing human bias and error. Further efforts are necessary to test the inter- and intra-observer reliability of the DSIC scheme, as well as to determine whether surgeons using the scheme are following DLS treatment recommendations.
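A minimal sketch of the two data-cleaning strategies compared above (row deletion versus imputation), each feeding an SVM, using sklearn and pandas as the abstract does. Mean imputation stands in for the maximum-likelihood imputation used in the study, and the data are synthetic.

```python
# Hedged sketch: deletion vs. imputation cleaning, each scored by SVM AUC.
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
X = rng.normal(size=(224, 8))              # 224 DLS cases, hypothetical features
X[rng.random(X.shape) < 0.1] = np.nan      # ~10% missing values
df = pd.DataFrame(X).assign(label=rng.integers(0, 2, size=224))

def fit_auc(features, labels):
    X_tr, X_te, y_tr, y_te = train_test_split(features, labels, random_state=0)
    clf = SVC(probability=True).fit(X_tr, y_tr)
    return roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])

# Method 1: delete every row with missing data
complete = df.dropna()
auc_del = fit_auc(complete.drop(columns="label"), complete["label"])

# Method 2: impute missing entries (mean imputation as a simple stand-in)
X_imp = SimpleImputer(strategy="mean").fit_transform(df.drop(columns="label"))
auc_imp = fit_auc(X_imp, df["label"])
print(auc_del, auc_imp)
```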


Orthopaedic Proceedings
Vol. 101-B, Issue SUPP_4 | Pages 110 - 110
1 Apr 2019
Verstraete M Conditt M Goodchild G
Full Access

Introduction & Aims. Patient recovery after total knee arthroplasty remains highly variable. Despite the growing interest in and implementation of patient reported outcome measures (e.g. Knee Society Score, Oxford Knee Score), the recovery process of the individual patient is poorly monitored. Patient reported outcomes represent a complex interaction of multiple physiological and psychological aspects, and they are also limited by the discrete time intervals at which they are administered. The use of wearable sensors presents a potential alternative by continuously monitoring a patient's physical activity, although these sensors present their own challenges. This paper deals with the interpretation of the high frequency time signals acquired when using accelerometer-based wearable sensors. Method. During a preliminary validation, five healthy subjects were equipped with two wireless inertial measurement units (IMUs). Using adhesive tape, these IMU sensors were attached to the thigh and shank respectively. All subjects performed a series of supervised activities of daily living (ADL) in their everyday environment (1: walking, 2: stair ascent, 3: stair descent, 4: sitting, 5: lying, 6: standing). The supervisor timestamped the performed activities, such that the raw IMU signals could be uniquely linked to the performed activities. Subsequently, the acquired signals were reduced in Python. Each five second time window was characterized by the minimum, maximum and mean acceleration per sensor node. In addition, the frequency response was analyzed per sensor node, as well as the correlation between both sensor nodes. Various machine learning approaches were subsequently implemented to predict the performed activities. Thereby, 60% of the acquired signals were used to train the mathematical models. These models were then used to predict the activity associated with the remaining 40% of the experimentally obtained data. Results. An overview of the obtained prediction accuracy per model stratified by ADL is provided in Table 1. The Nearest Neighbor and Random Forest algorithms performed worse than the Support Vector Machine and Decision Tree approaches. Even for the latter, differentiating between walking and stair ascent/descent remains challenging, as does differentiating between sitting, standing and lying. The prediction accuracies, however, exceed 90% for all activities when using the Support Vector Machine approach. This is further illustrated in Figure 1, indicating the actual versus predicted activity for the validation set. Conclusions. In conclusion, this paper presents an evaluation of different machine learning algorithms for the classification of activities of daily living from accelerometer-based wearable sensors. This facilitates evaluating a patient's ability to walk, climb or descend stairs, stand, lie or sit on a daily basis, understanding how active the patient is overall and which activities are routinely performed following arthroplasty surgery. Currently, efforts are underway to understand how participation in these activities progresses with recovery following total knee arthroplasty.
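A minimal sketch of the five-second window feature extraction described above: per window, the minimum, maximum and mean acceleration per sensor node plus the inter-node correlation (the frequency-response features are omitted here). The sampling rate and array layout are assumptions.

```python
# Hedged sketch: reducing raw IMU signals to per-window summary features.
import numpy as np

FS = 50        # assumed samples per second
WIN = 5 * FS   # five-second window

def window_features(thigh, shank):
    """thigh, shank: (n_samples,) acceleration magnitude per sensor node."""
    feats = []
    for start in range(0, len(thigh) - WIN + 1, WIN):
        t = thigh[start:start + WIN]
        s = shank[start:start + WIN]
        feats.append([
            t.min(), t.max(), t.mean(),
            s.min(), s.max(), s.mean(),
            np.corrcoef(t, s)[0, 1],  # correlation between the two nodes
        ])
    return np.asarray(feats)

# Example: two minutes of synthetic signal -> one feature row per window
rng = np.random.default_rng(2)
X = window_features(rng.normal(size=120 * FS), rng.normal(size=120 * FS))
print(X.shape)  # (24, 7)
```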


Orthopaedic Proceedings
Vol. 104-B, Issue SUPP_13 | Pages 42 - 42
1 Dec 2022
Abbas A Toor J Lex J Finkelstein J Larouche J Whyne C Lewis S
Full Access

Single level discectomy (SLD) is one of the most commonly performed spinal surgery procedures. Two key drivers of its cost of care are duration of surgery (DOS) and postoperative length of stay (LOS). Therefore, the ability to preoperatively predict SLD DOS and LOS has substantial implications for hospital and healthcare system finances, scheduling and resource allocation. As such, the goal of this study was to predict DOS and LOS for SLD using machine learning models (MLMs) constructed on preoperative factors using a large North American database. The American College of Surgeons (ACS) National Surgical Quality Improvement Program (NSQIP) database was queried for SLD procedures from 2014-2019. The dataset was split in a 60/20/20 ratio of training/validation/testing based on year. Various MLMs (traditional regression models, tree-based models, and multilayer perceptron neural networks) were used and evaluated according to 1) mean squared error (MSE), 2) buffer accuracy (the number of times the predicted target was within a predesignated buffer of the actual target), and 3) classification accuracy (the number of times the correct class was predicted by the models). To ensure real world applicability, the results of the models were compared to a mean regressor model. A total of 11,525 patients were included in this study. During validation, the neural network model (NNM) had the best MSEs for DOS (0.99) and LOS (0.67). During testing, the NNM had the best MSEs for DOS (0.89) and LOS (0.65). The NNM yielded the best 30-minute buffer accuracy for DOS (70.9%) and ≤120 min, >120 min classification accuracy (86.8%). The NNM had the best 1-day buffer accuracy for LOS (84.5%) and ≤2 days, >2 days classification accuracy (94.6%). All models were more accurate than the mean regressor for both DOS and LOS predictions. We successfully demonstrated that MLMs can be used to accurately predict the DOS and LOS of SLD based on preoperative factors. This big-data application has significant practical implications with respect to surgical scheduling and inpatient bed flow, as well as major implications for both private and publicly funded healthcare systems. Incorporating this artificial intelligence technique in real-time hospital operations would be enhanced by including institution-specific operational factors such as surgical team and operating room workflow.
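A minimal sketch of how the two custom metrics named above might be computed; the authors' exact definitions may differ in detail.

```python
# Hedged sketch: buffer accuracy and threshold classification accuracy.
import numpy as np

def buffer_accuracy(y_true, y_pred, buffer):
    """Fraction of predictions within +/- buffer of the true value
    (e.g. 30 min for DOS, 1 day for LOS)."""
    return float(np.mean(np.abs(y_true - y_pred) <= buffer))

def classification_accuracy(y_true, y_pred, threshold):
    """Fraction of predictions falling on the correct side of a clinically
    chosen cut-off (e.g. 120 min for DOS, 2 days for LOS)."""
    return float(np.mean((y_true > threshold) == (y_pred > threshold)))

dos_true = np.array([95, 130, 70, 160])   # actual durations in minutes
dos_pred = np.array([110, 118, 80, 150])  # model predictions
print(buffer_accuracy(dos_true, dos_pred, 30))           # 1.0
print(classification_accuracy(dos_true, dos_pred, 120))  # 0.75
```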


Orthopaedic Proceedings
Vol. 102-B, Issue SUPP_1 | Pages 122 - 122
1 Feb 2020
Flood P Jensen A Banks S
Full Access

Disorders of human joints manifest during dynamic movement, yet no objective tools are widely available for clinicians to assess or diagnose abnormal joint motion during functional activity. Machine learning tools have supported advances in many applications for image interpretation and understanding, and have the potential to enable clinically and economically practical methods for objective assessment of human joint mechanics. We performed a study using convolutional neural networks to autonomously segment radiographic images of knee replacements and to determine the potential for autonomous measurement of knee kinematics. The autonomously segmented images provided superior kinematic measurements for both femur and tibia implant components. We believe this is an encouraging first step towards realization of a completely autonomous capability to accurately quantify dynamic joint motion using a clinically and economically practical methodology.
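The abstract does not describe the network itself, so the sketch below only illustrates the class of model involved: a small convolutional encoder-decoder producing a per-pixel implant mask from a radiograph. The architecture, input size and channel counts are all assumptions.

```python
# Hedged sketch: a tiny encoder-decoder for binary implant segmentation (Keras).
from tensorflow.keras import layers, models

def tiny_segnet(input_shape=(256, 256, 1)):
    inp = layers.Input(shape=input_shape)
    x = layers.Conv2D(16, 3, activation="relu", padding="same")(inp)
    x = layers.MaxPooling2D()(x)                        # downsample
    x = layers.Conv2D(32, 3, activation="relu", padding="same")(x)
    x = layers.UpSampling2D()(x)                        # restore resolution
    out = layers.Conv2D(1, 1, activation="sigmoid")(x)  # per-pixel implant mask
    return models.Model(inp, out)

model = tiny_segnet()
model.compile(optimizer="adam", loss="binary_crossentropy")
model.summary()
```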


Full Access

Background. The advent of value-based conscientiousness and rapid-recovery discharge pathways presents surgeons, hospitals, and payers with the challenge of providing the same total hip arthroplasty episode of care in the safest and most economic fashion for the same fee, despite patient differences. Various predictive analytic techniques have been applied to medical risk models, such as sepsis risk scores, but none have been applied and validated in the elective primary total hip arthroplasty (THA) setting for key payment-based metrics. The objective of this study was to develop and validate a predictive machine learning model using preoperative patient demographics for length of stay (LOS) after primary THA, as the first step in identifying a patient-specific payment model (PSPM). Methods. Using 229,945 patients undergoing primary THA for osteoarthritis from an administrative database between 2009–2016, we created a naïve Bayesian model to forecast LOS after primary THA using a 3:2 split in which 60% of the available patient data “built” the algorithm and the remaining 40% of patients were used for “testing.” This process was iterated five times for algorithm refinement, and model performance was determined using the area under the receiver operating characteristic curve (AUC), percent accuracy, and positive predictive value. LOS was grouped as either 1–5 days or greater than 5 days. Results. The machine learning model algorithm required age, race, gender, and two comorbidity scores (“risk of illness” and “risk of morbidity”) to demonstrate excellent validity, reliability, and responsiveness, with an AUC of 0.87 after five iterations. Hospital stays of greater than 5 days for THA were most associated with increased risk of illness and risk of morbidity scores during admission compared to stays of 1–5 days. Conclusions. Our machine learning model derived from administrative big data demonstrated excellent validity, reliability, and responsiveness after primary THA while accurately predicting LOS and identifying two comorbidity scores as key value-based metrics. Predictive data have the potential to inform a risk-based PSPM prior to primary THA and other elective orthopaedic procedures.
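A minimal sketch of a naive Bayes LOS classifier over the five preoperative inputs named above, using a 60/40 split as described. The encoding, Gaussian variant, and synthetic data are assumptions.

```python
# Hedged sketch: naive Bayes forecasting of LOS class (1-5 days vs. >5 days).
import numpy as np
import pandas as pd
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(3)
n = 2000
df = pd.DataFrame({
    "age": rng.integers(40, 90, n),
    "race": rng.integers(0, 4, n),            # label-encoded categories (assumed)
    "gender": rng.integers(0, 2, n),
    "risk_of_illness": rng.integers(1, 5, n),
    "risk_of_morbidity": rng.integers(1, 5, n),
})
y = rng.integers(0, 2, n)  # 0: LOS 1-5 days, 1: LOS > 5 days (placeholder)

X_tr, X_te, y_tr, y_te = train_test_split(df, y, test_size=0.4, random_state=0)  # 3:2 split
clf = GaussianNB().fit(X_tr, y_tr)
print(roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```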


Orthopaedic Proceedings
Vol. 102-B, Issue SUPP_1 | Pages 27 - 27
1 Feb 2020
Bloomfield R Williams H Broberg J Lanting B Teeter M
Full Access

Objective. Wearable sensors have enabled objective functional data collection from patients before total knee replacement (TKR) and at clinical follow-ups post-surgery, whereas traditional evaluation has relied solely on self-reported subjective measures. The timed-up-and-go (TUG) test has been used to evaluate function but is commonly measured using only total completion time, which does not assess joint function or test completion strategy. The current work employs machine learning techniques to distinguish patient groups based on functional metrics derived from the TUG test and to expose clinically important functional parameters that are predictive of patient recovery. Methods. Patients scheduled for TKR (n=70) were recruited and instrumented with a wearable sensor system while performing three TUG test trials. Remaining study patients (n=68) also completed three TUG trials at their 2, 6, and 13-week follow-ups. Many patients (n=36) have also participated up to their 26-week appointment. Custom developed software was used to segment recorded tests into sub-activities and extract 54 functional metrics to evaluate operative and non-operative knee function. All preoperative TUG samples and their standardized metrics were clustered into two unlabelled groups using the k-means algorithm. Both groups were tracked forward to see how their early functional parameters translated to functional improvement at their three-month assessment. Test total completion time was used to estimate overall functional improvement and to relate findings to existing literature. Patients that completed their 26-week tests were tracked further to their most recent timepoint. Results. Preoperative clustering separated two groups with different test completion times (n=46 vs. n=22, with mean times of 13s vs. 22s). Of the faster preoperative group, 63% of patients maintained their time, 26% improved, and 11% worsened, whereas of the slower preoperative group, 27% maintained, 64% improved, and 9% worsened. The high improvement group improved their times by 4.9s (p<0.01) between the preoperative and 13-week visits, whereas the other group had no significant change. Test times were different between the two groups preoperatively (p<0.001) and at 6 (p=0.01) and 13 (p=0.03) weeks, but not at 26 weeks (p=0.67). The high improvement group reached an overall improvement of 9s (p<0.01) at 26 weeks, whereas the low improvement group still showed no improvement greater than the TUG minimal detectable change of 2.2s (1.8s, p<0.01)[1]. Test sub-activity times for both groups at each timepoint can be seen in Figure 1. Conclusions. This work has demonstrated that machine learning has the potential to find patterns in preoperative functional parameters that can predict functional improvement after surgery. While useful for assigning labels to the distinguished clusters, test completion time was not among the most distinguishable metrics between groups at three months, which highlights the necessity for these more descriptive performance metrics when analyzing patient recovery. It is expected that these early predictions will be used to realistically adjust patient expectations or highlight opportunities for physiotherapeutic intervention to improve future outcomes. For any figures or tables, please contact the authors directly.
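A minimal sketch of the unsupervised step above: standardize the derived TUG metrics, cluster the preoperative tests into two unlabelled groups with k-means, then compare completion times per cluster. The data are synthetic placeholders.

```python
# Hedged sketch: k-means clustering of standardized TUG functional metrics.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

rng = np.random.default_rng(4)
metrics = rng.normal(size=(70, 54))         # 54 functional metrics per patient
total_time = rng.uniform(10, 25, size=70)   # TUG completion time in seconds

Z = StandardScaler().fit_transform(metrics)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(Z)
for k in (0, 1):
    print(f"cluster {k}: n={np.sum(labels == k)}, "
          f"mean time={total_time[labels == k].mean():.1f}s")
```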


Orthopaedic Proceedings
Vol. 98-B, Issue SUPP_8 | Pages 11 - 11
1 May 2016
Chanda S Gupta S Pratihar D
Full Access

The success of a cementless Total Hip Arthroplasty (THA) depends not only on initial micromotion, but also on long-term failure mechanisms, e.g., implant-bone interface stresses and stress shielding. Any preclinical investigation aimed at designing a femoral implant needs to account for the temporal evolution of the interfacial condition while dealing with these failure mechanisms. The goal of the present multi-criteria optimization study was to search for an optimum implant geometry by implementing a novel machine learning framework comprising a neural network (NN), a genetic algorithm (GA) and finite element (FE) analysis. The optimum implant model was subsequently evaluated based on evolutionary interface conditions. The optimization scheme of our earlier study [1] has been used here, with the additional inclusion of an NN to predict the initial fixation of an implant model. The entire CAD based parameterization technique for the implant was described previously [1]. Three objective functions, the first two based on proximal resorbed Bone Mass Fraction (BMF) [1] and implant-bone interface failure index [1], respectively, and the third based on initial micromotion, were formulated to model the multi-criteria optimization problem. The first two objective functions, f1 and f2, were calculated from the FE analysis (Ansys), whereas the third objective (f3) involved an NN developed to predict post-operative micromotion from the stem design parameters. A bonded interfacial condition was used to account for the effects of stress shielding and interface stresses, whereas a set of contact models was used to develop the NN for faster prediction of post-operative micromotion. A multi-criteria GA was executed up to a desired number of generations for optimization (Fig. 1). The final trade-off model was further evaluated using a combined remodelling and bone ingrowth simulation based on an evolutionary interface condition [2], and subsequently compared with a generic TriLock implant. The non-dominated solutions obtained from the GA execution were interpolated to determine the 3D nature of the Pareto-optimal surface (Fig. 2). The effects of all failure mechanisms were found to be minimized in these optimized solutions (Fig. 2). However, the most compromised solution, i.e., the trade-off stem geometry (TSG), was chosen for further assessment based on the evolutionary interfacial condition. The simulation-based combined remodelling and bone ingrowth study predicted faster ingrowth for the TSG than for the generic design. The surface area with post-operative (i.e., iteration 1) ingrowth was found to be ∼50% for the TSG, while that for the TriLock model was ∼38% (Fig. 3). However, both designs predicted similar long-term ingrowth (∼89% surface area). The long-term proximal bone resorption (up to the lesser trochanter) was found to be ∼30% for the TSG, compared to ∼37% for the TriLock model. The TSG was found to be bone-preserving, with a prominent frontal wedge and rectangular proximal section for better rotational stability; features present in some recent designs. The optimization scheme therefore appears to be a quick and robust preclinical assessment tool for cementless femoral implant design. To view tables/figures, please contact authors directly.
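A minimal sketch of the surrogate idea above: a neural network learns to predict micromotion from stem design parameters so candidates can be evaluated without a contact FE solve, and a simple non-dominated filter stands in for the GA's Pareto sorting. The parameter count, random objectives f1 and f2, and data are all synthetic assumptions.

```python
# Hedged sketch: NN surrogate for f3 (micromotion) plus a Pareto filter.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(5)
params = rng.uniform(size=(200, 10))   # stem design parameters (10 assumed)
micromotion = rng.uniform(size=200)    # FE-computed training targets (placeholder)
surrogate = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000,
                         random_state=0).fit(params, micromotion)

# Evaluate a large candidate population cheaply; f1 (bone resorption) and
# f2 (interface failure index) are random stand-ins here.
cand = rng.uniform(size=(1000, 10))
F = np.column_stack([rng.uniform(size=1000),      # f1
                     rng.uniform(size=1000),      # f2
                     surrogate.predict(cand)])    # f3: predicted micromotion

def pareto_front(F):
    """Indices of non-dominated points (all objectives minimized)."""
    keep = []
    for i, f in enumerate(F):
        dominated = np.any(np.all(F <= f, axis=1) & np.any(F < f, axis=1))
        if not dominated:
            keep.append(i)
    return keep

print(len(pareto_front(F)), "non-dominated candidate designs")
```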


Orthopaedic Proceedings
Vol. 99-B, Issue SUPP_20 | Pages 46 - 46
1 Dec 2017
Esfandiari H Anglin C Street J Guy P Hodgson A
Full Access

Pedicle screw fixation is a technically demanding procedure, and reoperation rates are currently on the order of 11%. The most common intraoperative practice for position assessment of pedicle screws is biplanar fluoroscopic imaging, which is limited to two dimensions and is associated with low accuracy. We have previously introduced a full-dimensional position assessment framework based on registering intraoperative X-rays to preoperative volumetric images with sufficient accuracy. However, the framework requires a semi-manual process of pedicle screw segmentation, and the intraoperative X-rays have to be taken from defined positions in space in order to avoid occlusion of the pedicle screw heads. This motivated us to develop advancements to the system to achieve higher levels of automation, in the hope of greater clinical feasibility.

In this study, we developed an automatic segmentation and X-ray adequacy assessment protocol. An artificial neural network was trained on a dataset that included a number of digitally reconstructed radiographs representing pedicle screw projections from different points of view. This model was able to segment the projection of any pedicle screw given an X-ray as its input, correctly classifying 93% of pixels. Once the pedicle screw was segmented, a number of descriptive geometric features were extracted from the isolated blob. These segmented images were manually labelled as ‘adequate’ or ‘not adequate’ depending on the visibility of the screw axis. The extracted features, along with their corresponding labels, were used to train a decision tree model that could classify each X-ray based on its adequacy, with accuracies on the order of 95%.

In conclusion, we presented here a robust, fast and automated pedicle screw segmentation process, combined with an accurate and automatic algorithm for classifying views of pedicle screws as adequate or not. These tools represent a useful step towards full automation of our pedicle screw positioning assessment system.
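A minimal sketch of the adequacy step above: geometric features from the segmented screw blob feed a decision tree that flags an X-ray as adequate or not. The feature set (area, eccentricity, extent, elongation) is a hypothetical choice; the paper's exact descriptors are not listed.

```python
# Hedged sketch: blob geometry features + decision tree adequacy classifier.
import numpy as np
from skimage.measure import label, regionprops
from sklearn.tree import DecisionTreeClassifier

def blob_features(mask):
    """mask: binary segmentation of one pedicle screw projection."""
    props = regionprops(label(mask.astype(int)))[0]
    return [props.area, props.eccentricity, props.extent,
            props.major_axis_length / max(props.minor_axis_length, 1e-6)]

# Toy masks: an elongated blob (screw axis visible) vs. a compact one
rng = np.random.default_rng(6)
feats, labs = [], []
for _ in range(40):
    m = np.zeros((64, 64), dtype=bool)
    if rng.random() < 0.5:
        m[30:34, 10:54] = True   # elongated -> 'adequate'
        labs.append(1)
    else:
        m[28:40, 28:40] = True   # head-on, axis occluded -> 'not adequate'
        labs.append(0)
    feats.append(blob_features(m))

clf = DecisionTreeClassifier(random_state=0).fit(feats, labs)
print(clf.score(feats, labs))
```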


Orthopaedic Proceedings
Vol. 104-B, Issue SUPP_13 | Pages 60 - 60
1 Dec 2022
Martin RK Wastvedt S Pareek A Persson A Visnes H Fenstad AM Moatshe G Wolfson J Lind M Engebretsen L
Full Access

External validation of machine learning predictive models is achieved through evaluation of model performance on different groups of patients than were used for algorithm development. This important step is uncommonly performed, inhibiting clinical translation of newly developed models. Recently, machine learning was used to develop a tool that can quantify revision risk for a patient undergoing primary anterior cruciate ligament (ACL) reconstruction (https://swastvedt.shinyapps.io/calculator_rev/). The source of data included nearly 25,000 patients with primary ACL reconstruction recorded in the Norwegian Knee Ligament Register (NKLR). The result was a well-calibrated tool capable of predicting revision risk one, two, and five years after primary ACL reconstruction with moderate accuracy. The purpose of this study was to determine the external validity of the NKLR model by assessing algorithm performance when applied to patients from the Danish Knee Ligament Registry (DKLR). The primary outcome measure of the NKLR model was the probability of revision ACL reconstruction within 1, 2, and/or 5 years. In the index study, all 24 predictor variables in the NKLR were included, and the models eliminated variables which did not significantly improve prediction ability, without sacrificing accuracy. The result was a well calibrated algorithm, developed using the Cox Lasso model, that required only five variables (out of the original 24) for outcome prediction. For this external validation study, all DKLR patients with complete data for the five variables required for NKLR prediction were included. The five variables were: graft choice, femur fixation device, Knee Injury and Osteoarthritis Outcome Score (KOOS) Quality of Life subscale score at surgery, years from injury to surgery, and age at surgery. Predicted revision probabilities were calculated for all DKLR patients. Model performance was assessed using the same metrics as the NKLR study: concordance and calibration. In total, 10,922 DKLR patients were included for analysis. Average follow-up time or time-to-revision was 8.4 (±4.3) years and the overall revision rate was 6.9%. Surgical technique trends (i.e., graft choice and fixation devices) and injury characteristics (i.e., concomitant meniscus and cartilage pathology) were dissimilar between registries. The model produced similar concordance when applied to the DKLR population compared to the original NKLR test data (DKLR: 0.68; NKLR: 0.68-0.69). Calibration was poorer for the DKLR population at one and five years after primary surgery but similar to the NKLR at two years. The NKLR machine learning algorithm demonstrated similar performance when applied to patients from the DKLR, suggesting that it is valid for application outside of the initial patient population. This represents the first machine learning model for predicting revision ACL reconstruction that has been externally validated. Clinicians can use this in-clinic calculator to estimate revision risk at a patient-specific level when discussing outcome expectations pre-operatively. While encouraging, it should be noted that the performance of the model on patients undergoing ACL reconstruction outside of Scandinavia remains unknown.
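A minimal sketch of the two external-validation metrics used above. Concordance is computed with the lifelines library; the calibration slope is estimated by regressing observed outcomes on the model's predicted log-odds, a common recipe but an assumption here. The data are synthetic.

```python
# Hedged sketch: concordance index and calibration slope on predicted risks.
import numpy as np
from lifelines.utils import concordance_index
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)
n = 1000
pred_risk = rng.uniform(0.01, 0.3, n)                 # model's predicted revision risk
time_to_event = rng.exponential(20 * (1 - pred_risk)) # higher risk -> earlier revision
revised = rng.random(n) < pred_risk                   # event indicator

# Concordance: higher predicted risk should mean earlier revision,
# so the risk is negated to act as a "survival" score.
print(concordance_index(time_to_event, -pred_risk, event_observed=revised))

# Calibration slope: ~1.0 means predicted probabilities match observed rates.
logit = np.log(pred_risk / (1 - pred_risk)).reshape(-1, 1)
slope = LogisticRegression().fit(logit, revised).coef_[0][0]
print(slope)
```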


Orthopaedic Proceedings
Vol. 105-B, Issue SUPP_3 | Pages 118 - 118
23 Feb 2023
Zhou Y Dowsey M Spelman T Choong P Schilling C
Full Access

Approximately 20% of patients feel unsatisfied 12 months after primary total knee arthroplasty (TKA). Current predictive tools for TKA focus on the clinician as the intended user rather than the patient. The aim of this study was to develop a tool that can be used by patients without clinician assistance to predict health-related quality of life (HRQoL) outcomes 12 months after TKA. All patients with primary TKAs for osteoarthritis between 2012 and 2019 at a tertiary institutional registry were analysed. The predicted outcome was improvement in Veterans-RAND 12 utility score at 12 months after surgery. Potential predictors included patient demographics, co-morbidities, and patient reported outcome scores at baseline. Logistic regression and three machine learning algorithms were used. Models were evaluated using both discrimination and calibration metrics. Predicted outcomes were categorised into deciles, from 1 being the least likely to improve to 10 being the most likely to improve. 3703 eligible patients were included in the analysis. The logistic regression model performed the best in out-of-sample evaluation for both discrimination (AUC = 0.712) and calibration (gradient = 1.176, intercept = -0.116, Brier score = 0.201) metrics. Machine learning algorithms were not superior to logistic regression in any performance metric. Patients in the lowest decile (1) had a 29% probability of improvement and patients in the highest decile (10) had an 86% probability of improvement. Logistic regression outperformed machine learning algorithms in this study. The final model performed well enough on calibration metrics to accurately predict improvement after TKA using deciles. An ongoing randomised controlled trial (ACTRN12622000072718) is evaluating the effect of this tool on patient willingness for surgery. Full results of this trial are expected to be available by April 2023. A free-to-use online version of the tool is available at smartchoice.org.au.
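A minimal sketch of the decile presentation used above: fit a logistic regression for 12-month improvement, then bin the predicted probabilities into deciles 1 (least likely) to 10 (most likely). Features, labels, and data are synthetic placeholders.

```python
# Hedged sketch: logistic regression probabilities binned into deciles.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(8)
X = rng.normal(size=(3703, 12))              # demographics, comorbidities, baseline PROMs
y = (X[:, 0] + rng.normal(size=3703)) > 0    # placeholder improvement label

prob = LogisticRegression(max_iter=1000).fit(X, y).predict_proba(X)[:, 1]
decile = pd.qcut(prob, 10, labels=range(1, 11))
print(pd.DataFrame({"decile": decile, "improved": y})
        .groupby("decile", observed=True)["improved"].mean())
```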


Orthopaedic Proceedings
Vol. 102-B, Issue SUPP_1 | Pages 76 - 76
1 Feb 2020
Roche C Simovitch R Flurin P Wright T Zuckerman J Routman H
Full Access

Introduction. Machine learning is a method relatively novel to orthopaedics which can be used to evaluate complex associations and patterns in outcomes and healthcare data. The purpose of this study was to apply 3 different supervised machine learning algorithms to a multi-center international database of a single shoulder prosthesis and to evaluate the accuracy of each model in predicting post-operative outcomes of both aTSA and rTSA. Methods. Data from a multi-center international database consisting of 6485 patients who received primary total shoulder arthroplasty using a single shoulder prosthesis (Equinoxe, Exactech, Inc) were analyzed from 19,796 patient visits in this study. Specifically, demographic, comorbidity, implant type and implant size, surgical technique, pre-operative PROMs and ROM measures, post-operative PROMs and ROM measures, pre-operative and post-operative radiographic data, and adverse event and complication data were obtained for 2367 primary aTSA patients from 8042 visits at an average follow-up of 22 months and for 4118 primary rTSA patients from 11,754 visits at an average follow-up of 16 months. These data were used to create predictive models with 3 different supervised machine learning techniques: 1) linear regression, 2) random forest, and 3) XGBoost. Each of these 3 machine learning techniques evaluated the pre-operative parameters and created a predictive model targeting the post-operative composite score, a 100 point score consisting of 50% post-operative composite outcome score (calculated from 33.3% ASES + 33.3% UCLA + 33.3% Constant) and 50% post-operative composite ROM score (calculated from S curves weighted by 70% active forward flexion + 15% internal rotation score + 15% active external rotation). To control for the time required for patient improvement after surgery, 3 additional predictive models were created; for these, each primary aTSA and primary rTSA cohort was subdivided to include only patient follow-up visits >20 months after surgery, yielding 1317 primary aTSA patients from 2962 visits at an average follow-up of 50 months and 1593 primary rTSA patients from 3144 visits at an average follow-up of 42 months. Each of these 6 predictive models was trained using a random selection of 80% of each cohort; each model then predicted the outcomes of the remaining 20% of the data based upon the demographic, comorbidity, implant type and implant size, surgical technique, and pre-operative PROMs and ROM measure inputs of that 20% cohort. The error of all 6 predictive models was calculated from the root mean square error (RMSE) between the actual and predicted post-operative composite scores. The accuracy of each model was determined by subtracting from 100% the RMSE expressed as a percentage of the average composite score of each cohort. Results. For all patient visits, the XGBoost decision tree algorithm was the most accurate model for both aTSA and rTSA patients, with an accuracy of ∼89.5% for both. However, for patients with 20+ month visits only, the random forest decision tree algorithm was the most accurate model for both aTSA and rTSA patients, again with an accuracy of ∼89.5% for both. The linear regression model was the least accurate predictive model for each of the cohorts analyzed. It should be noted, however, that all 3 machine learning models provided accuracy of ∼85% or better and a RMSE <12. (Table 1) Figures 1 and 2 depict the typical spread and RMSE of the actual vs. predicted total composite score associated with the 3 models for aTSA (Figure 1) and rTSA (Figure 2). Discussion. The results of this study demonstrate that multiple different machine learning algorithms can be utilized to create models that predict outcomes with high accuracy for both aTSA and rTSA at numerous timepoints after surgery. Future research should test this model on different datasets and use different machine learning methods in order to reduce over- and under-fitting model errors. For any figures or tables, please contact the authors directly.
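A minimal sketch of the composite target and the accuracy definition given above (accuracy = 100% minus RMSE as a percentage of the mean composite score). The scores are assumed to be pre-normalized to a 0-100 scale before weighting, and the data are synthetic.

```python
# Hedged sketch: weighted composite score and RMSE-derived accuracy.
import numpy as np

def composite_score(ases, ucla, constant, rom_score):
    outcome = (ases + ucla + constant) / 3.0   # equal-weighted PROM composite
    return 0.5 * outcome + 0.5 * rom_score     # 50% outcome + 50% ROM

rng = np.random.default_rng(9)
actual = composite_score(*rng.uniform(40, 95, size=(4, 500)))
predicted = actual + rng.normal(0, 10, size=500)   # a model's predictions

rmse = np.sqrt(np.mean((actual - predicted) ** 2))
accuracy = 1 - rmse / actual.mean()
print(f"RMSE={rmse:.1f}, accuracy={accuracy:.1%}")
```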


Orthopaedic Proceedings
Vol. 102-B, Issue SUPP_8 | Pages 79 - 79
1 Aug 2020
Bozzo A Ghert M Reilly J
Full Access

Advances in cancer therapy have prolonged patient survival even in the presence of disseminated disease, and an increasing number of cancer patients are living with metastatic bone disease (MBD). The proximal femur is the long bone most commonly involved in MBD, and pathologic fractures of the femur are associated with significant morbidity, mortality and loss of quality of life (QoL). Successful prophylactic surgery for an impending fracture of the proximal femur has been shown in multiple cohort studies to result in longer survival, preserved mobility, lower transfusion rates and shorter post-operative hospital stays. However, there is currently no optimal method to predict a pathologic fracture. The best-known tool is Mirels' criteria, established in 1989, which is limited in guiding clinical practice by poor specificity and sensitivity. The ideal clinical decision support tool will be of the highest sensitivity and specificity, non-invasive, generalizable to all patients, and not a burden on hospital resources or the patient's time. Our research uses novel machine learning techniques to develop a model to fill this considerable gap in the treatment pathway of MBD of the femur. The goal of our study is to train a convolutional neural network (CNN) to predict fracture risk when metastatic bone disease is present in the proximal femur. Our fracture risk prediction tool was developed by analysis of prospectively collected data of consecutive MBD patients presenting from 2009–2016. Patients with primary bone tumors, pathologic fractures at initial presentation, and hematologic malignancies were excluded. A total of 546 patients, comprising 114 pathologic fractures, were included. Every patient had at least one anterior-posterior (AP) X-ray and clinical data including patient demographics, Mirels' criteria, tumor biology, all previous radiation and chemotherapy received, multiple pain and function scores, medications and time to fracture or time to death. We trained a convolutional neural network (CNN) with the AP X-ray images of these 546 patients with metastatic bone disease of the proximal femur. The digital X-ray data is converted into a matrix representing the color information at each pixel. Our CNN contains five convolutional layers, a fully connected layer of 512 units and a final output layer. As the information passes through successive levels of the network, higher level features are abstracted from the data. The model converges on two fully connected deep neural network layers that output the risk of fracture. This prediction is compared to the true outcome, and any errors are back-propagated through the network to adjust the weights between connections accordingly, until overall prediction accuracy is optimized. Methods to improve learning included using stochastic gradient descent with a learning rate of 0.01 and a momentum of 0.9. We used average classification accuracy and the average F1 score across five test sets to measure model performance, where F1 = 2 × (precision × recall) / (precision + recall). F1 is a measure of a model's accuracy in binary classification, in our case whether a lesion would result in pathologic fracture or not. Our model achieved 88.2% accuracy in predicting fracture risk across five-fold cross validation testing. The F1 statistic is 0.87. This is the first reported application of convolutional neural networks, a machine learning algorithm, to this important orthopaedic problem. Our neural network model was able to achieve reasonable accuracy in classifying fracture risk of metastatic proximal femur lesions from analysis of X-rays and clinical information. Our future work will aim to externally validate this algorithm on an international cohort.
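A minimal sketch of the classifier described above: five convolutional layers, a 512-unit dense layer, SGD with learning rate 0.01 and momentum 0.9, and a binary fracture-risk output, in Keras. The input size, filter counts, and pooling are assumptions not stated in the abstract.

```python
# Hedged sketch: a five-conv-layer CNN for binary fracture-risk prediction.
from tensorflow.keras import layers, models, optimizers

model = models.Sequential([
    layers.Input(shape=(224, 224, 1)),  # assumed grayscale input size
    *[layer
      for n in (16, 32, 64, 128, 256)   # five convolutional stages
      for layer in (layers.Conv2D(n, 3, activation="relu", padding="same"),
                    layers.MaxPooling2D())],
    layers.Flatten(),
    layers.Dense(512, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # predicted fracture risk
])
model.compile(optimizer=optimizers.SGD(learning_rate=0.01, momentum=0.9),
              loss="binary_crossentropy", metrics=["accuracy"])

def f1(precision, recall):
    # F1 = 2 * (precision * recall) / (precision + recall), as in the text
    return 2 * precision * recall / (precision + recall)
```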


Orthopaedic Proceedings
Vol. 104-B, Issue SUPP_12 | Pages 90 - 90
1 Dec 2022
Abbas A Toor J Du JT Versteeg A Yee N Finkelstein J Abouali J Nousiainen M Kreder H Hall J Whyne C Larouche J
Full Access

Excessive resident duty hours (RDH) are a recognized issue with implications for physician well-being and patient safety. A major component of the RDH concern is on-call duty. While considerable work has been done to reduce resident call workload, there is a paucity of research on optimizing resident call scheduling. Call coverage is scheduled manually rather than on a demand basis, which generally leads to over-scheduling to prevent a service gap. Machine learning (ML) has been widely applied in other industries to prevent such supply-demand mismatches, but the healthcare field has been slow to adopt these innovations. As such, the aim of this study was to use ML models to 1) predict demand on orthopaedic surgery residents at a level I trauma centre and 2) identify variables key to demand prediction. Daily surgical handover emails over an eight-year (2012-2019) period at a level I trauma centre were collected. The following data were used to calculate demand: spine call coverage, date, and number of operating rooms (ORs), traumas, admissions and consults completed. Various ML models (linear, tree-based and neural networks) were trained to predict the workload, with their results compared to the current scheduling approach. Quality of the models was determined using the area under the receiver operator curve (AUC) and the accuracy of the predictions. The top ten most important variables were extracted from the most successful model. During training, the model with the highest AUC and accuracy was the multivariate adaptive regression splines (MARS) model, with an AUC of 0.78±0.03 and accuracy of 71.7%±3.1%. During testing, the model with the highest AUC and accuracy was the neural network model, with an AUC of 0.81 and accuracy of 73.7%. All models were better than the current approach, which had an AUC of 0.50 and accuracy of 50.1%. Key variables used by the neural network model were (in descending order): spine call duty, year, weekday/weekend, month, and day of the week. This was the first study attempting to use ML to predict the service demand on orthopaedic surgery residents at a major level I trauma centre. Multiple ML models were shown to be more appropriate and accurate at predicting the demand on surgical residents than the current scheduling approach. Future work should look to incorporate predictive models with optimization strategies to match scheduling with demand, in order to improve resident well-being and patient care.
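A minimal sketch of the model comparison above, with a constant predictor standing in for the current manual approach (which scores AUC 0.50 by construction). MARS itself requires a third-party package (e.g. py-earth), so a neural network and a dummy baseline are shown instead; the feature set is assumed from the variables the abstract lists.

```python
# Hedged sketch: demand prediction vs. a constant "current approach" baseline.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(10)
n = 2900  # roughly eight years of daily handover records
X = np.column_stack([
    rng.integers(0, 2, n),         # spine call coverage
    rng.integers(2012, 2020, n),   # year
    rng.integers(0, 2, n),         # weekday/weekend flag
    rng.integers(1, 13, n),        # month
    rng.integers(0, 7, n),         # day of week
])
y = rng.integers(0, 2, n)          # high- vs. low-demand day (placeholder)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
for clf in (MLPClassifier(max_iter=500, random_state=0),
            DummyClassifier(strategy="prior")):
    clf.fit(X_tr, y_tr)
    print(type(clf).__name__,
          roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```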


Orthopaedic Proceedings
Vol. 104-B, Issue SUPP_12 | Pages 91 - 91
1 Dec 2022
Abbas A Toor J Saleh I Abouali J Wong PKC Chan T Sarhangian V
Full Access

Most cost containment efforts in public health systems have focused on regulating the use of hospital resources, especially operative time. As such, attempting to maximize the efficiency of limited operative time is important. Typically, hospital operating room (OR) scheduling of time is performed in two tiers: 1) master surgical scheduling (annual allocation of time between surgical services and surgeons) and 2) daily scheduling (a surgeon's selection of cases per operative day). Master surgical scheduling is based on a hospital's annual case mix and depends on the annual throughput rate per case type. This throughput rate in turn depends on the efficiency of surgeons' daily scheduling. However, daily scheduling is predominantly performed manually, which requires that the human planner simultaneously reason about unknowns such as case-specific length of surgery and its variability while attempting to maximize throughput. This often leads to OR overtime and a likely sub-optimal throughput rate. In contrast, scheduling using mathematical optimization methods can produce maximum system efficiency and is extensively used in the business world. As such, the purpose of our study was to compare the efficiency of 1) manual and 2) optimized OR scheduling at an academic-affiliated community hospital representative of most North American centres. Historic OR data were collected over a four-year period for seven surgeons. The actual scheduling, surgical duration, overtime and number of OR days were extracted. These data were first configured to represent the historic manual scheduling process. The data were then used as the input to an integer linear programming model with the goal of determining the minimum number of OR days needed to complete the same number of cases while not exceeding the historic overtime values. Parameters included the use of a different quantile for each case type's surgical duration in order to ensure a schedule within five percent of the historic overtime value per OR day. All surgeons saw a median 10% (range: 9.2% to 18.3%) reduction in the number of OR days needed to complete their annual case-load compared to their historical scheduling practices, while OR overtime varied by a maximum of 5%. The daily OR configurations differed from historic configurations in 87% of cases. In addition, the number of configurations per surgeon was reduced from an average of six to four. Our study demonstrates a significant increase in OR throughput rate (10%) with no change in operative time required. This has considerable implications in terms of cost reduction, surgical wait lists and surgeon satisfaction. A limitation of this study is that the potential gains are based on the efficiency of the pre-existing manual scheduling at our hospital. However, given the range of scenarios tested, the number of surgeons included and the similarity of our hospital's size and configuration to the majority of North American hospitals with an orthopaedic service, these results are generalizable. Further optimization may be achieved by taking into account factors that could predict case duration, such as surgeon experience, patient characteristics, and institutional attributes, via machine learning.
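A toy version of the optimization above, sketched with the PuLP library: choose the minimum number of OR days (each with a fixed time budget) to cover a case list, keeping planned time within the daily budget. The real model additionally handles per-case-type duration quantiles and a five percent overtime tolerance; the durations below are invented.

```python
# Hedged sketch: integer linear program minimizing the number of OR days.
from pulp import LpProblem, LpVariable, LpMinimize, lpSum, PULP_CBC_CMD

durations = [120, 180, 90, 150, 60, 200, 110]  # case durations in minutes (assumed)
DAY_MIN = 480                                   # one OR day's time budget
max_days = len(durations)

prob = LpProblem("min_or_days", LpMinimize)
use = [LpVariable(f"day_{d}", cat="Binary") for d in range(max_days)]
assign = [[LpVariable(f"case_{c}_day_{d}", cat="Binary")
           for d in range(max_days)] for c in range(len(durations))]

prob += lpSum(use)  # objective: number of OR days used
for c in range(len(durations)):
    prob += lpSum(assign[c]) == 1  # every case scheduled exactly once
for d in range(max_days):
    prob += lpSum(durations[c] * assign[c][d]
                  for c in range(len(durations))) <= DAY_MIN * use[d]

prob.solve(PULP_CBC_CMD(msg=False))
print("OR days needed:", int(sum(v.value() for v in use)))
```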


Orthopaedic Proceedings
Vol. 104-B, Issue SUPP_12 | Pages 33 - 33
1 Dec 2022
Abbas A Lex J Toor J Mosseri J Khalil E Ravi B Whyne C
Full Access

Total knee and hip arthroplasty (TKA and THA) are two of the highest volume and most resource intensive surgical procedures. Key drivers of the cost of surgical care are duration of surgery (DOS) and postoperative inpatient length of stay (LOS). The ability to predict TKA and THA DOS and LOS has substantial implications for hospital finances, scheduling and resource allocation. The goal of this study was to predict DOS and LOS for elective unilateral TKAs and THAs using machine learning models (MLMs) constructed on preoperative patient factors using a large North American database. The American College of Surgeons (ACS) National Surgical Quality Improvement Program (NSQIP) database was queried for elective unilateral TKA and THA procedures from 2014-2019. The dataset was split into training, validation and testing sets based on year. Multiple conventional and deep MLMs such as linear, tree-based and multilayer perceptron (MLP) models were constructed. The models with the best performance on the validation set were evaluated on the testing set. Models were evaluated according to 1) mean squared error (MSE), 2) buffer accuracy (the number of times the predicted target was within a predesignated buffer of the actual target), and 3) classification accuracy (the number of times the correct class was predicted by the models). To ensure useful predictions, the results of the models were compared to a mean regressor. A total of 499,432 patients (TKA 302,490; THA 196,942) were included. The MLP models had the best MSEs and accuracy across both TKA and THA patients. During testing, the TKA MSEs for DOS and LOS were 0.893 and 0.688, while the THA MSEs for DOS and LOS were 0.895 and 0.691. The TKA DOS 30-minute buffer accuracy and ≤120 min, >120 min classification accuracy were 78.8% and 88.3%, while the TKA LOS 1-day buffer accuracy and ≤2 days, >2 days classification accuracy were 75.2% and 76.1%. The THA DOS 30-minute buffer accuracy and ≤120 min, >120 min classification accuracy were 81.6% and 91.4%, while the THA LOS 1-day buffer accuracy and ≤2 days, >2 days classification accuracy were 78.3% and 80.4%. All models across both TKA and THA patients were more accurate than the mean regressors for both DOS and LOS predictions across both buffer and classification accuracies. Conventional and deep MLMs were effectively implemented to predict the DOS and LOS of elective unilateral TKA and THA patients based on preoperative patient factors using a large North American database, with a high level of accuracy. Future work should include using operational factors to further refine these models and improve predictive accuracy. Results of this work will allow institutions to optimize their resource allocation, reduce costs and improve surgical scheduling. Acknowledgements: The American College of Surgeons National Surgical Quality Improvement Program and the hospitals participating in the ACS NSQIP are the source of the data used herein; they have not verified and are not responsible for the statistical validity of the data analysis or the conclusions derived by the authors.
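A minimal sketch of the baseline comparison above: any useful model should beat a mean regressor that always predicts the training-set average. The data and model choice are illustrative assumptions.

```python
# Hedged sketch: MLP regressor vs. mean-regressor baseline, compared by MSE.
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(11)
X = rng.normal(size=(5000, 20))                     # preoperative factors
dos = 100 + 10 * X[:, 0] + rng.normal(0, 5, 5000)   # synthetic DOS target (minutes)

X_tr, X_te, y_tr, y_te = X[:4000], X[4000:], dos[:4000], dos[4000:]
for reg in (MLPRegressor(max_iter=500, random_state=0),
            DummyRegressor(strategy="mean")):
    reg.fit(X_tr, y_tr)
    print(type(reg).__name__, mean_squared_error(y_te, reg.predict(X_te)))
```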


Orthopaedic Proceedings
Vol. 103-B, Issue SUPP_15 | Pages 85 - 85
1 Dec 2021
Goswami K Shope A Wright J Purtill J Lamendella R Parvizi J
Full Access

Aim. While metagenomic (microbial DNA) sequencing technologies can detect the presence of microbes in a clinical sample, it is unknown whether this signal represents dead or live organisms. Metatranscriptomics (sequencing of RNA) offers the potential to detect transcriptionally “active” organisms within a microbial community and to map expressed genes to functional pathways of interest (e.g. antibiotic resistance). We used this approach to evaluate the utility of metatranscriptomics to diagnose PJI and predict antibiotic resistance. Method. In this prospective study, samples were collected from 20 patients undergoing revision TJA (10 aseptic and 10 infected) and 10 patients undergoing primary TJA. Synovial fluid and peripheral blood samples were obtained at the time of surgery, as well as negative field controls (skin swabs, air swabs, sterile water). All samples were shipped to the laboratory for metatranscriptomic analysis. Following microbial RNA extraction and host analyte subtraction, metatranscriptomic sequencing was performed. Bioinformatic analyses were implemented prior to mapping against curated microbial sequence databases to generate taxonomic expression profiles. Principal Coordinates Analysis (PCoA) and Partial Least Squares-Discriminant Analysis were utilized to ordinate metatranscriptomic profiles, using the 2018 definition of PJI as the gold standard. Results. After RNA metatranscriptomic analysis, blinded PCoA modeling revealed accurate and distinct clustering of samples into 3 separate cohorts (infected, aseptic, and primary joints) based on their active transcriptomic profile, both in synovial fluid and blood (synovial ANOSIM p=0.001; blood ANOSIM p=0.034). Differential metatranscriptomic signatures for the infected versus noninfected cohorts enabled us to train machine learning algorithms to 84.9% predictive accuracy for infection. Multiple antibiotic resistance genes were expressed, with high concordance to conventional antibiotic sensitivity data. Conclusions. Our findings highlight the potential of metatranscriptomics for infection diagnosis. To our knowledge, this is the first report of RNA sequencing in the orthopaedic literature. Further work in larger patient cohorts will better inform deep learning approaches to improve the accuracy, predictive power, and clinical utility of this technology.
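A minimal sketch of the ordination step: metric multidimensional scaling on a Bray-Curtis distance matrix is used here as a close stand-in for classical PCoA (which is an eigendecomposition of the same matrix). The profiles are synthetic, and the specific distance metric is an assumption.

```python
# Hedged sketch: PCoA-style ordination of taxonomic expression profiles.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.manifold import MDS

rng = np.random.default_rng(12)
profiles = rng.random((30, 100))  # 30 samples x 100 taxa expression levels

dist = squareform(pdist(profiles, metric="braycurtis"))
coords = MDS(n_components=2, dissimilarity="precomputed",
             random_state=0).fit_transform(dist)
print(coords.shape)  # 2-D ordination to inspect infected/aseptic/primary clustering
```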


Orthopaedic Proceedings
Vol. 103-B, Issue SUPP_9 | Pages 16 - 16
1 Jun 2021
Roche C Simmons C Polakovic S Schoch B Parsons M Aibinder W Watling J Ko J Gobbato B Throckmorton T Routman H
Full Access

Introduction. Clinical decision support tools are software that match the input characteristics of an individual patient to an established knowledge base to create patient-specific assessments that support and better inform individualized healthcare decisions. Clinical decision support tools can facilitate better evidence-based care, offer the potential for improved treatment quality, selection, and shared decision making, and standardize patient expectations. Methods. Predict+ is a novel clinical decision support tool that leverages clinical data from the Exactech Equinoxe shoulder clinical outcomes database, which is composed of >11,000 shoulder arthroplasty patients using one specific implant type from more than 30 different clinical sites using standardized forms. Predict+ utilizes multiple coordinated and locked supervised machine learning algorithms to make patient-specific predictions of 7 outcome measures at multiple postoperative timepoints (from 3 months to 7 years after surgery) using as few as 19 preoperative inputs. The predictive accuracy of the Predict+ algorithms for the 7 clinical outcome measures, for each of aTSA and rTSA, was quantified using the mean absolute error and the area under the receiver operating characteristic curve (AUROC). Results. Predict+ was released in November 2020 and is currently in limited launch in the US and select international markets. Predict+ utilizes an interactive graphical user interface to facilitate efficient entry of the preoperative inputs to generate personalized predictions of 7 clinical outcome measures achieved with aTSA and rTSA. Predict+ outputs a simple, patient-friendly graphical overview of preoperative status and a personalized 2-year outcome summary of aTSA and rTSA predictions for all 7 outcome measures to aid in the preoperative patient consultation process. Additionally, Predict+ outputs a detailed line-graph view of a patient's preoperative status and their personalized aTSA, rTSA, and aTSA vs. rTSA predicted outcomes for the 7 outcome measures at 6 postoperative timepoints. For each line-graph, the minimal clinically important difference (MCID) and substantial clinical benefit (SCB) patient-satisfaction improvement thresholds are displayed to aid the surgeon in assessing improvement potential for aTSA and rTSA, including relative to an average age- and gender-matched patient. The initial clinical experience with Predict+ has been positive. Input of the preoperative patient data is efficient and generally completed in <5 minutes, though continued workflow improvements are necessary to limit the occurrence of responder fatigue. The graphical user interface is intuitive and facilitates a rapid assessment of expected patient outcomes. We have not found the use of this tool to be disruptive of our clinic's workflow. Ultimately, this tool has positively shifted the preoperative consultation towards discussion of clinical outcomes data, which has been helpful in guiding a patient's understanding of what can realistically be achieved with shoulder arthroplasty. Discussion and Conclusions. Predict+ aims to improve a surgeon's ability to preoperatively counsel patients electing to undergo shoulder arthroplasty. We are hopeful this innovative tool will help align surgeon and patient expectations and ultimately improve patient satisfaction with this elective procedure. Future research is required, but our initial experience demonstrates the positive potential of this predictive tool.
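A minimal sketch of the two accuracy metrics quoted above: mean absolute error for continuous outcome predictions, and AUROC for threshold events such as exceeding an improvement cut-off. The data and the choice of threshold are placeholders.

```python
# Hedged sketch: MAE and AUROC evaluation of outcome predictions.
import numpy as np
from sklearn.metrics import mean_absolute_error, roc_auc_score

rng = np.random.default_rng(13)
actual = rng.uniform(0, 100, 300)            # observed outcome scores
predicted = actual + rng.normal(0, 8, 300)   # model predictions

print("MAE:", mean_absolute_error(actual, predicted))

improved = actual > 50                       # e.g. exceeded an MCID-like threshold
print("AUROC:", roc_auc_score(improved, predicted))
```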


Orthopaedic Proceedings
Vol. 102-B, Issue SUPP_8 | Pages 39 - 39
1 Aug 2020
Ma C Li C Jin Y Lu WW
Full Access

The aim of this study was to explore a novel machine learning model for evaluating vertebral fracture risk, using a Decision Tree model trained on the Bone Mineral Density (BMD) of different compartments of the vertebral body. We collected a Computed Tomography (CT) image dataset including 10 patients with osteoporotic fracture and 10 patients without osteoporotic fracture. 40 non-fractured vertebral bodies from T11 to L5 were segmented from the 10 patients with osteoporotic fracture, and 53 non-fractured vertebral bodies from T11 to L5 were segmented from the 10 patients without osteoporotic fracture. Based on their biomechanical properties, the 93 vertebral bodies were further segmented into 11 compartments: eight trabecular bone compartments, the cortical shell, and the top and bottom endplates. The BMD of these 11 compartments was calculated based on the HU values in the CT images. A Decision Tree model was used to build the fracture prediction model, with a Support Vector Machine built as a comparison model. All BMD data were shuffled to a random order; 70% of the data was used as training data, and the remaining 30% was used as test data. Training and testing prediction accuracies were then calculated separately for the two models. The training accuracy of the Decision Tree model is 100% and its testing accuracy is 92.14% after training on the BMD data of the 11 compartments of the vertebral body. The type I error is 7.14% and the type II error is 0%. The training accuracy of the Support Vector Machine model is 100% and its testing accuracy is 78.57%. The type I error is 17.86% and the type II error is 3.57%. The performance of vertebral body fracture prediction using the Decision Tree is significantly higher than using the Support Vector Machine, and the Decision Tree model is a potential risk assessment method for clinical application. This pilot evidence suggests that the Decision Tree prediction model overcomes the overfitting drawback of the Support Vector Machine model. However, a larger dataset and a cohort study should be pursued for further evidence.
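A minimal sketch of the comparison above: an identical shuffled 70/30 split of the 11 per-compartment BMD features feeds both a decision tree and an SVM, with train versus test accuracy reported to expose overfitting. The BMD values and labels are synthetic.

```python
# Hedged sketch: decision tree vs. SVM on per-compartment vertebral BMD.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(14)
X = rng.uniform(50, 250, size=(93, 11))  # BMD of 11 vertebral compartments
y = rng.integers(0, 2, size=93)          # fracture-cohort vs. control vertebra

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, shuffle=True,
                                          random_state=0)
for clf in (DecisionTreeClassifier(random_state=0), SVC()):
    clf.fit(X_tr, y_tr)
    print(type(clf).__name__,
          "train:", clf.score(X_tr, y_tr), "test:", clf.score(X_te, y_te))
```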


Orthopaedic Proceedings
Vol. 102-B, Issue SUPP_8 | Pages 48 - 48
1 Aug 2020
Burns D
Full Access

Participation in a physical therapy program is considered one of the greatest predictors of successful conservative management of common shoulder disorders; however, adherence to standard exercise protocols is often poor (around 50%) and typically worse for unsupervised home exercise programs. Currently, there are limited tools available for objective measurement of adherence and performance of shoulder rehabilitation in the home setting. The goal of this study was to develop and evaluate the potential for performing home shoulder physiotherapy monitoring using a commercial smartwatch. We hypothesize that shoulder physiotherapy exercises can be classified by analyzing the temporal sequence of inertial sensor outputs from a smartwatch worn on the extremity performing the exercise. Twenty healthy adult subjects with no prior shoulder disorders performed seven exercises from a standard evidence-based rotator cuff physiotherapy protocol: pendulum, abduction, forward elevation, internal/external rotation and trapezius extension with a resistance band, and a weighted bent-over row. Each participant performed 20 repetitions of each exercise bilaterally under the supervision of an orthopaedic surgeon, while 6-axis inertial sensor data were collected at 50 Hz from an Apple Watch. Using the scikit-learn and keras platforms, four supervised learning algorithms were trained to classify the exercises: k-nearest neighbour (k-NN), random forest (RF), support vector machine classifier (SVC), and a deep convolutional recurrent neural network (CRNN). Algorithm performance was evaluated using 5-fold cross-validation, stratified first temporally and then by subject. Categorical classification accuracy was above 94% for all algorithms on the temporally stratified cross validation, with the best performance achieved by the CRNN algorithm (99.4 ± 0.2%). The subject-stratified cross validation, which evaluated classifier performance on unseen subjects, yielded lower accuracy scores, again with the CRNN performing best (88.9 ± 1.6%). This proof-of-concept study demonstrates the feasibility of a smartwatch device and machine learning approach to more easily monitor and assess at-home adherence to shoulder physiotherapy exercise protocols. Future work will focus on translation of this technology to the clinical setting and evaluating exercise classification in shoulder disorder populations.
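A minimal sketch of the two validation schemes above: a plain K-fold standing in for the temporal stratification versus a subject-stratified split, where GroupKFold keeps each subject's windows out of their own training folds. Features, labels, and subject assignments are synthetic.

```python
# Hedged sketch: temporal vs. subject-wise cross-validation with scikit-learn.
import numpy as np
from sklearn.model_selection import cross_val_score, KFold, GroupKFold
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(15)
X = rng.normal(size=(2000, 36))            # windowed 6-axis IMU features (assumed)
y = rng.integers(0, 7, size=2000)          # seven exercise classes
subjects = rng.integers(0, 20, size=2000)  # 20 participants

clf = RandomForestClassifier(random_state=0)
print("temporal folds:", cross_val_score(clf, X, y, cv=KFold(5)).mean())
print("subject folds: ", cross_val_score(clf, X, y,
                                         cv=GroupKFold(5), groups=subjects).mean())
```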