Aims. The aim of this study was to evaluate the ability of a
There is increasing popularity in the use of artificial intelligence and
Aims. This study aimed to explore the biological and clinical importance of dysregulated key genes in osteoarthritis (OA) patients at the cartilage level to find potential biomarkers and targets for diagnosing and treating OA. Methods. Six sets of gene expression profiles were obtained from the Gene Expression Omnibus database. Differential expression analysis, weighted gene coexpression network analysis (WGCNA), and multiple
Aims.
Aims. To develop prediction models using
Artificial intelligence and
Aims. The aim of this study was to develop and evaluate machine-learning-based computerized adaptive tests (CATs) for the Oxford Hip Score (OHS), Oxford Knee Score (OKS), Oxford Shoulder Score (OSS), and the Oxford Elbow Score (OES) and its subscales. Methods. We developed CAT algorithms for the OHS, OKS, OSS, overall OES, and each of the OES subscales, using responses to the full-length questionnaires and a
External validation of machine learning predictive models is achieved by evaluating model performance in patient groups other than those used for algorithm development. This important step is rarely performed, which hinders the clinical translation of newly developed models. Recently, machine learning was used to develop a tool that can quantify revision risk for a patient undergoing primary anterior cruciate ligament (ACL) reconstruction (https://swastvedt.shinyapps.io/calculator_rev/). The source of data included nearly 25,000 patients with primary ACL reconstruction recorded in the Norwegian Knee Ligament Register (NKLR). The result was a well-calibrated tool capable of predicting revision risk one, two, and five years after primary ACL reconstruction with moderate accuracy. The purpose of this study was to determine the external validity of the NKLR model by assessing algorithm performance when applied to patients from the Danish Knee Ligament Registry (DKLR).
The primary outcome measure of the NKLR model was the probability of revision ACL reconstruction within 1, 2, and/or 5 years. For the index study, 24 predictor variables from the NKLR were included, and the modelling process eliminated variables that did not significantly improve predictive ability, without sacrificing accuracy. The result was a well-calibrated algorithm, developed using a Cox Lasso model, that required only five of the original 24 variables for outcome prediction. For this external validation study, all DKLR patients with complete data for the five variables required for NKLR prediction were included. The five variables were: graft choice, femur fixation device, Knee Injury and Osteoarthritis Outcome Score (KOOS) Quality of Life subscale score at surgery, years from injury to surgery, and age at surgery. Predicted revision probabilities were calculated for all DKLR patients. Model performance was assessed using the same metrics as the NKLR study: concordance and calibration.
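The concordance metric named above can be made concrete. The sketch below computes Harrell's C-index, one common concordance estimator for right-censored data; the abstract does not say which estimator the NKLR and DKLR analyses used, and the Cox Lasso model itself is not reproduced. The function name and toy inputs are illustrative only.

```python
from itertools import combinations


def harrell_c_index(times, events, risks):
    """Harrell's concordance index for right-censored survival data.

    times  -- follow-up in years (time to revision, or to censoring)
    events -- 1 if a revision was observed, 0 if the patient was censored
    risks  -- predicted revision probability (higher = expected to fail sooner)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` is the patient with the shorter follow-up.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Comparable only if the shorter follow-up ended in an observed revision;
        # pairs with identical times are skipped in this simplified sketch.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0  # higher predicted risk failed first: concordant
        elif risks[a] == risks[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which puts the reported concordance of 0.68 in context.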
In total, 10,922 DKLR patients were included for analysis. Average follow-up time or time-to-revision was 8.4 (±4.3) years and overall revision rate was 6.9%. Surgical technique trends (i.e., graft choice and fixation devices) and injury characteristics (i.e., concomitant meniscus and cartilage pathology) were dissimilar between registries. The model produced similar concordance when applied to the DKLR population compared to the original NKLR test data (DKLR: 0.68; NKLR: 0.68-0.69). Calibration was poorer for the DKLR population at one and five years post primary surgery but similar to the NKLR at two years.
The NKLR machine learning algorithm demonstrated similar performance when applied to patients from the DKLR, suggesting that it is valid for application outside of the initial patient population. This is the first externally validated machine learning model for predicting revision ACL reconstruction. Clinicians can use this in-clinic calculator to estimate revision risk at a patient-specific level when discussing outcome expectations pre-operatively. While encouraging, it should be noted that the performance of the model on patients undergoing ACL reconstruction outside of Scandinavia remains unknown.
Artificial intelligence (AI) is becoming more powerful but is rarely used to counter the growing burden on health care. AI applications that increase efficiency in orthopedics are rare. We asked whether (1) machine learning (ML) algorithms could be trained on answers from digital history-taking questionnaires to predict the treatment of hip osteoarthritis (conservative or surgical), and (2) such an algorithm could streamline clinical consultation.
Multiple ML models were trained on 600 annotated digital history-taking questionnaires (80% training, 20% test) completed before consultation. The best-performing models, selected on balanced accuracy after automated hyperparameter optimization, were built into our daily clinical orthopedic practice. Fifty patients with hip complaints (aged >45 years) were prospectively predicted and scheduled (partly blinded, partly unblinded) for consultation with either the physician assistant (conservative) or the orthopedic surgeon (operative). Tailored patient information based on the prediction was automatically sent to a smartphone app. Level of evidence: IV.
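Balanced accuracy, the selection metric mentioned above, is the mean recall across classes, so a model cannot score well simply by always predicting the majority treatment. A minimal sketch follows; the class labels are hypothetical, and scikit-learn users would more typically call `sklearn.metrics.balanced_accuracy_score`.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall; robust to class imbalance."""
    recalls = []
    for c in sorted(set(y_true)):
        # Indices of all cases whose true class is c.
        idx = [i for i, y in enumerate(y_true) if y == c]
        correct = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical example: 8 conservative and 2 surgical patients, and a model
# that always predicts "conservative". Plain accuracy would be 0.8.
truth = ["conservative"] * 8 + ["surgical"] * 2
pred = ["conservative"] * 10
print(balanced_accuracy(truth, pred))
```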
Random Forest and BernoulliNB were the most accurate ML models (0.75 balanced accuracy). Treatment prediction was correct in 45 out of 50 consultations (90%), p<0.0001 (sign and binomial test). Specialized consultations where conservatively predicted patients were seen by the physician assistant and surgical patients by the orthopedic surgeon were highly appreciated and effective.
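The reported p-value can be sanity-checked with an exact binomial test. This sketch assumes the null hypothesis is a 50% chance of a correct prediction, which is what a sign test without ties reduces to; the abstract does not state the null proportion, so treat the result as illustrative. The function name is hypothetical.

```python
from math import comb


def binomial_test_greater(successes, n, p0=0.5):
    """One-sided exact binomial P(X >= successes) under H0: success rate = p0."""
    return sum(comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(successes, n + 1))


# 45 correct treatment predictions out of 50 consultations.
p_one_sided = binomial_test_greater(45, 50)
# Under p0 = 0.5 the null distribution is symmetric, so doubling gives a two-sided p.
p_two_sided = min(1.0, 2 * p_one_sided)
print(p_two_sided)
```

Under that assumption, 45 correct out of 50 gives a p-value far below 0.0001.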
The treatment strategy for hip osteoarthritis was accurately predicted from answers to digital history-taking questionnaires before patients entered the hospital. This can make outpatient consultation scheduling more efficient and allow pre-consultation patient education to be tailored.
Objectives
Articular cartilage damage is a primary outcome of pre-clinical and clinical studies evaluating meniscal and cartilage repair or replacement techniques. Recent studies have quantitatively characterized India Ink stained cartilage damage through light reflectance and the application of local or global thresholds. We developed a method for the quantitative characterisation of inked cartilage damage with improved generalisation capability, and compared its performance with that of the threshold-based baseline approach against gold standard labels.
Methods
The Trainable WEKA Segmentation (TWS) tool (Arganda-Carreras et al., 2017) available in Fiji (Rueden et al., 2017) was used to train two separate Random Forest classifiers to automatically segment cartilage damage on ink stained cadaveric ovine stifle joints. Gold standard labels were manually annotated for the training, validation and test datasets for each of the femoral and tibial classifiers. Each dataset included a sample of medial and lateral femoral condyles and tibial plateaus from various stifle joints, selected to ensure no overlap across datasets according to ovine identifier. Training was performed on the training data with the TWS tool, using edge, texture and noise reduction filters selected for their suitability and performance. The two trained classifiers were then applied to the validation data to output damage probability maps, on which a threshold value was calibrated. Model predictions on the unseen test set were evaluated against the gold standard labels using the Dice Similarity Coefficient (DSC), an overlap-based metric, and compared with results for the baseline global threshold approach applied in Fiji, as shown in Figures 1 and 2.
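The DSC evaluation and the validation-set threshold calibration can be sketched in plain Python on flattened binary masks. Choosing the threshold that maximises DSC on the validation maps is an assumption; the text does not state the calibration criterion, and the TWS Random Forest itself is not reproduced. The function names are hypothetical.

```python
def dice(pred, truth):
    """Dice Similarity Coefficient between two flattened binary masks."""
    inter = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Two empty masks are treated as a perfect match.
    return 1.0 if total == 0 else 2.0 * inter / total


def binarize(prob_map, threshold):
    """Turn a damage probability map into a binary damage mask."""
    return [p >= threshold for p in prob_map]


def calibrate_threshold(val_probs, val_truth, candidates=None):
    """Pick the candidate threshold that maximises DSC on the validation data."""
    if candidates is None:
        candidates = [i / 100 for i in range(1, 100)]
    return max(candidates, key=lambda th: dice(binarize(val_probs, th), val_truth))


# Hypothetical usage: calibrate on validation maps, then score the test set.
th = calibrate_threshold([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0])
test_dsc = dice(binarize([0.7, 0.1, 0.6, 0.2], th), [1, 0, 1, 1])
```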
Introduction
Instability remains a common complication following total hip arthroplasty (THA) and continues to account for the highest percentage of revisions in numerous registries. Many risk factors have been described, yet a patient-specific risk assessment tool remains elusive. The purpose of this study was to apply a machine learning algorithm to develop a patient-specific risk score capable of dynamic adjustment based on operative decisions.
Methods
A total of 22,086 THAs performed between 1998 and 2018 were evaluated, of which 632 (2.9%) sustained a postoperative dislocation. Patients were robustly characterized by non-modifiable factors (demographics, THA indication, spinal disease, spine surgery, neurologic disease, connective tissue disease) and by modifiable operative decisions (surgical approach, femoral head size, and acetabular liner: standard, elevated, constrained or dual-mobility). Models were built with a binary outcome (event/no event) at 1 and 5 years postoperatively. Inverse probability of censoring weighting accounted for censoring bias. An ensemble algorithm was created that included a Generalized Linear Model, Generalized Additive Model, Lasso Penalized Regression, Kernel-Based Support Vector Machines, Random Forest and Optimized Gradient Boosting Machine. The ensemble weights were constrained to a convex combination chosen to minimize the negative binomial log-likelihood loss function. Ten-fold cross-validation accounted for the rarity of dislocation events.
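The convex-combination step can be illustrated with two base learners instead of six. This minimal sketch grid-searches a single weight alpha so that the blended probability minimises the (optionally weighted) negative Bernoulli log-likelihood; the `weights` argument is where inverse probability of censoring weights would enter. In a typical stacked ensemble the inputs would be out-of-fold cross-validated predictions and the weights would be fitted by a constrained optimiser, so treat this as an illustration of the objective, not the authors' implementation. All names are hypothetical.

```python
import math


def weighted_log_loss(y, p, weights=None, eps=1e-12):
    """Mean negative Bernoulli log-likelihood, optionally weighted (e.g. by IPCW)."""
    if weights is None:
        weights = [1.0] * len(y)
    total = 0.0
    for yi, pi, wi in zip(y, p, weights):
        pi = min(max(pi, eps), 1 - eps)  # keep log() finite
        total -= wi * (yi * math.log(pi) + (1 - yi) * math.log(1 - pi))
    return total / sum(weights)


def convex_weight(y, p_a, p_b, weights=None, steps=100):
    """Grid-search alpha in [0, 1] so that alpha*p_a + (1-alpha)*p_b minimises loss."""
    best_alpha, best_loss = 0.0, float("inf")
    for k in range(steps + 1):
        alpha = k / steps
        blend = [alpha * a + (1 - alpha) * b for a, b in zip(p_a, p_b)]
        loss = weighted_log_loss(y, blend, weights)
        if loss < best_loss:
            best_alpha, best_loss = alpha, loss
    return best_alpha, best_loss
```

Because alpha and 1 - alpha are non-negative and sum to one, the blended value stays a valid probability for every patient.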
Excessive resident duty hours (RDH) are a recognized issue with implications for physician well-being and patient safety. A major component of the RDH concern is on-call duty. While considerable work has been done to reduce resident call workload, there is little research on optimizing resident call scheduling. Call coverage is scheduled manually rather than according to demand, which generally leads to over-scheduling to prevent service gaps. Machine learning (ML) has been widely applied in other industries to prevent such supply-demand mismatches. However, the healthcare field has been slow to adopt these innovations. As such, the aim of this study was to use ML models to 1) predict demand on orthopaedic surgery residents at a level I trauma centre and 2) identify variables key to demand prediction.
Daily surgical handover emails from an eight-year period (2012-2019) at a level I trauma centre were collected. The following data were used to calculate demand: spine call coverage, date, and the numbers of operating rooms (ORs), traumas, admissions and consults completed. Various ML models (linear, tree-based and neural networks) were trained to predict the workload, and their results were compared with the current scheduling approach. Model quality was assessed using the area under the receiver operating characteristic curve (AUC) and prediction accuracy. The ten most important variables were extracted from the most successful model.
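AUC, the main model-quality metric above, equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case (the Mann-Whitney formulation). The sketch assumes demand has been dichotomised into high-demand (1) and low-demand (0) days, which the abstract does not specify; the function name, labels and scores are hypothetical.

```python
def auc(y_true, scores):
    """ROC AUC via pairwise comparison of positive and negative scores."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical days: 1 = high-demand day, 0 = low-demand day.
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

A scheduler that gives every day the same score has an AUC of exactly 0.5 under this definition, since every positive-negative pair is a tie.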
During training, the model with the highest AUC and accuracy was the multivariate adaptive regression splines (MARS) model, with an AUC of 0.78±0.03 and accuracy of 71.7%±3.1%. During testing, the model with the highest AUC and accuracy was the neural network model, with an AUC of 0.81 and accuracy of 73.7%. All models were better than the current approach, which had an AUC of 0.50 and accuracy of 50.1%. Key variables used by the neural network model were (in descending order of importance): spine call duty, year, weekday/weekend, month, and day of the week.
This was the first study to use ML to predict the service demand on orthopaedic surgery residents at a major level I trauma centre. Multiple ML models predicted the demand on surgical residents more accurately than the current scheduling approach. Future work should incorporate predictive models into optimization strategies that match scheduling to demand, in order to improve resident well-being and patient care.
Total knee and hip arthroplasty (TKA and THA) are two of the highest volume and resource intensive surgical procedures. Key drivers of the cost of surgical care are duration of surgery (DOS) and postoperative inpatient length of stay (LOS). The ability to predict TKA and THA DOS and LOS has substantial implications for hospital finances, scheduling and resource allocation. The goal of this study was to predict DOS and LOS for elective unilateral TKAs and THAs using machine learning models (MLMs) constructed on preoperative patient factors using a large North American database.
The American College of Surgeons (ACS) National Surgical Quality Improvement Program (NSQIP) database was queried for elective unilateral TKA and THA procedures from 2014-2019. The dataset was split into training, validation and testing sets by year. Multiple conventional and deep MLMs, including linear, tree-based and multilayer perceptron (MLP) models, were constructed. The models with the best performance on the validation set were evaluated on the testing set. Models were evaluated according to 1) mean squared error (MSE), 2) buffer accuracy (the proportion of cases in which the prediction fell within a predesignated buffer of the actual value), and 3) classification accuracy (the proportion of cases in which the models predicted the correct class). To ensure that the predictions were useful, the results of the models were compared with those of a mean regressor.
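The three evaluation metrics and the mean-regressor baseline can be written out directly. The 30-minute buffer and the 120-minute cutoff come from the reported results; treating the buffer as inclusive (absolute error no greater than the buffer) is an assumption, and the function names and example durations are hypothetical.

```python
def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def buffer_accuracy(actual, predicted, buffer):
    """Share of predictions within +/- buffer of the actual value."""
    hits = sum(1 for a, p in zip(actual, predicted) if abs(a - p) <= buffer)
    return hits / len(actual)


def class_accuracy(actual, predicted, cutoff):
    """Share of cases where prediction and truth fall in the same class
    (class 1: value <= cutoff, class 2: value > cutoff)."""
    hits = sum(1 for a, p in zip(actual, predicted) if (a <= cutoff) == (p <= cutoff))
    return hits / len(actual)


def mean_regressor(train_targets, n):
    """Baseline that predicts the training-set mean for every case."""
    m = sum(train_targets) / len(train_targets)
    return [m] * n


# Hypothetical durations of surgery in minutes.
dos_actual = [90, 110, 130, 150]
dos_pred = [100, 140, 125, 180]
print(buffer_accuracy(dos_actual, dos_pred, 30), class_accuracy(dos_actual, dos_pred, 120))
```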
A total of 499,432 patients (TKA 302,490; THA 196,942) were included. The MLP models had the best MSEs and accuracy across both TKA and THA patients. During testing, the TKA MSEs for DOS and LOS were 0.893 and 0.688 while the THA MSEs for DOS and LOS were 0.895 and 0.691. The TKA DOS 30-minute buffer accuracy and ≤120 min, >120 min classification accuracy were 78.8% and 88.3%, while the TKA LOS 1-day buffer accuracy and ≤2 days, >2 days classification accuracy were 75.2% and 76.1%. The THA DOS 30-minute buffer accuracy and ≤120 min, >120 min classification accuracy were 81.6% and 91.4%, while the THA LOS 1-day buffer accuracy and ≤2 days, >2 days classification accuracy were 78.3% and 80.4%. All models across both TKA and THA patients were more accurate than the mean regressors for both DOS and LOS predictions across both buffer and classification accuracies.
Conventional and deep MLMs have been effectively implemented to predict the DOS and LOS of elective unilateral TKA and THA patients based on preoperative patient factors using a large North American database with a high level of accuracy. Future work should include using operational factors to further refine these models and improve predictive accuracy. Results of this work will allow institutions to optimize their resource allocation, reduce costs and improve surgical scheduling.
Acknowledgements:
The American College of Surgeons National Surgical Quality Improvement Program and the hospitals participating in the ACS NSQIP are the source of the data used herein; they have not verified and are not responsible for the statistical validity of the data analysis or the conclusions derived by the authors.
Identification of patients at risk of not achieving the minimal clinically important difference (MCID) in patient-reported outcome measures (PROMs) is important to ensure principled and informed pre-operative decision making. Machine learning techniques may enable the generation of a predictive model for attainment of MCID in hip arthroscopy.
Aims: 1) to determine whether machine learning techniques could predict which patients will achieve MCID in the iHOT-12 PROM 6 months after arthroscopic management of femoroacetabular impingement (FAI), 2) to determine which factors contribute to their predictive power.
Data from the UK Non-Arthroplasty Hip Registry database were utilised. We identified 1917 patients who had undergone hip arthroscopy for FAI and had baseline and 6-month follow-up iHOT-12 scores and baseline EQ-5D scores. We trained three established machine learning algorithms on our dataset to predict iHOT-12 MCID improvement at 6 months from baseline characteristics, including demographic factors, disease characteristics and PROMs. Performance was assessed using the area under the receiver operating characteristic curve (AUROC) with 5-fold cross-validation.
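A minimal sketch of 5-fold cross-validation index splitting, using the cohort size of 1917 reported above. The abstract does not say whether folds were stratified by MCID attainment or how fold-level AUROCs were aggregated; this version uses plain shuffled folds, and the function name is hypothetical.

```python
import random


def k_fold_indices(n, k=5, seed=0):
    """Shuffle 0..n-1, split into k nearly equal folds, and yield (train, test) pairs."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        # Every index not in the held-out fold goes into the training set.
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test


splits = list(k_fold_indices(1917, k=5))
```

Each patient appears in exactly one test fold, so a model would be scored once on every patient without ever being evaluated on data it was trained on.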
The three machine learning algorithms performed quite differently. The linear logistic regression model achieved AUROC = 0.59, the deep neural network achieved AUROC = 0.82, and the random forest model had the best predictive performance with AUROC = 0.87. Of the demographic factors, BMI and age were key predictors for the random forest model. Removing all features except baseline responses to the iHOT-12 questionnaire had little effect on the performance of the random forest model (AUROC = 0.85). Disease characteristics had little effect on model performance.
Machine learning models are able to predict with good accuracy 6-month post-operative MCID attainment in patients undergoing arthroscopic management for FAI. Baseline scores from the iHOT-12 questionnaire are sufficient to predict with good accuracy whether a patient is likely to reach MCID in post-operative PROMs.
Background
Advanced technologies, such as robotics, provide enhanced precision for implanting total knee arthroplasty (TKA) components; however, the optimal component position and limb alignment remain unknown. The purpose of this study was to identify the ideal target sagittal component position and coronal limb alignment that produce optimal clinical outcomes.
Methods
A retrospective review of 1,091 consecutive TKAs was performed. All TKAs were PCL-retaining or PCL-sacrificing with anterior-lipped (49.4%) or conforming (50.6%) bearings, performed with modern perioperative protocols. Posterior tibial slope, femoral flexion, and tibiofemoral limb alignment were measured using a standardized protocol. Patients were grouped by the ‘how often does your knee feel normal?’ outcome score at latest follow-up. Machine learning algorithms were used to identify optimal alignment zones that predicted improved outcome scores.
Background
Implant loosening is a common cause of poor outcomes and pain after total knee arthroplasty (TKA). Despite the increasing use of expensive techniques such as arthrography, prosthetic loosening is often unclear pre-operatively, leading to diagnostic uncertainty and extensive workup. The objective of this study was to evaluate the ability of a machine learning (ML) algorithm to diagnose prosthetic loosening from pre-operative radiographs, and to determine which model inputs improve its performance.
Methods
A total of 754 patients underwent a first-time revision of a total joint arthroplasty at our institution from 2012–2018. Pre-operative anteroposterior (AP) and lateral X-rays (XR), together with demographic and comorbidity information, were collected for each patient. Each prosthesis was classified as loose or fixed by manual abstraction of the written findings in the operative report, which is considered the gold standard for diagnosing prosthetic loosening. We trained a series of deep convolutional neural network (CNN) models to predict from the pre-operative XR whether a prosthesis would be found loose in the operating room. Each XR was pre-processed to segment the bone, implant, and bone-implant interface. The CNN models were built on existing, proven CNN architectures with weights optimized for our dataset. We then integrated our best-performing model with historical patient data to create a final model and to determine the incremental accuracy provided by each additional layer of clinical information. The models were evaluated by their accuracy, sensitivity and specificity.
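Accuracy, sensitivity and specificity all follow from the confusion matrix. In this sketch, 'loose' is coded as 1 and treated as the positive class, which is an assumption about how the metrics were oriented; the function name and example labels are made up.

```python
def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity with 'loose' (1) as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # loose implants correctly flagged
        "specificity": tn / (tn + fp),  # fixed implants correctly cleared
    }


# Hypothetical operative findings (1 = loose) versus CNN predictions.
print(confusion_metrics([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0, 1, 0]))
```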
Background
Postoperative recovery after routine total hip arthroplasty (THA) can lead to the development of prolonged opioid use but there are few tools for predicting this adverse outcome. The purpose of this study was to develop machine learning algorithms for preoperative prediction of prolonged post-operative opioid use after THA.
Methods
A retrospective review of electronic health records was conducted at two academic medical centers and three community hospitals to identify adult patients who underwent THA for osteoarthritis between January 1, 2000 and August 1, 2018. Prolonged postoperative opioid prescription was defined as continuous opioid prescriptions from surgery to at least 90 days after surgery. Five machine learning algorithms were developed to predict this outcome and were assessed by discrimination, calibration, and decision curve analysis.
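Decision curve analysis compares a model's net benefit with 'treat all' and 'treat none' strategies across risk thresholds. The sketch below implements the standard net-benefit formula, net benefit = TP/n - (FP/n) * pt/(1 - pt) for threshold probability pt; 'treat none' has a net benefit of zero by definition. The example outcomes and predicted risks are hypothetical, and none of the study's five algorithms are reproduced.

```python
def net_benefit(y_true, probs, threshold):
    """Net benefit of intervening in patients whose predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 0)
    # False positives are penalised by the odds of the threshold probability.
    return tp / n - fp / n * threshold / (1 - threshold)


def treat_all_benefit(y_true, threshold):
    """Net benefit of intervening in every patient."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical cohort: 1 = prolonged opioid use at 90 days.
y = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
risk = [0.6, 0.2, 0.1, 0.05, 0.4, 0.3, 0.1, 0.1, 0.2, 0.1]
print(net_benefit(y, risk, 0.25), treat_all_benefit(y, 0.25))
```

Plotting these two quantities (and zero for 'treat none') over a range of thresholds gives the decision curve.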
Introduction
Machine learning is a relatively novel method in orthopaedics that can be used to evaluate complex associations and patterns in outcomes and healthcare data. The purpose of this study is to use 3 different supervised machine learning algorithms to analyze outcomes from a multi-center international database of a single shoulder prosthesis, and to evaluate the accuracy of each model in predicting post-operative outcomes of both anatomic and reverse total shoulder arthroplasty (aTSA and rTSA).
Methods
Data from a multi-center international database of 6485 patients who received primary total shoulder arthroplasty with a single shoulder prosthesis (Equinoxe, Exactech, Inc) were analyzed, comprising 19,796 patient visits. Specifically, demographic, comorbidity, implant type and implant size, surgical technique, pre-operative and post-operative PROMs and ROM measures, pre-operative and post-operative radiographic data, and adverse event and complication data were obtained for 2367 primary aTSA patients (8042 visits; average follow-up 22 months) and 4118 primary rTSA patients (11,754 visits; average follow-up 16 months). These data were used to create predictive models with 3 different supervised machine learning techniques: 1) linear regression, 2) random forest, and 3) XGBoost. Each technique used the pre-operative parameters to build a predictive model targeting the post-operative composite score, a 100-point score consisting of 50% post-operative composite outcome score (calculated as 33.3% ASES + 33.3% UCLA + 33.3% Constant) and 50% post-operative composite ROM score (calculated from S curves weighted as 70% active forward flexion + 15% internal rotation score + 15% active external rotation). To control for the time required for patients to improve after surgery, 3 additional predictive models were created in which each primary aTSA and primary rTSA cohort was restricted to follow-up visits >20 months after surgery; this yielded 1317 primary aTSA patients from 2962 visits at an average follow-up of 50 months and 1593 primary rTSA patients from 3144 visits at an average follow-up of 42 months.
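The composite target score described above can be expressed as a short function. This sketch assumes every input has already been rescaled to 0 to 100 (the UCLA shoulder score, for example, natively has a maximum of 35) and that the S-curve transforms converting raw ROM into ROM scores have already been applied; the abstract does not specify either step, and the function and parameter names are hypothetical.

```python
def composite_score(ases, ucla, constant, flexion_score, ir_score, er_score):
    """100-point composite: 50% PROM composite + 50% ROM composite.

    All six inputs are assumed to be pre-scaled to 0-100.
    """
    # Equal one-third weights for ASES, UCLA and Constant.
    outcome = (ases + ucla + constant) / 3
    # 70% active forward flexion, 15% internal rotation, 15% active external rotation.
    rom = 0.70 * flexion_score + 0.15 * ir_score + 0.15 * er_score
    return 0.5 * outcome + 0.5 * rom


print(composite_score(80, 70, 75, 90, 60, 70))
```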
Each of these 6 predictive models was trained on a random 80% of each cohort; each model then predicted the outcomes of the remaining 20% of the data from that subset's demographic, comorbidity, implant type and implant size, surgical technique, and pre-operative PROM and ROM inputs. The error of all 6 predictive models was calculated as the root mean square error (RMSE) between the actual and predicted post-operative composite scores. The accuracy of each model was determined by subtracting the percent difference of each RMSE value from the average composite score associated with each cohort.
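The evaluation step can be sketched as an 80/20 random split plus RMSE. The abstract's accuracy definition is ambiguous; `percent_accuracy` encodes one plausible reading, 100% minus the RMSE expressed as a percentage of the cohort's mean composite score, and should be treated as an assumption. All names are hypothetical.

```python
import math
import random


def split_80_20(records, seed=0):
    """Randomly assign 80% of records to training and 20% to testing."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def rmse(actual, predicted):
    """Root mean square error between actual and predicted composite scores."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def percent_accuracy(actual, predicted):
    """Assumed reading: 100% minus RMSE as a percentage of the mean actual score."""
    mean_score = sum(actual) / len(actual)
    return 100.0 * (1 - rmse(actual, predicted) / mean_score)


# Hypothetical composite scores for two held-out visits.
print(rmse([50, 70], [53, 67]), percent_accuracy([50, 70], [53, 67]))
```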
Bone drilling is conducted in many surgical disciplines such as orthopedics, maxillofacial, and spine surgery. Most of these procedures involve drilling of different bone materials including hard (cortical) and soft (cancellous) tissues. Identifying these tissues is essential for surgeons to minimise damage to underlying nerves and vessels.
The sound signal generated during drilling is a valuable source of information that could potentially be exploited. Such sounds can be captured easily with non-contact sensors. Therefore, our goal in this preliminary study is to investigate whether drilling sounds can be used to distinguish between cortical and cancellous tissues.
A bovine tibial bone was drilled, and the cortical and cancellous drilling sounds were captured. Each sound recording was divided into short windows of 50 ms with 50% overlap. A short window length was chosen because our intended longer-term application is to provide the surgeon with near-real-time feedback. Short-time Fourier transform (STFT) coefficients were extracted from each window and were averaged accordingly to obtain
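The windowing step described above can be sketched as follows: 50 ms frames with 50% overlap, followed by a per-frame magnitude spectrum. This version uses a rectangular window and a naive DFT for self-containment (a tapered window and an FFT library would be usual in practice), and it stops before the averaging step because the text describing that step is incomplete. The function names and the toy test signal are hypothetical.

```python
import cmath
import math


def frame_signal(samples, fs, win_ms=50, overlap=0.5):
    """Split a 1-D signal into fixed-length windows with fractional overlap."""
    win = int(round(fs * win_ms / 1000))       # samples per window
    hop = int(round(win * (1 - overlap)))      # step between window starts
    return [samples[s:s + win] for s in range(0, len(samples) - win + 1, hop)]


def dft_magnitude(frame):
    """Naive DFT magnitudes for bins 0..N/2 (rectangular window)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


# Toy signal: 1 s of a 100 Hz tone sampled at 1 kHz.
fs = 1000
signal = [math.sin(2 * math.pi * 100 * t / fs) for t in range(fs)]
frames = frame_signal(signal, fs)
spectra = [dft_magnitude(f) for f in frames]
```

With 50% overlap a new frame starts every 25 ms, so features are updated twice as often as non-overlapping 50 ms frames would allow.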
The total accuracies for ATL and LOO were 100% and 93.8% respectively obtained for
Introduction. With advances in artificial intelligence, the use of computer-aided detection and diagnosis in clinical imaging is gaining traction. Typically, very large datasets are required to train