Orthopaedic Proceedings
Vol. 106-B, Issue SUPP_1 | Pages 52 - 52
2 Jan 2024
den Borre I
Full Access

Geometric deep learning is a relatively new field that combines the principles of deep learning with techniques from geometry and topology to analyze data with complex structures, such as graphs and manifolds. In orthopedic research, geometric deep learning has been applied to a variety of tasks, including the analysis of imaging data to detect and classify abnormalities, the prediction of patient outcomes following surgical interventions, and the identification of risk factors for degenerative joint disease. This review aims to summarize the current state of the field and highlight the key findings and applications of geometric deep learning in orthopedic research. The review also discusses the potential benefits and limitations of these approaches and identifies areas for future research. Overall, the use of geometric deep learning in orthopedic research has the potential to greatly advance our understanding of the musculoskeletal system and improve patient care.


Orthopaedic Proceedings
Vol. 105-B, Issue SUPP_2 | Pages 102 - 102
10 Feb 2023
White J Wadhawan A Min H Rabi Y Schmutz B Dowling J Tchernegovski A Bourgeat P Tetsworth K Fripp J Mitchell G Hacking C Williamson F Schuetz M
Full Access

Distal radius fractures (DRFs) are one of the most common types of fracture and are often treated surgically. Standard X-rays are obtained for DRFs, and in most cases with an intra-articular component a routine CT is also performed. However, it is estimated that CT is only required in 20% of cases, so routine CT results in the overutilisation of resources and burdens radiology and emergency departments. In this study, we explore the feasibility of using deep learning to differentiate intra- and extra-articular DRFs automatically and help streamline which fractures require a CT. X-ray images were retrospectively retrieved from 615 DRF patients treated with an ORIF at the Royal Brisbane and Women's Hospital. The images were classified into AO Type A, B or C fractures by three training registrars supervised by a consultant. Deep learning was utilised in a two-stage process: 1) localise and focus the region of interest around the wrist using the YOLOv5 object detection network, and 2) classify the fracture using an EfficientNet-B3 network to differentiate intra- and extra-articular fractures. The distal radius region of interest (ROI) detection stage, using an ensemble of YOLO networks, detected all ROIs on the test set with no false positives. The average intersection over union between the YOLO detections and the ROI ground truth was … . The DRF classification stage using the EfficientNet-B3 ensemble achieved an area under the receiver operating characteristic curve of 0.82 for differentiating intra-articular fractures. The proposed DRF classification framework using ensemble models of YOLO and EfficientNet achieved satisfactory performance in intra- and extra-articular fracture classification. This work demonstrates the potential of automatic fracture characterisation using deep learning and can serve to streamline decision-making for axial imaging, helping to reduce unnecessary CT scans.
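As a hedged illustration of the two-stage pipeline this abstract describes (a detector proposes a wrist region of interest, and the crop is then classified as intra- versus extra-articular), the Python sketch below uses torchvision; the crop size, box format and helper names are assumptions, not the authors' implementation.

import torch
import torchvision
from torchvision.transforms import functional as TF

def crop_roi(image, box, size=300):
    # Crop the detected distal-radius ROI (x1, y1, x2, y2) from a CxHxW tensor
    # and resize it to the classifier input size (assumed 300 px for EfficientNet-B3).
    x1, y1, x2, y2 = box
    return TF.resized_crop(image, top=y1, left=x1, height=y2 - y1, width=x2 - x1, size=[size, size])

# Stage 2: binary classifier, intra- vs extra-articular (weights untrained in this sketch).
classifier = torchvision.models.efficientnet_b3(num_classes=2)
classifier.eval()

def classify_fracture(image, box):
    roi = crop_roi(image, box).unsqueeze(0)        # add a batch dimension
    with torch.no_grad():
        logits = classifier(roi)
    return torch.softmax(logits, dim=1)[0, 1]      # probability of intra-articular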


Orthopaedic Proceedings
Vol. 105-B, Issue SUPP_16 | Pages 63 - 63
17 Nov 2023
Bicer M Phillips AT Melis A McGregor A Modenese L
Full Access

Abstract. OBJECTIVES. Application of deep learning approaches to marker trajectories and ground reaction forces (mocap data) is often hampered by small datasets. Enlarging dataset size is possible using simple numerical approaches, although these may not preserve the physiological relevance of mocap data. We propose augmenting mocap data using a deep learning architecture called generative adversarial networks (GANs). We demonstrate that appropriate use of GANs can capture variations in walking patterns due to subject- and task-specific conditions (mass, leg length, age, gender and walking speed), which significantly affect walking kinematics and kinetics, resulting in augmented datasets amenable to deep learning analysis approaches. METHODS. A publicly available gait dataset (https://www.nature.com/articles/s41597-019-0124-4; 733 trials, 21 women and 25 men, 37.2 ± 13.0 years, 1.74 ± 0.09 m, 72.0 ± 11.4 kg, walking speeds ranging from 0.18 m/s to 2.04 m/s) was used as the experimental dataset. The GAN comprised three neural networks: an encoder, a decoder, and a discriminator. The encoder compressed experimental data into a fixed-length vector, while the decoder transformed the encoder's output vector and a condition vector (containing information about the subject and trial) into mocap data. The discriminator distinguished the encoded experimental data from randomly sampled vectors of the same size. By training these networks jointly on the experimental dataset, the generator (decoder) could generate synthetic data respecting specified conditions from randomly sampled vectors. Synthetic mocap data and lower limb joint angles were generated and compared to the experimental data by identifying statistically significant differences across the gait cycle for a randomly selected subset of the experimental data from 5 female subjects (73 trials, aged 26–40, weighing 57–74 kg, with leg lengths between 868–931 mm, and walking speeds ranging from 0.81–1.68 m/s). By conducting these comparisons for this subset, we aimed to assess synthetic data generated using multiple conditions. RESULTS. We visually inspected the synthetic trials to ensure that they appeared realistic. The statistical comparison revealed that, on average, only 2.5% of the gait cycle showed significant differences in the joint angles of the two data groups. Additionally, the synthetic ground reaction forces deviated from the experimental data distribution for an average of 2.9% of the gait cycle. CONCLUSIONS. We introduced a novel approach for generating synthetic mocap data of human walking based on the conditions that influence walking patterns. The synthetic data closely followed the trends observed in the experimental data and in the literature, suggesting that our approach can augment mocap datasets while accounting for multiple conditions, which was unfeasible in previous work. Creation of large, augmented datasets allows the application of other deep learning approaches, with the potential to generate realistic mocap data from limited and non-lab-based data. Our method could also enhance data sharing, since synthetic data do not raise ethical concerns. Virtual gait data can be generated and downloaded using our GAN approach at https://thisgaitdoesnotexist.streamlit.app/. Declaration of Interest.
I declare that there is no conflict of interest that could be perceived as prejudicing the impartiality of the research reported.
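A minimal PyTorch sketch of the conditional encoder/decoder/discriminator setup described above may help readers unfamiliar with GANs; the layer sizes, the 101-point gait cycle and the five-element condition vector (mass, leg length, age, gender, walking speed) are illustrative assumptions rather than the authors' architecture.

import torch
import torch.nn as nn

LATENT, COND, FRAMES, CHANNELS = 64, 5, 101, 6     # latent size, condition vector, gait-cycle points, signal channels

class Decoder(nn.Module):
    # The "generator": a latent vector plus a condition vector -> synthetic mocap curves.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(LATENT + COND, 256), nn.ReLU(), nn.Linear(256, FRAMES * CHANNELS))
    def forward(self, z, cond):
        return self.net(torch.cat([z, cond], dim=1)).view(-1, CHANNELS, FRAMES)

class Discriminator(nn.Module):
    # Distinguishes encoded experimental data from randomly sampled latent vectors.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(LATENT, 128), nn.ReLU(), nn.Linear(128, 1))
    def forward(self, v):
        return self.net(v)

# Once trained, synthetic trials are drawn by sampling z ~ N(0, I) and attaching the desired condition.
z = torch.randn(8, LATENT)
cond = torch.rand(8, COND)                          # placeholder subject/trial descriptors
synthetic = Decoder()(z, cond)                      # shape (8, CHANNELS, FRAMES)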


Orthopaedic Proceedings
Vol. 106-B, Issue SUPP_18 | Pages 69 - 69
14 Nov 2024
Sawant S Borotikar B Raghu V Audenaert E Khanduja V
Full Access

Introduction. Three-dimensional (3D) morphological understanding of the hip joint, specifically the joint space and the surrounding anatomy of the proximal femur and pelvis, is crucial for a range of orthopedic diagnoses and surgical planning. While deep learning algorithms can provide high accuracy for segmenting bony structures, delineating the hip joint space formed by the cartilage layers is often left to subjective manual evaluation. This study compared the performance of two state-of-the-art 3D deep learning architectures (3D UNET and 3D UNETR) for automated segmentation of the proximal femur, pelvis, and hip joint space using single-class and multi-class segmentation strategies. Method. A dataset of 56 3D CT images covering the hip joint was used. The two bones and the hip joint space were manually segmented for training and evaluation. Deep learning models were trained and evaluated with a single-class approach for each label (proximal femur, pelvis, and joint space) separately, and with a multi-class approach segmenting all three labels simultaneously. A consistent training configuration of hyperparameters was used across all models, implementing the AdamW optimizer and Dice loss as the primary loss function. Dice score, root mean squared error, and mean absolute error were used as evaluation metrics. Results. Both models performed at excellent levels for single-label bone segmentation (Dice > 0.95), but single-label joint space performance remained considerably lower (Dice < 0.87). Multi-class segmentation performance was also lower (Dice < 0.88) for both models. Combining bone and joint space labels may have introduced a class imbalance problem in the multi-class models, leading to lower performance. Conclusion. It is not clear whether 3D UNETR provides better performance, as the hyperparameters were kept the same across models and were not optimized. Further evaluations with baseline UNET and nnUNET architectures will be needed.
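For orientation, the training configuration described above (3D UNET/UNETR, AdamW, Dice loss, four labels including background) could look roughly like the following MONAI-based sketch; the use of MONAI, the channel sizes and the patch size are assumptions, not details taken from the abstract.

import torch
from monai.losses import DiceLoss
from monai.networks.nets import UNet, UNETR

NUM_CLASSES = 4   # background, proximal femur, pelvis, hip joint space

unet = UNet(spatial_dims=3, in_channels=1, out_channels=NUM_CLASSES,
            channels=(16, 32, 64, 128, 256), strides=(2, 2, 2, 2))
unetr = UNETR(in_channels=1, out_channels=NUM_CLASSES, img_size=(96, 96, 96))

loss_fn = DiceLoss(to_onehot_y=True, softmax=True)           # Dice loss as the primary loss
optimizer = torch.optim.AdamW(unet.parameters(), lr=1e-4)    # same hyperparameter set for both models

def train_step(model, optimizer, image, label):
    # One optimisation step on a (B, 1, D, H, W) volume and its label map.
    optimizer.zero_grad()
    loss = loss_fn(model(image), label)
    loss.backward()
    optimizer.step()
    return loss.item()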


Orthopaedic Proceedings
Vol. 105-B, Issue SUPP_3 | Pages 70 - 70
23 Feb 2023
Gupta S Smith G Wakelin E Van Der Veen T Plaskos C Pierrepont J
Full Access

Evaluation of patient-specific spinopelvic mobility requires the detection of bony landmarks in lateral functional radiographs. Current manual landmarking methods are inefficient and subjective. This study proposes a deep learning model to automate landmark detection and the derivation of spinopelvic measurements (SPM). A deep learning model was developed using an international multicentre imaging database of 26,109 landmarked preoperative and postoperative lateral functional radiographs (HREC: Bellberry: 2020-08-764-A-2). Three functional positions were analysed: 1) standing, 2) contralateral step-up, and 3) flexed seated. Landmarks were manually captured and independently verified by qualified engineers during preoperative planning, with the additional assistance of landmarks derived from 3D computed tomography. Pelvic tilt (PT), sacral slope (SS), and lumbar lordotic angle (LLA) were derived from the predicted landmark coordinates. Interobserver variability was explored in a pilot study in which nine qualified engineers annotated three functional images while blinded to the additional 3D information. The dataset was subdivided 70:20:10 into training, validation, and test sets. The model produced mean absolute errors (MAE) for PT, SS, and LLA of 1.7°±3.1°, 3.4°±3.8°, and 4.9°±4.5°, respectively. PT MAE values depended on functional position: standing 1.2°±1.3°, step-up 1.7°±4.0°, and seated 2.4°±3.3° (p < 0.001). The mean model prediction time was 0.7 seconds per image. The interobserver 95% confidence intervals for engineer-measured PT, SS, and LLA (1.9°, 1.9°, and 3.1°, respectively) were comparable to the MAE values generated by the model. The model reported performance comparable to the gold standard when blinded to additional 3D information. LLA prediction produced the lowest SPM accuracy, potentially due to error propagation from the SS and L1 landmarks. Reduced PT accuracy in the step-up and seated functional positions may be attributed to increased occlusion of the pubic symphysis landmark. Our model shows excellent performance compared against the current gold-standard manual annotation process.
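To make the derivation step concrete, the snippet below computes pelvic tilt and sacral slope from 2D landmark coordinates using common textbook landmark definitions; the specific landmarks, coordinates and reference axes are illustrative assumptions, not the authors' exact pipeline.

import numpy as np

def angle_to_vertical(p_from, p_to):
    # Angle in degrees between the line p_from -> p_to and the image vertical.
    v = np.asarray(p_to, float) - np.asarray(p_from, float)
    return np.degrees(np.arctan2(abs(v[0]), abs(v[1])))

def angle_to_horizontal(p_a, p_b):
    # Angle in degrees between the line p_a -> p_b and the image horizontal.
    v = np.asarray(p_b, float) - np.asarray(p_a, float)
    return np.degrees(np.arctan2(abs(v[1]), abs(v[0])))

# Pelvic tilt: line from the femoral head centre to the sacral endplate midpoint, versus vertical.
pt = angle_to_vertical((412, 610), (455, 388))
# Sacral slope: line along the S1 endplate, versus horizontal.
ss = angle_to_horizontal((430, 380), (480, 400))
print(f"PT {pt:.1f} deg, SS {ss:.1f} deg")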


Orthopaedic Proceedings
Vol. 106-B, Issue SUPP_18 | Pages 50 - 50
14 Nov 2024
Birkholtz F Eken M Swanevelder M Engelbrecht A
Full Access

Introduction. Inaccurate identification of implants on X-rays may lead to prolonged surgical duration as well as increased complexity and costs during implant removal. Deep learning models may help to address this problem, although they typically require large datasets to effectively train models to detect and classify objects such as implants. This can limit applicability when only smaller datasets are available. Transfer learning can overcome this limitation by leveraging large, publicly available datasets to pre-train detection and classification models. The aim of this study was to assess the effectiveness of deep learning models in implant localisation and classification on a lower limb X-ray dataset. Method. Firstly, detection models were evaluated on their ability to localise four categories of implants: plates, screws, pins, and intramedullary nails. Detection models (Faster R-CNN, YOLOv5, EfficientDet) were pre-trained on the large, freely available COCO dataset (330,000 images). Secondly, classification models (DenseNet121, Inception V3, ResNet18, ResNet101) were evaluated on their ability to classify five types of intramedullary nails. Localisation and classification accuracy were evaluated on a smaller image dataset (204 images). Result. The YOLOv5s model showed the best capacity to detect and distinguish between different types of implants (accuracy: plate = 82.1%, screw = 72.3%, intramedullary nail = 86.9%, pin = 79.9%). Screws were the most difficult implant to detect, likely due to overlapping screws visible in the image dataset. The DenseNet121 classification model showed the best performance in classifying different types of intramedullary nails (accuracy = 73.7%). A deep learning pipeline combining YOLOv5s and DenseNet121 was therefore proposed as the most suitable for automating implant localisation and classification on a relatively small dataset. Conclusion. These findings support the potential of deep learning techniques to enhance implant detection accuracy. With further development, AI-based implant identification may benefit patients, surgeons and hospitals through improved surgical planning and efficient use of theatre time.
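The transfer-learning strategy for the classification stage could be sketched as follows: a DenseNet121 pre-trained on ImageNet with its head replaced for the five nail types. Layer names follow torchvision conventions, and the freezing policy is an assumption for illustration.

import torch.nn as nn
import torchvision

# ImageNet-pre-trained backbone; replace the classifier head for the five nail types.
model = torchvision.models.densenet121(weights=torchvision.models.DenseNet121_Weights.DEFAULT)
model.classifier = nn.Linear(model.classifier.in_features, 5)

# Optionally freeze the pre-trained feature extractor and fine-tune only the new head.
for param in model.features.parameters():
    param.requires_grad = False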


Orthopaedic Proceedings
Vol. 103-B, Issue SUPP_16 | Pages 52 - 52
1 Dec 2021
Wang J Hall T Musbahi O Jones G van Arkel R
Full Access

Abstract. Objectives. Knee alignment affects both the development and the surgical treatment of knee osteoarthritis. Automating femorotibial angle (FTA) and hip-knee-ankle angle (HKA) measurement from radiographs could improve reliability and save time. Further, if the gold-standard HKA from full-limb radiographs could be accurately predicted from knee-only radiographs, the need for more expensive equipment and additional radiation exposure could be reduced. The aim of this research was to assess whether deep learning methods can predict FTA and HKA from posteroanterior (PA) knee radiographs. Methods. Convolutional neural networks with densely connected final layers were trained to analyse PA knee radiographs from the Osteoarthritis Initiative (OAI) database with corresponding angle measurements. The FTA dataset (6,149 radiographs) and HKA dataset (2,351 radiographs) were split into training, validation and test sets in a 70:15:15 ratio. Separate models were learnt for the prediction of FTA and HKA, trained using mean squared error as the loss function. Heat maps were used to identify the anatomical features within each image that contributed most to the predicted angles. Results. FTA could be predicted with errors less than 3° for 99.8% of images, and less than 1° for 89.5%. HKA prediction was less accurate than FTA but still high: 95.7% within 3°, and 68.0% within 1°. Heat maps for both models were generally concentrated on the knee anatomy and could prove a valuable tool for assessing prediction reliability in clinical application. Conclusions. Deep learning techniques could enable fast, reliable and accurate predictions of both FTA and HKA from plain knee radiographs. This could lead to cost savings for healthcare providers and reduced radiation exposure for patients.
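A minimal sketch of angle regression in this spirit is shown below: a convolutional backbone with a single continuous output trained with mean squared error. The ResNet18 backbone and hyperparameters are assumptions, since the abstract only specifies "convolutional neural networks with densely connected final layers".

import torch
import torch.nn as nn
import torchvision

backbone = torchvision.models.resnet18(num_classes=1)        # single continuous output in degrees
loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(backbone.parameters(), lr=1e-4)

def train_step(images, angles):
    # images: (B, 3, H, W) knee radiographs; angles: (B, 1) ground-truth FTA or HKA.
    optimizer.zero_grad()
    loss = loss_fn(backbone(images), angles)
    loss.backward()
    optimizer.step()
    return loss.item()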


Orthopaedic Proceedings
Vol. 102-B, Issue SUPP_1 | Pages 4 - 4
1 Feb 2020
Oni J Yi P Wei J Kim T Sair H Fritz J Hager G
Full Access

Introduction. Automated identification of arthroplasty implants could aid in preoperative planning and is a task that could be facilitated through artificial intelligence (AI) and deep learning. The purpose of this study was to develop and test the performance of a deep learning system (DLS) for automated identification and classification of knee arthroplasty (KA) on radiographs. Methods. We collected 237 AP knee radiographs with equal proportions of native knees, total KA (TKA), and unicompartmental KA (UKA), as well as 274 radiographs with equal proportions of Smith & Nephew Journey and Zimmer NexGen TKAs. Data augmentation was used to increase the number of images available for DLS development. These images were used to train, validate, and test deep convolutional neural networks (DCNNs) to 1) detect the presence of TKA; 2) differentiate between TKA and UKA; and 3) differentiate between the two TKA models. Receiver operating characteristic (ROC) curves were generated, and the area under the curve (AUC) was calculated to assess test performance. Results. The DCNNs trained to detect KA and to distinguish between TKA and UKA both achieved an AUC of 1. In both cases, heatmap analysis demonstrated appropriate emphasis of the KA components in decision-making. The DCNN trained to distinguish between the two TKA models also achieved an AUC of 1. Heatmap analysis of this DCNN showed emphasis of specific unique features of the TKA designs, such as the anterior flange shape of the Zimmer NexGen TKA (Figure 1) and the tibial baseplate/stem shape of the Smith & Nephew Journey TKA (Figure 2). Conclusion. DCNNs can accurately identify the presence of TKA and distinguish between specific designs. This proof of concept may set the foundation for DCNNs to identify other prosthesis models and prosthesis-related complications. For any figures or tables, please contact the authors directly.
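Since AUC is the headline metric here and throughout this listing, a small scikit-learn example on placeholder predictions (not the study's data) shows how it is computed.

from sklearn.metrics import roc_auc_score, roc_curve

y_true = [0, 0, 1, 1, 1, 0]                        # 1 = arthroplasty present (toy labels)
y_score = [0.1, 0.75, 0.9, 0.8, 0.7, 0.2]          # classifier probabilities (toy values)
auc = roc_auc_score(y_true, y_score)
fpr, tpr, thresholds = roc_curve(y_true, y_score)  # points of the ROC curve
print(f"AUC = {auc:.2f}")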


Orthopaedic Proceedings
Vol. 102-B, Issue SUPP_2 | Pages 5 - 5
1 Feb 2020
Burton W Myers C Rullkoetter P
Full Access

Introduction. Gait laboratory measurement of whole-body kinematics and ground reaction forces during a wide range of activities is frequently performed in joint replacement patient diagnosis, monitoring, and rehabilitation programs. These data are commonly processed in musculoskeletal modeling platforms such as OpenSim and AnyBody to estimate muscle and joint reaction forces during activity. However, the processing required to obtain musculoskeletal estimates is time consuming and requires significant expertise, which seriously limits the patient populations studied. Accordingly, the purpose of this study was to evaluate the potential of deep learning methods for estimating muscle and joint reaction forces over time, given kinematic data, height, weight, and ground reaction forces, for total knee replacement (TKR) patients performing activities of daily living (ADLs). Methods. 70 TKR patients were fitted with 32 reflective markers used to define anatomical landmarks for 3D motion capture. Patients were instructed to perform a range of tasks including gait, step-down and sit-to-stand. Gait was performed at a self-selected pace, step-downs from an 8" step height, and sit-to-stand using a chair height of 17". Tasks were performed over a force platform while force data were collected at 2000 Hz and a 14-camera motion capture system recorded at 100 Hz. The resulting data were processed in OpenSim to estimate joint reaction and muscle forces in the hip and knee using static optimization. The full dataset consisted of 135 instances from 70 patients: 63 sit-to-stands, 15 right-sided step-downs, 14 left-sided step-downs, and 43 gait sequences. Two classes of neural networks (NNs), a recurrent neural network (RNN) and a temporal convolutional network (TCN), were trained to predict activity classification from joint angles, ground reaction forces, and anthropometrics. The NNs were also trained to predict muscle and joint reaction forces over time from the same input metrics. The 135 instances were split into 100 for training, 15 for validation, and 20 for testing. Results. The RNN and TCN yielded classification accuracies of 90% and 100% on the test set, respectively. Correlation coefficients between ground truth and predictions on the test set ranged from 0.81–0.95 for the RNN, depending on the activity. Predictions from both NNs were qualitatively assessed, and both were able to effectively learn relationships between the input and output variables. Discussion. The objective of this study was to develop and evaluate deep learning methods for predicting patient mechanics from standard gait lab data. The resulting models classified activities with excellent performance and showed promise for predicting exact values of loading metrics for a range of activities. These results indicate potential for real-time prediction of musculoskeletal metrics with application in patient diagnostics and rehabilitation. For any figures or tables, please contact the authors directly.
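As an illustration of the sequence-to-sequence prediction described above, the sketch below maps per-frame joint angles, ground reaction forces and anthropometrics to per-frame force estimates with an LSTM; the feature and output channel counts are invented for the example, and the authors' RNN/TCN architectures are not specified in the abstract.

import torch
import torch.nn as nn

N_INPUT, N_OUTPUT, HIDDEN = 32, 12, 128            # assumed input features and force channels

class ForceRNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(N_INPUT, HIDDEN, num_layers=2, batch_first=True)
        self.head = nn.Linear(HIDDEN, N_OUTPUT)
    def forward(self, x):                           # x: (batch, time, N_INPUT)
        h, _ = self.lstm(x)
        return self.head(h)                         # (batch, time, N_OUTPUT) forces over time

model = ForceRNN()
forces = model(torch.randn(4, 200, N_INPUT))        # e.g. 200 frames sampled at 100 Hz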


Orthopaedic Proceedings
Vol. 105-B, Issue SUPP_7 | Pages 106 - 106
4 Apr 2023
Ding Y Luo W Chen Z Guo P Lei B Zhang Q Chen Z Fu Y Li C Ma T Liu J
Full Access

Quantitative ultrasound (QUS) is a promising tool to estimate bone structure characteristics and predict fragility fractures. The aim of this pilot cross-sectional study was to evaluate the performance of a multi-channel residual network (MResNet) based on the ultrasonic radiofrequency (RF) signal to discriminate fragility fractures retrospectively in postmenopausal women. Methods. RF signal and speed of sound (SOS) were obtained using an axial-transmission QUS device at the one-third distal radius for 246 postmenopausal women. Based on the RF signal, we constructed an MResNet, which combines multi-channel training with the original ResNet, to identify patients at high risk of fragility fracture. Bone mineral density (BMD) at the lumbar spine, hip and femoral neck was acquired with DXA on the same day. The fracture history of all subjects in adulthood was collected. To assess the ability of the different methods to discriminate fragility fracture, odds ratios (OR) calculated using binomial logistic regression and the area under the receiver operating characteristic curve (AUC) were analysed. Results. Among the 246 postmenopausal women, 170 belonged to the non-fracture group, 50 to the vertebral fracture group, and 26 to the non-vertebral fracture group. MResNet was discriminant for all fragility fractures (OR = 2.64; AUC = 0.74), for vertebral fracture (OR = 3.02; AUC = 0.77), and for non-vertebral fracture (OR = 2.01; AUC = 0.69). MResNet showed performance comparable to hip and lumbar spine BMD for all types of fractures, and significantly better performance than SOS for all types of fractures. Conclusions. The MResNet model based on the ultrasonic RF signal can significantly improve the ability of a QUS device to recognize previous fragility fractures. Moreover, the performance of the proposed model is further optimized when adjusted for age, weight, and height. These results open perspectives for evaluating fragility fracture risk by applying a deep learning model to the ultrasonic RF signal.
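The MResNet architecture itself is not reproduced in this abstract; as a loose sketch of the underlying idea of a residual network over multi-channel RF signals, a 1D residual block might look like the following, with the channel count and signal length assumed.

import torch
import torch.nn as nn

class ResBlock1d(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv1d(channels, channels, kernel_size=7, padding=3)
        self.conv2 = nn.Conv1d(channels, channels, kernel_size=7, padding=3)
        self.bn1 = nn.BatchNorm1d(channels)
        self.bn2 = nn.BatchNorm1d(channels)
    def forward(self, x):
        y = torch.relu(self.bn1(self.conv1(x)))
        y = self.bn2(self.conv2(y))
        return torch.relu(x + y)                    # residual connection

rf = torch.randn(2, 8, 2048)                        # (batch, RF channels, samples) - assumed shape
features = ResBlock1d(8)(rf)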


Orthopaedic Proceedings
Vol. 106-B, Issue SUPP_18 | Pages 55 - 55
14 Nov 2024
Vinco G Ley C Dixon P Grimm B
Full Access

Introduction. The ability to walk over various surfaces such as cobblestones, slopes or stairs is a very patient-centric and clinically meaningful mobility outcome. Current wearable sensors only measure step counts or walking speed, regardless of such context relevant to assessing gait function. This study aims to improve deep learning (DL) models that classify the surface walked on by altering and comparing model features and sensor configurations. Method. Using a public dataset, signals from 6 IMUs (Movella DOT) worn on various body locations (trunk, wrist, right/left thigh, right/left shank) of 30 subjects walking on 9 surfaces were analyzed (flat ground, ramps (up/down), stairs (up/down), cobblestones (irregular), grass (soft), banked (left/right)). Two variations of a CNN bi-directional LSTM model, with different Batch Normalization layer placement (beginning vs end), as well as data reduction to individual sensors (versus all combined), were explored, and model performance was compared between variants and with previous models using F1 scores. Result. The Bi-LSTM architecture improved performance over previous models, especially for subject-wise data splitting and when combining the 6 sensor locations (e.g. F1 = 0.94 versus 0.77). Placing the Batch Normalization layer at the beginning, prior to the convolutional layer, enhanced the model's handling of participant gait variations across surfaces. Single-sensor performance was best on the right shank (F1 = 0.88). Conclusion. Walking surface detection using wearable IMUs and DL models shows promise for clinically relevant real-world applications, achieving high F1 scores (>0.9) even for subject-wise data splitting, which enhances the model's applicability in real-world scenarios. Normalization techniques such as Batch Normalization seem crucial for optimizing model performance across diverse participant data. Single-sensor set-ups can also give acceptable performance, in particular for specific surface types of potentially high clinical relevance (e.g. stairs, ramps), offering practical and cost-effective solutions with high usability. Future research will focus on collecting ground-truth labeled data to investigate system performance in real-world settings.
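A compact sketch of a CNN/Bi-LSTM surface classifier with Batch Normalization placed before the convolutional layer (the placement the abstract reports works best) is given below; the 36-channel input (6 IMUs × 6 axes), window length and layer sizes are assumptions rather than the authors' configuration.

import torch
import torch.nn as nn

N_CHANNELS, N_SURFACES, WINDOW = 36, 9, 200         # 6 IMUs x 6 axes, 9 surfaces, assumed window length

class SurfaceNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.bn = nn.BatchNorm1d(N_CHANNELS)         # Batch Normalization placed at the beginning
        self.conv = nn.Conv1d(N_CHANNELS, 64, kernel_size=5, padding=2)
        self.bilstm = nn.LSTM(64, 64, batch_first=True, bidirectional=True)
        self.head = nn.Linear(128, N_SURFACES)
    def forward(self, x):                            # x: (batch, channels, time)
        y = torch.relu(self.conv(self.bn(x)))
        y, _ = self.bilstm(y.transpose(1, 2))        # to (batch, time, features)
        return self.head(y[:, -1])                   # last time step -> surface logits

logits = SurfaceNet()(torch.randn(4, N_CHANNELS, WINDOW))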


Background

Dislocation is a common complication following total hip arthroplasty (THA), and accounts for a high percentage of subsequent revisions. The purpose of this study was to develop a convolutional neural network (CNN) model to identify patients at high risk for dislocation based on postoperative anteroposterior (AP) pelvis radiographs.

Methods

We retrospectively evaluated radiographs for a cohort of 13,970 primary THAs with 374 dislocations over 5 years of follow-up. Overall, 1,490 radiographs from dislocated and 91,094 from non-dislocated THAs were included in the analysis. A CNN object detection model (YOLO-V3) was trained to crop the images by centering on the femoral head. A ResNet18 classifier was trained to predict subsequent hip dislocation from the cropped imaging. The ResNet18 classifier was initialized with ImageNet weights and trained using FastAI (V1.0) running on PyTorch. The training was run for 15 epochs using ten-fold cross validation, data oversampling and augmentation.
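The classification stage (an ImageNet-initialized ResNet18 applied to crops centred on the femoral head) can be sketched in plain torchvision as below; the authors used FastAI, and the augmentation choices shown are illustrative only, not their recipe.

import torch.nn as nn
import torchvision
from torchvision import transforms

model = torchvision.models.resnet18(weights=torchvision.models.ResNet18_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, 2)        # dislocation vs no dislocation

augment = transforms.Compose([                        # illustrative augmentation pipeline
    transforms.RandomRotation(10),
    transforms.RandomResizedCrop(224, scale=(0.9, 1.0)),
    transforms.ToTensor(),
])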


Bone & Joint Open
Vol. 2, Issue 10 | Pages 879 - 885
20 Oct 2021
Oliveira e Carmo L van den Merkhof A Olczak J Gordon M Jutte PC Jaarsma RL IJpma FFA Doornberg JN Prijs J

Aims

The number of convolutional neural networks (CNNs) available for fracture detection and classification is rapidly increasing. External validation of a CNN on a temporally separate (separated by time) or geographically separate (separated by location) dataset is crucial to assess the generalizability of the CNN before it is applied to clinical practice in other institutions. We aimed to answer the following questions: are current CNNs for fracture recognition externally valid; which methods are applied for external validation (EV); and how do the reported performances on the EV sets compare to those on the internal validation (IV) sets of these CNNs?

Methods

The PubMed and Embase databases were systematically searched from January 2010 to October 2020 according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement. The type of EV, characteristics of the external dataset, and diagnostic performance characteristics on the IV and EV datasets were collected and compared. Quality assessment was conducted using a seven-item checklist based on a modified Methodological Index for Non-Randomized Studies (MINORS) instrument.


Bone & Joint Open
Vol. 3, Issue 11 | Pages 877 - 884
14 Nov 2022
Archer H Reine S Alshaikhsalama A Wells J Kohli A Vazquez L Hummer A DiFranco MD Ljuhar R Xi Y Chhabra A

Aims

Hip dysplasia (HD) leads to premature osteoarthritis. Timely detection and correction of HD has been shown to improve pain, functional status, and hip longevity. Several time-consuming radiological measurements are currently used to confirm HD. An artificial intelligence (AI) software named HIPPO automatically locates anatomical landmarks on anteroposterior pelvis radiographs and performs the needed measurements. The primary aim of this study was to assess the reliability of this tool as compared to multi-reader evaluation in clinically proven cases of adult HD. The secondary aims were to assess the time savings achieved and evaluate inter-reader assessment.

Methods

A consecutive preoperative sample of 130 HD patients (256 hips) was used. This cohort included 82.3% females (n = 107) and 17.7% males (n = 23) with median patient age of 28.6 years (interquartile range (IQR) 22.5 to 37.2). Three trained readers’ measurements were compared to AI outputs of lateral centre-edge angle (LCEA), caput-collum-diaphyseal (CCD) angle, pelvic obliquity, Tönnis angle, Sharp’s angle, and femoral head coverage. Intraclass correlation coefficients (ICC) and Bland-Altman analyses were obtained.
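For readers unfamiliar with the agreement statistics used here, the snippet below computes the Bland-Altman bias and limits of agreement on invented reader/AI values; the ICC would typically come from a statistics package and is omitted to keep the example dependency-free.

import numpy as np

reader = np.array([25.1, 30.4, 22.8, 35.0, 28.7])    # e.g. LCEA in degrees (invented values)
ai = np.array([24.6, 31.0, 23.5, 34.2, 29.1])

diff = ai - reader
bias = diff.mean()
half_width = 1.96 * diff.std(ddof=1)                  # 95% limits of agreement
print(f"bias {bias:.2f} deg, limits of agreement {bias - half_width:.2f} to {bias + half_width:.2f} deg")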


Orthopaedic Proceedings
Vol. 103-B, Issue SUPP_4 | Pages 77 - 77
1 Mar 2021
Ataei A Eggermont F Baars M Linden Y Rooy J Verdonschot N Tanck E
Full Access

Patients with advanced cancer can develop bone metastases in the femur, which are often painful and increase the risk of pathological fracture. Accurate segmentation of bone metastases is important, amongst others, to improve patient-specific computer models that calculate fracture risk, and for radiotherapy planning to determine exact radiation fields. Deep learning algorithms have shown promise for improving segmentation accuracy for metastatic lesions, but require reliable segmentations as training input. The aim of this study was to investigate the inter- and intra-operator reliability of manual segmentation of femoral metastatic lesions and to define a set of lesions that can serve as a training dataset for deep learning algorithms. CT scans of 60 advanced cancer patients with a femur affected by bone metastases (20 osteolytic, 20 osteoblastic and 20 mixed) were used in this study. Two operators were trained by an experienced radiologist and then segmented the metastatic lesions in all femurs twice, with a four-week interval. 3D and 2D Dice coefficients (DCs) were calculated to quantify the inter- and intra-operator reliability of the segmentations. We defined DC > 0.7 as good reliability, in line with a statistical image segmentation study. Mean first and second inter-operator 3D-DCs were 0.54 (±0.28) and 0.50 (±0.32), respectively. Mean intra-operator I and II 3D-DCs were 0.56 (±0.28) and 0.71 (±0.23), respectively. Larger lesions (>60 cm³) scored higher DCs than smaller lesions. This study reveals that manual segmentation of metastatic lesions is challenging and that the current manual approach resulted in unsatisfactory outcomes, particularly for lesions with small volumes. However, segmentation of larger lesions showed good inter- and intra-operator reliability. In addition, we were able to select 521 slices with good segmentation reliability that can be used to create a training dataset for deep learning algorithms. Using deep learning algorithms, we aim for more accurate automated lesion segmentations that might be used in computer modelling and radiotherapy planning.
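The reliability metric used above is the Dice coefficient between two segmentation masks; a small numpy helper on toy masks illustrates the calculation and the DC > 0.7 threshold for good reliability.

import numpy as np

def dice(mask_a, mask_b):
    a, b = mask_a.astype(bool), mask_b.astype(bool)
    intersection = np.logical_and(a, b).sum()
    return 2.0 * intersection / (a.sum() + b.sum())

op1 = np.zeros((64, 64), dtype=np.uint8); op1[20:40, 20:40] = 1   # operator 1 lesion outline (toy)
op2 = np.zeros((64, 64), dtype=np.uint8); op2[22:42, 22:42] = 1   # operator 2 lesion outline (toy)
print(f"Dice on this slice: {dice(op1, op2):.2f}")                 # DC > 0.7 counts as good reliability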


Bone & Joint Open
Vol. 5, Issue 2 | Pages 101 - 108
6 Feb 2024
Jang SJ Kunze KN Casey JC Steele JR Mayman DJ Jerabek SA Sculco PK Vigdorchik JM

Aims. Distal femoral resection in conventional total knee arthroplasty (TKA) utilizes an intramedullary guide to determine coronal alignment, commonly planned for 5° of valgus. However, a standard 5° resection angle may contribute to malalignment in patients with variability in the femoral mechanical-anatomical axis angle. The purpose of this study was to leverage deep learning (DL) to measure the femoral mechanical-anatomical axis angle (FMAA) in a heterogeneous cohort. Methods. Patients with full-limb radiographs from the Osteoarthritis Initiative were included. A DL workflow was created to measure the FMAA and validated against human measurements. To reflect potential intramedullary guide placement during manual TKA, two different FMAAs were calculated: one using a line approximating the entire diaphyseal shaft, and one using a line connecting the apex of the femoral intercondylar sulcus to the centre of the diaphysis. The proportion of FMAAs outside a range of 5.0° (SD 2.0°) was calculated for both definitions, and the FMAA was compared using univariate analyses across sex, BMI, knee alignment, and femur length. Results. The algorithm measured 1,078 radiographs at a rate of 12.6 s/image (2,156 unique measurements in 3.8 hours). There was no significant difference or bias between reader and algorithm measurements for the FMAA (p = 0.130 to 0.563). The FMAA was 6.3° (SD 1.0°; 25% outside the range of 5.0° (SD 2.0°)) using definition one and 4.6° (SD 1.3°; 13% outside the range) using definition two. Differences between males and females were observed using definition two (males more valgus; p < 0.001). Conclusion. We developed a rapid and accurate DL tool to quantify the FMAA. The considerable variation between measurement approaches supports that patient-specific anatomy and surgeon-dependent technique must be accounted for when correcting for the FMAA using an intramedullary guide. The angle between the mechanical and anatomical axes of the femur fell outside the range of 5.0° (SD 2.0°) for nearly a quarter of patients. Cite this article: Bone Jt Open 2024;5(2):101–108
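A worked example of the FMAA itself may be useful: the acute angle between the mechanical axis line (femoral head centre to knee centre) and an anatomical axis line, computed from 2D landmark coordinates. The point values below are invented for illustration and do not come from the study.

import numpy as np

def angle_between_lines(p1, p2, q1, q2):
    # Acute angle in degrees between the line p1->p2 and the line q1->q2.
    u = np.asarray(p2, float) - np.asarray(p1, float)
    v = np.asarray(q2, float) - np.asarray(q1, float)
    cos = abs(np.dot(u, v)) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(cos, 0.0, 1.0)))

head_centre, knee_centre = (100, 40), (130, 440)             # mechanical axis endpoints (invented)
sulcus_apex, diaphysis_centre = (128, 430), (140, 200)       # anatomical axis, definition two (invented)
fmaa = angle_between_lines(head_centre, knee_centre, sulcus_apex, diaphysis_centre)
print(f"FMAA {fmaa:.1f} deg")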


Bone & Joint Open
Vol. 3, Issue 10 | Pages 767 - 776
5 Oct 2022
Jang SJ Kunze KN Brilliant ZR Henson M Mayman DJ Jerabek SA Vigdorchik JM Sculco PK

Aims. Accurate identification of the ankle joint centre is critical for estimating tibial coronal alignment in total knee arthroplasty (TKA). The purpose of the current study was to leverage artificial intelligence (AI) to determine the accuracy and effect of using different radiological anatomical landmarks to quantify mechanical alignment in relation to a traditionally defined radiological ankle centre. Methods. Patients with full-limb radiographs from the Osteoarthritis Initiative were included. A sub-cohort of 250 radiographs were annotated for landmarks relevant to knee alignment and used to train a deep learning (U-Net) workflow for angle calculation on the entire database. The radiological ankle centre was defined as the midpoint of the superior talus edge/tibial plafond. Knee alignment (hip-knee-ankle angle) was compared against 1) the midpoint of the most prominent malleoli points, 2) the midpoint of the soft tissue overlying the malleoli, and 3) the midpoint of the soft-tissue sulcus above the malleoli. Results. A total of 932 bilateral full-limb radiographs (1,864 knees) were measured at a rate of 20.63 seconds/image. The knee alignment using the radiological ankle centre was accurate against ground truth radiologist measurements (intraclass correlation coefficient (ICC) = 0.99 (0.98 to 0.99)). Compared to the radiological ankle centre, the mean midpoint of the malleoli was 2.3 mm (SD 1.3) lateral and 5.2 mm (SD 2.4) distal, shifting alignment by 0.34° (SD 2.4°) valgus, whereas the midpoint of the soft-tissue sulcus was 4.69 mm (SD 3.55) lateral and 32.4 mm (SD 12.4) proximal, shifting alignment by 0.65° (SD 0.55°) valgus. On the intermalleolar line, measuring a point at 46% (SD 2%) of the intermalleolar width from the medial malleolus (a 2.38 mm medial adjustment from the midpoint) resulted in knee alignment identical to using the radiological ankle centre. Conclusion. The current study leveraged AI to create a consistent and objective model that can estimate patient-specific adjustments necessary for optimal landmark usage in extramedullary and computer-guided navigation for tibial coronal alignment to match radiological planning. Cite this article: Bone Jt Open 2022;3(10):767–776


The Bone & Joint Journal
Vol. 106-B, Issue 11 | Pages 1216 - 1222
1 Nov 2024
Castagno S Gompels B Strangmark E Robertson-Waters E Birch M van der Schaar M McCaskie AW

Aims. Machine learning (ML), a branch of artificial intelligence that uses algorithms to learn from data and make predictions, offers a pathway towards more personalized and tailored surgical treatments. This approach is particularly relevant to prevalent joint diseases such as osteoarthritis (OA). In contrast to end-stage disease, where joint arthroplasty provides excellent results, early stages of OA currently lack effective therapies to halt or reverse progression. Accurate prediction of OA progression is crucial if timely interventions are to be developed, to enhance patient care and optimize the design of clinical trials. Methods. A systematic review was conducted in accordance with PRISMA guidelines. We searched MEDLINE and Embase on 5 May 2024 for studies utilizing ML to predict OA progression. Titles and abstracts were independently screened, followed by full-text reviews for studies that met the eligibility criteria. Key information was extracted and synthesized for analysis, including types of data (such as clinical, radiological, or biochemical), definitions of OA progression, ML algorithms, validation methods, and outcome measures. Results. Out of 1,160 studies initially identified, 39 were included. Most studies (85%) were published between 2020 and 2024, with 82% using publicly available datasets, primarily the Osteoarthritis Initiative. ML methods were predominantly supervised, with significant variability in the definitions of OA progression: most studies focused on structural changes (59%), while fewer addressed pain progression or both. Deep learning was used in 44% of studies, while automated ML was used in 5%. There was a lack of standardization in evaluation metrics and limited external validation. Interpretability was explored in 54% of studies, primarily using SHapley Additive exPlanations. Conclusion. Our systematic review demonstrates the feasibility of ML models in predicting OA progression, but also uncovers critical limitations that currently restrict their clinical applicability. Future priorities should include diversifying data sources, standardizing outcome measures, enforcing rigorous validation, and integrating more sophisticated algorithms. This paradigm shift from predictive modelling to actionable clinical tools has the potential to transform patient care and disease management in orthopaedic practice. Cite this article: Bone Joint J 2024;106-B(11):1216–1222


Orthopaedic Proceedings
Vol. 103-B, Issue SUPP_13 | Pages 125 - 125
1 Nov 2021
Sánchez G Cina A Giorgi P Schiro G Gueorguiev B Alini M Varga P Galbusera F Gallazzi E
Full Access

Introduction and Objective. Up to 30% of thoracolumbar (TL) fractures are missed in the emergency room. Failure to identify these fractures can result in neurological injuries in up to 51% of cases. Obtaining sagittal and anteroposterior radiographs of the TL spine is the first diagnostic step when a traumatic injury is suspected. In most cases, CT and/or MRI are needed to confirm the diagnosis, which is time and resource consuming. Thus, reliably detecting vertebral fractures in simple radiographic projections would have a significant impact. We aim to develop and validate a deep learning tool capable of detecting TL fractures on lateral radiographs of the spine. The clinical implementation of this tool is anticipated to reduce the rate of missed vertebral fractures in emergency rooms. Materials and Methods. We collected sagittal radiographs, CT and MRI scans of the TL spine of 362 patients exhibiting traumatic vertebral fractures. Cases were excluded when CT and/or MRI were not available. The reference standard was set by an expert group of three spine surgeons who jointly annotated (fracture/no fracture and AO classification) the sagittal radiographs of 171 cases. CT and/or MRI were used to confirm the presence and type of the fracture in all cases. 302 cropped vertebral images were labelled "fracture" and 328 "no fracture". After augmentation, this dataset was used to train, validate, and test deep learning classifiers based on the ResNet18 and VGG16 architectures. To ensure that the model's prediction was based on correct identification of the fracture zone, an activation map analysis was conducted. Results. Vertebrae T12 to L2 were the most frequently involved, accounting for 48% of the fractures. Accuracies of 88% and 84% were obtained with ResNet18 and VGG16, respectively. The sensitivity was 89% with both architectures, but ResNet18 had a significantly higher specificity (88%) compared to VGG16 (79%). The fracture zone was precisely identified in 81% of the heatmaps. Conclusions. Our AI model can accurately identify anomalies suggestive of TL vertebral fractures in sagittal radiographs, precisely identifying the fracture zone within the vertebral body.
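A hedged sketch of the classification plus activation-map check described above: a ResNet18 binary classifier with a minimal Grad-CAM-style heatmap over its last convolutional block, which is one common way to produce such maps; the authors' exact activation-map method is not specified in the abstract.

import torch
import torchvision

model = torchvision.models.resnet18(num_classes=2)    # fracture vs no fracture
model.eval()

feats, grads = {}, {}
model.layer4.register_forward_hook(lambda m, i, o: feats.update(x=o))
model.layer4.register_full_backward_hook(lambda m, gi, go: grads.update(x=go[0]))

image = torch.randn(1, 3, 224, 224, requires_grad=True)     # stands in for a cropped vertebral image
logits = model(image)
logits[0, 1].backward()                                      # gradient of the "fracture" score

weights = grads["x"].mean(dim=(2, 3), keepdim=True)          # channel-wise importance
cam = torch.relu((weights * feats["x"]).sum(dim=1))          # coarse heatmap over the crop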


Orthopaedic Proceedings
Vol. 105-B, Issue SUPP_7 | Pages 56 - 56
4 Apr 2023
Sun Y Zheng H Kong D Yin M Chen J Lin Y Ma X Tian Y Wang Y
Full Access

Using deep learning and image processing technology, a standardized automatic quantitative analysis system for lumbar disc degeneration based on T2 MRI is proposed to help doctors evaluate the prognosis of intervertebral disc (IVD) degeneration. A semantic segmentation network, BianqueNet, with a self-attention skip-connection module and a deep feature extraction module, is proposed to achieve high-precision segmentation of IVD-related regions. A quantitative method is proposed to calculate the signal intensity difference (SI) in the IVD, average disc height (DH), disc height index (DHI), and disc height-to-diameter ratio (DHR). Based on the correlation analysis of these degeneration parameters, 1,051 MRI images from four hospitals were collected to establish quantitative ranges for the IVD parameters in a larger population across China. The average Dice coefficients of the proposed segmentation network for vertebral bodies and intervertebral discs were 97.04% and 94.76%, respectively. The designed parameters of intervertebral disc degeneration showed a significant negative correlation with the modified Pfirrmann grade. The procedure is suitable for different MRI centres and different resolutions of lumbar spine T2 MRI (ICC = 0.874 to 0.958). Among the parameters, the intervertebral disc signal intensity degeneration standard showed excellent reliability against the modified Pfirrmann grade (macro-F1 = 90.63% to 92.02%). We developed a fully automated deep learning-based lumbar spine segmentation network that demonstrated strong versatility and high reliability in assisting residents with IVD degeneration grading by means of quantitation.
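As a stand-in for the quantification step, the snippet below estimates average disc height and a simple height-to-diameter ratio from a binary disc mask on the mid-sagittal slice; the paper's exact definitions of DH, DHI and DHR are not reproduced here, so this is illustrative only.

import numpy as np

def disc_metrics(disc_mask, pixel_spacing_mm=0.5):
    # disc_mask: 2D binary array for the disc on the mid-sagittal slice.
    cols = np.where(disc_mask.any(axis=0))[0]                 # columns covered by the disc
    heights = disc_mask[:, cols].sum(axis=0)                   # per-column height in pixels
    avg_height = heights.mean() * pixel_spacing_mm             # average disc height (mm)
    diameter = (cols[-1] - cols[0] + 1) * pixel_spacing_mm     # anteroposterior extent (mm)
    return avg_height, avg_height / diameter                   # DH and height-to-diameter ratio

mask = np.zeros((64, 64), dtype=np.uint8)
mask[28:36, 10:50] = 1                                          # toy disc region
dh, dhr = disc_metrics(mask)
print(f"DH {dh:.1f} mm, DHR {dhr:.2f}")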