Abstract
Aims
The aim of this study was to develop and evaluate a deep learning-based model for classification of hip fractures to enhance diagnostic accuracy.
Methods
This retrospective study used 5,168 anteroposterior hip radiographs: 4,493 from two institutes (internal dataset) for training, and 675 from a third institute for external validation. A convolutional neural network (CNN)-based classification model was trained on four types of hip fracture (Displaced, Valgus-impacted, Stable, and Unstable), using DAMO-YOLO for data processing and augmentation. The model’s accuracy, sensitivity, specificity, Intersection over Union (IoU), and Dice coefficient were evaluated. Orthopaedic surgeons’ diagnoses served as the reference standard, with comparisons made before and after artificial intelligence assistance.
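The evaluation metrics named above can be illustrated with a minimal sketch. It assumes that the classification metrics are computed one class against the rest, and that IoU and Dice are computed on axis-aligned bounding boxes; the abstract does not state either detail. The function names and the toy labels are hypothetical and are not taken from the study.

```python
# Illustrative only: the one-vs-rest scheme and box-based overlap are assumptions.

def one_vs_rest_metrics(y_true, y_pred, label):
    """Accuracy, sensitivity and specificity for one class against all others."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    tn = sum(t != label and p != label for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


def box_overlap(box_a, box_b):
    """IoU and Dice for two axis-aligned boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    iou = inter / union if union else 0.0
    dice = 2 * inter / (area_a + area_b) if area_a + area_b else 0.0
    return iou, dice


# Toy labels (hypothetical, not study data).
y_true = ["Displaced", "Displaced", "Stable", "Unstable", "Valgus-impacted", "Stable"]
y_pred = ["Displaced", "Stable", "Stable", "Unstable", "Valgus-impacted", "Stable"]

displaced = one_vs_rest_metrics(y_true, y_pred, "Displaced")
iou, dice = box_overlap((0, 0, 10, 10), (5, 0, 15, 10))
```

For a single pair of regions the two overlap measures are linked by Dice = 2 × IoU / (1 + IoU), so Dice is never lower than IoU.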
Results
In the internal dataset, the accuracy, sensitivity, specificity, IoU, and Dice coefficient of the model for the four fracture categories were, respectively: Displaced (1.0, 0.79, 1.0, 0.70, 0.82), Valgus-impacted (1.0, 0.80, 1.0, 0.70, 0.82), Stable (0.99, 0.95, 0.99, 0.83, 0.89), and Unstable (1.0, 0.98, 0.99, 0.86, 0.92). In the external validation dataset, the sensitivity and specificity were, respectively: Displaced (0.83, 0.94), Valgus-impacted (0.89, 0.90), Stable (0.88, 0.95), and Unstable (0.85, 0.99). The overall mean sensitivity and specificity for the external dataset were 0.83 (SD 0.05) and 0.96 (SD 0.01) by micro-average (Micro AVG), and 0.69 (SD 0.02) and 0.95 (SD 0.02) by macro-average (Macro AVG).
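The difference between the micro- and macro-averages reported above can be sketched as follows. This uses the standard definitions (micro pools the counts across classes, macro takes the unweighted mean of the per-class values); the abstract does not describe how the averages or their SDs were derived, and the counts below are hypothetical.

```python
# Illustrative only: standard micro/macro definitions with made-up counts.

def averaged_sensitivity(counts):
    """Return (micro, macro) sensitivity from {label: (tp, fn)} counts."""
    per_class = [tp / (tp + fn) for tp, fn in counts.values()]
    macro = sum(per_class) / len(per_class)          # mean of per-class values
    total_tp = sum(tp for tp, _ in counts.values())
    total_pos = sum(tp + fn for tp, fn in counts.values())
    micro = total_tp / total_pos                     # pooled over all cases
    return micro, macro


# Hypothetical counts: one common class and one rare class.
toy_counts = {"common": (8, 2), "rare": (1, 1)}
micro, macro = averaged_sensitivity(toy_counts)
```

With imbalanced classes the two averages diverge, because the micro-average is dominated by the larger classes while the macro-average weights every class equally.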
Conclusion
Our study demonstrates that, compared with human diagnosis alone, the developed model significantly improves the accuracy of detecting and classifying hip fractures. The model shows great potential for assisting clinicians in the accurate diagnosis and classification of hip fractures.
Cite this article: Bone Joint J 2025;107-B(2):213–220.