Advertisement for orthosearch.org.uk
You currently have no access to view or download this content. Please log in with your institutional or personal account if you should have access to through either of these
The Bone & Joint Journal Logo

Receive monthly Table of Contents alerts from The Bone & Joint Journal

Comprehensive article alerts can be set up and managed through your account settings

View my account settings

Trauma

Detection, classification, and characterization of proximal humerus fractures on plain radiographs

do convolutional neural networks still outperform humans when the task becomes increasingly complex?



Download PDF

Abstract

Aims

The purpose of this study was to develop a convolutional neural network (CNN) for fracture detection, classification, and identification of greater tuberosity displacement ≥ 1 cm, neck-shaft angle (NSA) ≤ 100°, shaft translation, and articular fracture involvement, on plain radiographs.

Methods

The CNN was trained and tested on radiographs sourced from 11 hospitals in Australia and externally validated on radiographs from the Netherlands. Each radiograph was paired with corresponding CT scans to serve as the reference standard based on dual independent evaluation by trained researchers and attending orthopaedic surgeons. Presence of a fracture, classification (non- to minimally displaced; two-part, multipart, and glenohumeral dislocation), and four characteristics were determined on 2D and 3D CT scans and subsequently allocated to each series of radiographs. Fracture characteristics included greater tuberosity displacement ≥ 1 cm, NSA ≤ 100°, shaft translation (0% to < 75%, 75% to 95%, > 95%), and the extent of articular involvement (0% to < 15%, 15% to 35%, or > 35%).

Results

For detection and classification, the algorithm was trained on 1,709 radiographs (n = 803), tested on 567 radiographs (n = 244), and subsequently externally validated on 535 radiographs (n = 227). For characterization, healthy shoulders and glenohumeral dislocation were excluded. The overall accuracy for fracture detection was 94% (area under the receiver operating characteristic curve (AUC) = 0.98) and for classification 78% (AUC 0.68 to 0.93). Accuracy to detect greater tuberosity fracture displacement ≥ 1 cm was 35.0% (AUC 0.57). The CNN did not recognize NSAs ≤ 100° (AUC 0.42), nor fractures with ≥ 75% shaft translation (AUC 0.51 to 0.53), or with ≥ 15% articular involvement (AUC 0.48 to 0.49). For all objectives, the model’s performance on the external dataset showed similar accuracy levels.

Conclusion

CNNs proficiently rule out proximal humerus fractures on plain radiographs. Despite rigorous training methodology based on CT imaging with multi-rater consensus to serve as the reference standard, artificial intelligence-driven classification is insufficient for clinical implementation. The CNN exhibited poor diagnostic ability to detect greater tuberosity displacement ≥ 1 cm and failed to identify NSAs ≤ 100°, shaft translations, or articular fractures.

Cite this article: Bone Joint J 2024;106-B(11):1348–1360.


Correspondence should be sent to Reinier Willem Alfred Spek. E-mail:

W. J. Smith and M. Sverdlov are joint senior authors.


For access options please click here