The use of artificial intelligence (AI) is rapidly growing across many domains, and the medical field is no exception. AI is an umbrella term defining the practical application of algorithms to generate useful output, without the need for human cognition. Owing to the expanding volume of patient information collected, known as ‘big data’, AI is showing promise as a useful tool in healthcare research and across all aspects of patient care pathways. Practical applications in orthopaedic surgery include: diagnostics, such as fracture recognition and tumour detection; predictive models of clinical and patient-reported outcome measures, such as calculating mortality rates and length of hospital stay; and real-time rehabilitation monitoring and surgical training. However, clinicians should remain cognizant of AI’s limitations, as the development of robust reporting and validation frameworks is of paramount importance to prevent avoidable errors and biases. The aim of this review article is to provide a comprehensive understanding of AI and its subfields, as well as to delineate its existing clinical applications in trauma and orthopaedic surgery. Furthermore, this narrative review expands upon the limitations of AI and future directions.

Article focus

  • Comprehensive review of artificial intelligence (AI) and its subfields, as well as existing applications and its role in orthopaedic surgery.

  • Critical presentation of validated and evolving AI research in orthopaedic surgery.

  • Limitations of AI and the need to establish robust validation and reporting frameworks.

Key messages

  • AI is showing promise as a useful tool in healthcare research and data science, including all aspects of patient care pathways.

  • Existing applications in orthopaedic surgery have shown promise in highlighting implant malposition; detecting features of loosening; predicting length of hospital stay, costs involved, functional outcomes, and prognostic scores; and implant identification in arthroplasty.

  • Clinicians should remain cognizant of AI’s limitations and proceed cautiously, until external validity is proven within acceptable margins of error.

Strengths and limitations

  • This is a comprehensive literature review of recent advances and AI applications in orthopaedic surgery, detailing areas lacking validated research.

  • Our study does not entail quantitative synthesis of outcomes with the use of AI.

Introduction: artificial intelligence, time for clear nomenclature

The application of artificial intelligence (AI) is rapidly growing across many domains, with the field of medicine being no exception. Traditionally, AI is an umbrella term that originally described the theoretical replication of human intellect by computers.1 The broad definition of AI is the practical application of complex algorithms to generate useful output, excluding the need for human cognitive intelligence.2,3 AI is becoming an integral part of modern society, ranging from aircraft autopilot to fraud detection, social media advertisements, and the seemingly omniscient capabilities of ChatGPT.4 It is estimated that AI could cut annual US healthcare costs by $150 billion by 2026.5,6 A considerable component of the cost reduction stems from adopting a proactive health management approach, expected to result in fewer hospitalizations, fewer doctor visits, and fewer treatments.5,6 This may be attributed to early detection of disease with known cures, through automating the review of large volumes of data using AI with advanced individualized risk profiling.7

Owing to the exponentially expanding volume of patient information collected, known as ‘big data’, AI is showing promise as a useful tool in the healthcare research armamentarium. Big data encompasses data sets so large that there is no conceivable way that humans could comprehend such a plethora of information without the use of technology.3,8 Algorithms based on clinical data sets (including electronic medical records, genomic-level data, validated clinical scores, and imaging) for predicting patients’ clinical outcomes are being rigorously explored. While AI can categorize and make sense of big data, it is still only as good as the data provided, and thus human contribution is paramount. The accuracy of its function can be progressively refined, in a manner that has been likened to successive human learning, whereby sequential exposure reinforces comprehension.2,3,9

While still in its infancy, the application of AI in the field of orthopaedics is a new frontier of data science. Orthopaedic surgery is already home to some of the most innovative technologies, such as robotic-assisted surgery, of which AI is an ever-growing part.10-18 Recently, the orthopaedic and wider healthcare literature have witnessed a surge in studies using AI, which on many occasions employ methodology not very different from traditional prediction models. To describe the application of AI models in orthopaedic surgery, it is necessary to delineate the concepts of each architectural design. The basis on which each model is created determines its level of complexity, its power, and, importantly, its limitations. It is therefore of paramount importance to differentiate between different types of AI (Figure 1), in order to achieve consistency and ensure transparency for readers. To accomplish that, an enhanced understanding of AI and its subcategories is necessary, and orthopaedic research could consider abandoning its focus on the umbrella term AI in favour of naming the specific subfield used.

The purpose of this narrative review article is to provide a comprehensive understanding of AI and its subfields, in addition to delineating its role in orthopaedic surgery and describing existing applications. Furthermore, this review explores the current limitations and touches upon future directions.

Machine learning

AI encompasses a subfield called machine learning (ML) (Figure 1).19 ML can be described as the capacity of a computer to ‘learn and adapt’ based on algorithms and input data, often at a scale surpassing human comprehension.20 Further subclassifications include supervised, unsupervised, and reinforcement ML. Supervised ML involves humans labelling the input data and correcting the computer’s mistakes. For example, a computer is shown thousands of normal radiographs (the computer recognizes the relevant peculiarities from pixels identified under human supervision) and then thousands of radiographs of a broken bone. The algorithm learns to recognize what is labelled ‘broken’ or ‘not broken’. At this point the process is only halfway complete: the model must then be refined, and its accuracy validated, before wider use. In the context of supervised ML, the ‘ground truth’ data used to train the ML models are typically labelled or annotated by humans, who indicate the correct answer or outcome for a given input.19,20 Having high-quality, accurate ground truth data is crucial in validating ML models. Furthermore, once a model has been trained to reproduce the ground truth accurately, it can be reused as a starting point for new problems, which is referred to as ‘transfer learning’. In this way, the developer can avoid having to create entirely new algorithms from new data sets, thereby saving considerable time and money. Transfer learning has been applied to fracture recognition and osteoarthritis (OA) quantification, to name but a few.21
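The supervised loop described above (human-labelled inputs, with the model corrected whenever it errs) can be illustrated with a minimal Python sketch. It assumes a single hypothetical numerical feature per radiograph (an invented ‘discontinuity score’) and uses a simple perceptron rather than the deep models used in the cited studies; the data, names, and learning rule are illustrative assumptions only.

```python
# Toy supervised learning: a perceptron that is corrected on each labelled mistake.
# The feature values and labels are invented for illustration only.
labelled_data = [(1, 0), (2, 0), (4, 1), (5, 1)]  # (score, label); 1 = 'broken'


def predict(weight, bias, score):
    """Return 1 ('broken') if the weighted score is positive, else 0."""
    return 1 if weight * score + bias > 0 else 0


def train(data, max_epochs=20):
    """Repeat passes over the labelled data until no mistakes remain."""
    weight, bias = 0, 0
    for epoch in range(1, max_epochs + 1):
        mistakes = 0
        for score, label in data:
            error = label - predict(weight, bias, score)
            if error != 0:
                # The human-provided label 'corrects' the model.
                weight += error * score
                bias += error
                mistakes += 1
        if mistakes == 0:
            break
    return weight, bias, epoch


weight, bias, epochs_used = train(labelled_data)
print(weight, bias, epochs_used)  # 2 -5 7
```

On these four invented examples the loop stops after seven passes, once every labelled example is classified correctly; real radiograph models involve far more parameters and a separate validation step.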

Unsupervised ML processes unlabelled training data, without a human-defined outcome of interest, and clusters them according to shared patterns. In the aforementioned example, radiograph images of the same anatomical region (e.g. a hip radiograph) allow the computer to ascertain what normal looks like. The computer then groups similar data and patterns together (e.g. radiographs of broken and unbroken bones), using an algorithm.22,23 The repetition of these functions can be used to fine-tune the algorithm and improve accuracy. Each pass through the training data (called an ‘epoch’) may be repeated up to a thousand times to achieve the accuracy required, before an algorithm can move beyond proof-of-concept and enter the validation phase. It is through this mechanism that a final algorithm can be established and applied to unknown data sets as required.8,19
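The grouping of unlabelled data by repeated passes can be sketched with a very small clustering routine. The example assumes one invented measurement per image and uses two-cluster k-means, chosen here purely for illustration; the studies cited above do not specify this algorithm.

```python
def two_means(values, max_epochs=100):
    """Cluster one-dimensional values into two groups (k-means with k = 2)."""
    centroids = [min(values), max(values)]  # simple deterministic starting points
    labels = []
    for _ in range(max_epochs):
        # Assign every value to its nearest centroid; no human labels are used.
        labels = [
            0 if abs(v - centroids[0]) <= abs(v - centroids[1]) else 1
            for v in values
        ]
        new_centroids = []
        for cluster in (0, 1):
            members = [v for v, lab in zip(values, labels) if lab == cluster]
            # Keep the old centroid if a cluster happens to be empty.
            new_centroids.append(
                sum(members) / len(members) if members else centroids[cluster]
            )
        if new_centroids == centroids:
            break  # assignments have stabilised
        centroids = new_centroids
    return centroids, labels


measurements = [1, 2, 3, 10, 11, 12]  # invented, unlabelled feature values
centroids, labels = two_means(measurements)
print(centroids, labels)  # [2.0, 11.0] [0, 0, 0, 1, 1, 1]
```

The routine separates the two groups without being told which is which; deciding what each cluster means clinically remains a human task.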

Reinforcement ML learns by exploring an environment and receiving reward or punishment for its actions (e.g. self-driving cars from Tesla).1,22,23 There is a growing body of research in deep reinforcement learning for the application of models in computer-assisted orthopaedic surgery, reporting capabilities of generating real-world, clinical-grade solutions without needing patient data for training.24 However, validated and reproducible high-quality tools are awaited.

Deep learning and neural networks

Deep learning (DL) is a more progressive and comprehensive subcategory of ML, comprising numerous, complex layers of algorithms, mirroring the neural networks seen in the brain through artificial neural networks (ANNs) (Figure 1).8,20,25,26 Where ML comprises thousands to millions of parameters, DL may have billions, with varying degrees and layers of complexity to broaden the function for which it is programmed. Akin to ML, much of DL requires human supervision to learn and be modified. However, there is increasing research interest in DL models that function without the need for human supervision. Such models operate on unlabelled and unstructured input and still produce the output of interest. An example of DL that is being explored in the world of orthopaedics is convolutional neural networks (CNNs), often used for imaging analysis and computer vision tasks.25,27

There has been a surge of research in CNNs pertaining to diagnostics and image recognition, classification and tumour detection, segmentation, and natural language processing (NLP), and the field of orthopaedics is no exception. The mathematical architecture of CNNs may be thought of as an overlapping of grid patterns.25 The basic building blocks include convolutional layers (a combination of linear and non-linear operations used to extract image features), pooling layers (which downsample the extracted features and so reduce the number of learnable parameters downstream), and fully connected layers; together they propagate the image input automatically and learn spatial hierarchies of features through a forward and backward propagation algorithm.25 For the most part, CNNs are made up of these three types of layer: the first two, convolution and pooling, perform feature extraction, whereas the third, a fully connected layer, maps the extracted features into the final output, such as a classification.25
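The three building blocks can be made concrete with a forward pass written in plain Python. This is a sketch under stated assumptions: the 5 × 5 ‘image’, the edge-detecting kernel, and the fully connected weights are hand-picked for illustration rather than learned, and the backward propagation step that a real CNN uses to learn these values is omitted.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Non-linear step: negative responses are set to zero."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size block."""
    return [
        [
            max(feature_map[r + i][c + j]
                for i in range(size) for j in range(size))
            for c in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for r in range(0, len(feature_map) - size + 1, size)
    ]


def fully_connected(features, weights, biases):
    """Map the flattened features to one score per output class."""
    return [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]


# Invented 5 x 5 image with a vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
kernel = [[-1, 1], [-1, 1]]  # responds to dark-to-bright vertical edges

features = max_pool(relu(conv2d(image, kernel)))      # feature extraction
flat = [v for row in features for v in row]           # flatten to a vector
scores = fully_connected(
    flat,
    weights=[[0.5, 0.0, 0.5, 0.0], [0.0, 0.5, 0.0, 0.5]],  # hypothetical weights
    biases=[0.0, 0.0],
)
print(features, scores)  # [[2, 0], [2, 0]] [2.0, 0.0]
```

Convolution and pooling reduce the 25 input pixels to four features, and the fully connected layer turns those four features into two class scores, mirroring the division of labour described above.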

Natural language processing and electronic medical records

NLP describes computer comprehension of language.3 Functionally, it can scan clinical medical records and make sense of information such as operative notes and radiology reports. NLP algorithms have the potential to automate data collection for diagnostic elements, which could directly improve patient care and augment cohort surveillance.28 The implications of NLP may include aggregating and analyzing large databases abundant in information, usually too arduous for manual sorting, and reducing documentation time.3,29 It has already seen use in organizing relevant data from electronic medical records in cases of periprosthetic fractures, and in aiding the diagnosis of periprosthetic joint infections.28,30 For example, data elements that comprise the Musculoskeletal Infection Society (MSIS) criteria31 were manually extracted and used as the gold standard for validation. The NLP algorithm was then applied to extract the presence of sinus tract, purulence, pathological documentation of inflammation, and growth of cultured organisms from medical records.
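To give a sense of what extracting such data elements from free text involves, the following Python sketch uses simple keyword rules with a basic negation check. It is not the algorithm used in the cited studies: the patterns, the criterion names, and the example note are all invented for illustration, and production NLP systems are considerably more sophisticated.

```python
import re

# Hypothetical patterns for three of the data elements mentioned above.
CRITERIA = {
    "sinus_tract": r"sinus tract",
    "purulence": r"purulen(?:ce|t)",
    "positive_culture": r"cultures? (?:grew|positive)",
}
# A finding is treated as negated if a negation cue precedes it in the sentence.
NEGATION = re.compile(r"\b(?:no|without|negative for)\b[^.]*$", re.IGNORECASE)


def extract_findings(note):
    """Return, for each criterion, whether the note affirms its presence."""
    sentences = re.split(r"(?<=[.!?])\s+", note)
    findings = {}
    for name, pattern in CRITERIA.items():
        found = False
        for sentence in sentences:
            match = re.search(pattern, sentence, re.IGNORECASE)
            if match and not NEGATION.search(sentence[:match.start()]):
                found = True
        findings[name] = found
    return findings


note = (
    "Wound examined. No sinus tract seen. "
    "Purulent fluid was aspirated from the joint. "
    "Cultures grew Staphylococcus aureus."
)
print(extract_findings(note))
# {'sinus_tract': False, 'purulence': True, 'positive_culture': True}
```

Comparing such automated output against manually extracted elements, as the cited work did with the MSIS criteria, is what allows the extraction to be validated.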

Image recognition and diagnostics

Fracture recognition

The past decade has seen imaging analysis become a considerable focal point among AI research.32 Numerous authors have studied the ability of CNNs to identify various upper and lower limb fractures on radiographs, such as hip, calcaneus, and radial fractures. Accuracy of up to 98% has been reported, as well as the potential of CNNs to outperform or perform non-inferiorly to humans.26,32-36 It has also been reported that DL models could recognize laterality, exam view, and body part in wrist, hand, and ankle radiographs. They have also shown promise in achieving notoriously difficult diagnoses, such as scaphoid fractures, as effectively as human specialists.20,30,35,37 It should not be forgotten that the performance of CNNs needs to be validated both internally and externally before clinical adoption. While internal validation is proving quite successful, there are hurdles that make external validation difficult, e.g. for fracture classification. One reported issue relates to different institutions using different labelling systems for radiographs or different radiation dosages. In particular, if a given institution changes its protocols, previously validated algorithms may become invalid and problematic to translate.34

A recent paper by Oliveira E Carmo et al38 highlights the lack of external validity of CNNs for fracture detection in the literature. In a large systematic review, only four studies (11% of total studies identified) were found to show external validity, both temporal and geographical, beyond one hospital site. The authors recommend the use of standardized reporting guidelines in order to ascertain ground truth for CNNs in fracture recognition, such as the Clinical Artificial Intelligence Research (CAIR) checklist,39 to critically appraise performance of CNNs to facilitate eventual implementation into clinical practice.38

Tumour detection

The potential of AI has been shown to extend beyond fracture recognition. Park et al40 showed that a CNN detected proximal femur bone tumours more accurately than clinicians. Specifically, ML may prove useful for the diagnosis of more ambiguous primary bone and soft-tissue tumours, ones that are not clearly evident on plain radiographs. These applications have also been proposed to help predict patients’ prognosis, such as those with synovial sarcoma.20,23,41

Other diagnostic applications

AI has also shown promising results in several other diagnostic applications, ranging from developmental abnormalities to soft-tissue knee injuries. A proof-of-concept investigation by Xie et al42 tested a CNN-based algorithm to improve the quality of MRI scans in tibial plateau fractures with combined meniscal defects.43 The authors documented a sensitivity of 96.9%, specificity of 93.2%, and accuracy of 95.3% when MRI diagnostics were compared with arthroscopic findings. The clearer, enhanced AI imaging produced by the CNN model led to a diagnosis that was consistent with intraoperative findings. This study is one of many that highlight feasible grounds for future research and advancements for current imaging modalities.42 Regarding congenital abnormalities, such as hip dysplasia, studies have also shown that radiological measurements can be obtained in a quick and effective manner.44 AI-assisted diagnosis and classification of OA from radiographs have demonstrated similar accuracy to senior clinicians.20 Furthermore, CNNs for osteoporosis fracture recognition have been developed to directly evaluate bone mineral density from radiographs.45,46
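Sensitivity, specificity, and accuracy, as quoted for diagnostic models throughout this section, are all derived from the same four counts obtained by comparing model output with a reference standard (e.g. arthroscopic findings). The sketch below shows the arithmetic; the counts are invented for illustration and are not taken from any cited study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Compute standard diagnostic metrics from confusion-matrix counts.

    tp/fn: diseased cases the model flagged / missed
    tn/fp: healthy cases the model cleared / wrongly flagged
    """
    return {
        "sensitivity": tp / (tp + fn),            # diseased cases detected
        "specificity": tn / (tn + fp),            # healthy cases cleared
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical counts: 100 diseased and 100 healthy cases.
metrics = diagnostic_metrics(tp=90, fp=20, tn=80, fn=10)
print(metrics)
```

With these invented counts the sensitivity is 0.90, the specificity 0.80, and the accuracy 0.85. Accuracy alone can be misleading when one class is much rarer than the other, which is why the three figures are usually reported together.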

AI image recognition may soon be a highly sought-after application in orthopaedics, corroborated in a study by Jang et al47 where CNNs were reported to identify bone and soft-tissue landmarks as objects on radiographs. Additionally, more accurate calculations of knee alignment using the DL model may have potential for preoperative planning in total knee arthroplasty (TKA).47 However, several limitations, such as variability in the established ground truths, radiograph quality, alignment, or rotation, mean that these methods are not yet employed in preoperative planning for TKA.47

A recent scoping review by Gurung et al48 investigated the application of AI in analyzing postoperative radiographs following total hip arthroplasty (THA) and TKA to ensure adequate implant positioning, and reported > 90% accuracy. While the 12 individual studies were large, using up to 320,000 radiographs, their robustness was a point of contention. The authors concluded that there is currently insufficient evidence to use AI for said purposes in clinical practice.48

Automated identification of arthroplasty implants using DL has been reported to be a useful augment in revision surgery, enabling accurate planning of the operative technique and necessary extraction equipment.8,25,34,49 A study by Borjali et al49 assessed a novel, highly accurate, and fully automatic approach identifying the design of THA prosthesis from plain radiographs. An AI model able to identify prosthesis within milliseconds, versus 20 to 30 minutes, can have huge implications for patient safety.49 Furthermore, it has been shown that in 10% of cases, surgeons are unable to identify the prosthesis preoperatively and 2% intraoperatively.49 This has been shown to be associated with increased operating time, blood/bone loss, recovery time, and healthcare costs.49 A sensitivity up to 94% and specificity of 97% in identifying implant loosening following hip and knee arthroplasty using CNNs has been reported.20 Of note, the CNN algorithm outperformed the human counterpart from plain radiographs, illustrating its potential role in preventing serious complications and redistributing clinical time to improve patient care.20,50

Predictive algorithms

Recent literature has showcased the predictive value of AI models to calculate mortality rates, transfusion risk, and length of hospital stay following elective arthroplasty.8,34,51-53 This could be of particular benefit when considering patient care pathways, from preoperative optimization to recovery plans and resource allocation.25 It has also been reported that DL/ML models could predict knee and hip OA up to a decade in advance, with acceptable accuracy, by means of bone texture analysis of the proximal femur and acetabulum combined with clinical risk factors.8,21,54 Conceptually, this could act as a risk stratification tool, identifying individuals in need of early intervention.25 A recent study comparing a conventional ML model (an ANN) with traditional logistic regression in 28,742 patients from the National Surgical Quality Improvement Program (USA) demonstrated similar predictability of clinically important factors for safe same-day discharge post-TKA using the ANN model.51

Multiple AI predictive models assimilating large amounts of patient data to improve healthcare outcomes have been described. Examples in orthopaedic surgery include AI models predicting suitable patients for nerve blocks following anterior cruciate ligament (ACL) reconstruction.32 Kim et al55 developed a DL algorithm to predict the mortality and morbidity risk following spinal fusion, and found this to be more accurate than the traditionally used scoring system by the American Society of Anesthesiologists.56 Another interesting application of AI is showcased by Kumar et al,57 who developed an ML algorithm predicting patient outcomes in shoulder arthroplasty. The input comprises shoulder range of motion, demographic data, American Shoulder and Elbow Surgeons (ASES) scores,58 and visual analogue scale (VAS) pain scores, to assess prognosis and range of motion up to seven years post-treatment, with up to 82% reported accuracy.8,57,59 A recent study involving a total of 111,147 patients undergoing primary shoulder arthroplasty reported 73.1% to 91.8% accuracy using ANN in predicting length of stay, hospital costs, and discharge disposition for both chronic/degenerative and acute/traumatic conditions.60 From a recent retrospective multicentre analysis of nearly 2,000 patients following total shoulder arthroplasty, a model to predict two-year ASES scores has been developed and validated. The model was reported to be accurate within the minimal clinically important difference in 85% of patients.22

The role of AI in surgical training

AI could play a pivotal role in orthopaedic surgical training, where repetition and the existence of a training framework are imperative to acquiring competence.61 Through ML and computer vision, AI now has the capacity to gather data and provide meaningful, personalized feedback on surgical abilities. Lavanchy et al62 created an ML algorithm capable of assessing surgical skill in laparoscopic cholecystectomies, which demonstrated 87% accuracy in identifying the kinematics of surgical instruments as a surrogate measure of efficiency.14-16,61 This provided constructive feedback to the operator and represents a system that could feasibly be translated into orthopaedics.62 The integration of AI systems (such as the Virtual Operative Assistant) into virtual reality (VR) and augmented reality (AR) can help to attain objective critique without depending on the typical ‘apprenticeship’ learning modality.61 Siemionow et al63 provided an example of successful AI incorporation into AR. The researchers developed an ML system enabling the overlay of a 3D spinal image onto cadavers, facilitating accurate metal probe placement into lumbar vertebrae.63 The overarching advantage of these technologies is patient safety, given that surgical trainees can acquire experience while mitigating risk to patients.

Rehabilitation and postoperative care

The postoperative phase has been highlighted as a key area of AI interest.64 A growing body of studies has reported the use of smartphones to gather continuous, remote data on a patient’s vitals and rehabilitation progress following TKA.3,8 ML-based algorithms allow tracking of physiotherapy engagement and exercise participation, and can alert healthcare professionals if patient milestones are not met.65 Similarly, the surveillance of patients’ vitals, wellbeing, and complications, such as deep vein thrombosis, has been documented extensively in the literature.64,66,67 These AI features have been documented to reduce readmission rates following TKA and THA. However, no statistically significant difference in the rate of hospital discharge has been reported compared with care without remote monitoring.68,69 Interestingly, it has also been proposed that ML algorithms could prove a useful augment in rehabilitation following ACL surgery, by using biomechanical data to assess for asymmetries in gait analysis.32,70

DL has been touted by multiple studies to be capable of predicting the risk of complications leading to revision surgery, using postoperative hip arthroplasty radiographs. Rouzrokh et al71 found that a DL algorithm trained on over 90,000 postoperative images predicted implant dislocation within five years of surgery. This model had a rather high negative predictive value, and may therefore provide a useful ‘ruling out’ method when assessing high-risk patients, demonstrating the potential role of AI in guiding pre-emptive interventions.8,30,71
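Why a high negative predictive value (NPV) supports ‘ruling out’ can be seen from how predictive values depend on sensitivity, specificity, and how common the complication is. The sketch below applies Bayes’ rule; the input values are hypothetical and are not the figures reported by Rouzrokh et al.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Derive PPV and NPV from test characteristics and event prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    return {
        "ppv": true_pos / (true_pos + false_pos),  # P(event | positive result)
        "npv": true_neg / (true_neg + false_neg),  # P(no event | negative result)
    }


# Hypothetical model characteristics for a rare complication (2% of patients).
values = predictive_values(sensitivity=0.5, specificity=0.9, prevalence=0.02)
print(round(values["ppv"], 3), round(values["npv"], 3))
```

For a rare event, even a model with modest sensitivity yields an NPV close to 1 while its positive predictive value stays low: a negative result is reassuring, whereas a positive result still requires clinical confirmation.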

Limitations to AI in orthopaedics

AI is associated with considerable capital costs and financial burden on healthcare systems, potentially impeding its widespread adoption.1,65 Notwithstanding this, carefully designed cost-benefit analyses could delineate whether its utility in orthopaedics results in cost-effective interventions.72-74 The risk of breaching patient confidentiality is inherent with large data sets, and therefore should be treated as a prominent ethical consideration.1,3 As with any research being generalized to the wider clinical setting, AI models must go through a rigorous process of validation. Norgeot et al75 proposed a minimum set of documentation to bring similar levels of transparency and utility to the application of AI in medicine and surgery: minimum information about clinical AI modelling (MI-CLAIM). These guidelines involve six areas that require attention when appraising CNN models, and aim to inform clinical adoption of AI models: 1) study design; 2) separation of data into partitions for model training and model testing; 3) optimization and final model selection; 4) performance evaluation; 5) model examination; and 6) reproducible pipeline.75
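The second MI-CLAIM area, separating data into training and testing partitions, can be sketched as follows. The example assumes that each record carries a patient identifier and splits at the patient level, so that images from one patient never appear in both partitions; this grouping, the field names, and the 20% test fraction are illustrative assumptions and not requirements quoted from the guideline.

```python
import random


def split_by_patient(records, test_fraction=0.2, seed=0):
    """Partition records so that no patient contributes to both sets."""
    patient_ids = sorted({r["patient_id"] for r in records})
    random.Random(seed).shuffle(patient_ids)  # reproducible shuffle
    n_test = max(1, round(len(patient_ids) * test_fraction))
    test_ids = set(patient_ids[:n_test])
    train = [r for r in records if r["patient_id"] not in test_ids]
    test = [r for r in records if r["patient_id"] in test_ids]
    return train, test


# Invented data set: 10 patients with two radiographs each.
records = [
    {"patient_id": f"P{p:02d}", "image": f"P{p:02d}_view{v}.png"}
    for p in range(10)
    for v in range(2)
]
train_set, test_set = split_by_patient(records)
print(len(train_set), len(test_set))  # 16 4
```

Fixing the random seed makes the partition reproducible, which also serves the sixth area (reproducible pipeline); external validation would additionally require data from a different institution or time period.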

The application of AI models outside of the data or institution for which they were designed (external validity) should be carefully considered. Systematic errors within algorithms could have widespread and harmful implications for patients. Accordingly, a systematic approach to designing and validating models using proven concepts is required to avoid such errors before translation into clinical practice. To mitigate this risk, AI is intended as an adjunct to the clinical decision-making process, not a substitute. Clinicians should remain cognizant of AI’s limitations and proceed cautiously, until external validity is proven within acceptable margins of error.3,37

AI is only as good as its data, and the development of robust reporting frameworks is vital to preventing avoidable errors.70,76-78 Guidelines for establishing models are necessary, such as the Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD) initiative,77 which has already been used in validating ML in orthopaedics. The complexity of CNNs usually depends upon the complexity of the input data. The more convoluted the input data are, the more comprehensive the mathematical algorithms required to deliver the desired output. An inherent problem described repeatedly in the literature is the generation of complex CNNs that solely reflect the data they are set up to evaluate.70 Overfitting refers to a model that fails to generalize well after training, and is particularly common with models that are nonparametric/nonlinear and have more flexibility when learning the target function. The lack of generalizability may be attributed to the model learning the random fluctuations and noise in the training data as if they were meaningful. Therefore, when translated to external data sets, the model is unable to recognize the new patterns as efficiently. Furthermore, this issue occurs when the model mirrors and focuses entirely on minor characteristics present in the training data set instead of perceiving the more generalized patterns beneath the data, and therefore requires continuous learning with larger volumes of data.34,77,78 It is vital for clinicians to be aware of this risk, and a collective effort is needed from multiple stakeholders to ensure appropriate collection, curation, and annotation of data that are validated beyond a given institution.70
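Overfitting can be demonstrated on a deliberately small, invented data set. The sketch below compares a highly flexible model (a 1-nearest-neighbour classifier, which memorizes every training label) with a smoother one (5 nearest neighbours with majority vote). Two training labels are deliberately wrong to represent noise; all values and the choice of classifier are illustrative assumptions, not a method from the cited literature.

```python
from collections import Counter


def true_label(x):
    # Assumed underlying pattern: values of 50 or more belong to class 1.
    return 1 if x >= 50 else 0


MISLABELLED = {3, 15}  # training indices given the wrong label (noise)

train = []
for i in range(20):
    x = 5 * i
    y = true_label(x)
    train.append((x, 1 - y if i in MISLABELLED else y))

# 'External' data: nearby but unseen points, labelled without noise.
test = [(5 * i + 1, true_label(5 * i + 1)) for i in range(20)]


def knn_predict(x, k):
    """Majority vote among the k training points closest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(data, k):
    return sum(knn_predict(x, k) == y for x, y in data) / len(data)


for k in (1, 5):
    print(k, accuracy(train, k), accuracy(test, k))
# 1 1.0 0.9
# 5 0.9 1.0
```

The flexible model scores perfectly on its own training data because it has memorized the two noisy labels, and it then repeats those errors on unseen points. The smoother model appears worse on the training data yet generalizes better, which is the pattern that internal validation alone can fail to reveal.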

Conclusion and future considerations

The use of AI in orthopaedics has the potential to improve patient outcomes and reduce the workload of healthcare professionals. A promising future development is the ‘digital twin’, a virtual representation of the individual patient. This is thought to be a cornerstone of precision medicine, able to predict diseases, treatment outcomes, and preventive interventions tailored to the individual patient phenotype, even down to the genome level. The effect this could have on the evolution of orthopaedic surgery and medicine is almost incomprehensible. AI in orthopaedic surgery shows promise in identifying hip and knee implants, highlighting implant malposition, detecting features of loosening, and predicting length of hospital stay, costs involved, functional outcomes, and prognostic scores. The current state of AI technology requires a coordinated effort to effectively progress from proof-of-concept into clinical practice. In this vein, the establishment of systematic and robust validation and reporting frameworks is of utmost importance to allow safe adoption of this technology.

