
Open Access

Systematic Review

A systematic review of natural language processing applications in Trauma & Orthopaedics




Abstract

Aims

The prevalence of artificial intelligence (AI) algorithms within the Trauma & Orthopaedics (T&O) literature has greatly increased over the last ten years. One increasingly explored aspect of AI is natural language processing (NLP): the automated interpretation of the free-text data that are prevalent in electronic medical records. We set out to review the current evidence for applications of NLP methodology in T&O, including assessment of study design and reporting.

Methods

MEDLINE, Allied and Complementary Medicine (AMED), Excerpta Medica Database (EMBASE), and Cochrane Central Register of Controlled Trials (CENTRAL) were screened for studies pertaining to NLP in T&O from database inception to 31 December 2023. An additional grey literature search was performed. NLP quality assessment followed the criteria outlined by Farrow et al in 2021, with two independent reviewers (classification as absent, incomplete, or complete). Reporting was performed according to the Synthesis Without Meta-Analysis (SWiM) guidelines. The review protocol was registered on the Prospective Register of Systematic Reviews (PROSPERO; registration no. CRD42022291714).

Results

The final review included 31 articles (published between 2012 and 2021). The most common subspeciality areas included trauma, arthroplasty, and spine; 13% (4/31) related to online reviews/social media, 42% (13/31) to clinical notes/operation notes, 42% (13/31) to radiology reports, and 3% (1/31) to a systematic review. According to the reporting criteria, 16% (5/31) were considered good quality, 74% (23/31) average quality, and 6% (2/31) poor quality. The most commonly absent reporting criteria were evaluation of missing data (26/31), sample size calculation (31/31), and external validation of the study results (29/31). Code and data availability were also poorly documented in most studies.

Conclusion

Application of NLP is becoming increasingly common in T&O; however, published article quality is mixed, with few high-quality studies. There are key consistent deficiencies in published work relating to NLP which ultimately influence the potential for clinical application. Open science is an important part of research transparency that should be encouraged in NLP algorithm development and reporting.

Cite this article: Bone Jt Open 2025;6(3):264–274.

Take home message

This study highlights some of the key potential uses of natural language processing in Trauma & Orthopaedics.

It also identifies some methodological concerns with the currently available literature on the subject.

Introduction

There has been a massive influx of publications regarding artificial intelligence (AI) applications in the domain of Trauma & Orthopaedics (T&O).1 One AI technique is natural language processing (NLP), which enables processing and analysis of large amounts of natural language or 'free-text' (for example, written information contained within a clinical letter) data.

It is estimated that approximately 80% of healthcare data are in an unstructured or 'free-text' format.2 These data have the potential to provide a veritable wealth of useful information to guide clinical practice and research. NLP allows users to turn these unstructured data into meaningful material for analysis.
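
To illustrate the general principle, the toy sketch below (not the method of any included study; all field names and patterns are hypothetical) shows how even a single rule-based step can turn one free-text sentence into a structured record suitable for analysis.

```python
import re

# Toy example: one free-text sentence converted into a structured record
# using simple rules. Field names and patterns are hypothetical.
note = "Patient underwent right total hip arthroplasty; no intraoperative complications."

record = {"laterality": None, "procedure": None, "complication_documented": None}

laterality = re.search(r"\b(left|right)\b", note, re.IGNORECASE)
if laterality:
    record["laterality"] = laterality.group(1).lower()

if re.search(r"total hip arthroplasty|THA", note, re.IGNORECASE):
    record["procedure"] = "total hip arthroplasty"

# Very crude negation handling: a complication mention preceded by "no"
# in the same clause is treated as absent.
mentioned = re.search(r"complication", note, re.IGNORECASE)
negated = re.search(r"\bno\b[^.;]*complication", note, re.IGNORECASE)
record["complication_documented"] = bool(mentioned and not negated)

print(record)  # {'laterality': 'right', 'procedure': 'total hip arthroplasty', 'complication_documented': False}
```

Real pipelines in the studies reviewed below range from rule sets of this kind through to machine-learning models trained on large annotated corpora.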

NLP is not without its challenges, in particular the risk of identifying protected healthcare information contained within the free-text resource. Techniques such as 'Hidden In Plain Sight' (HIPS) methods have been developed to attempt to maintain free-text structure while ensuring anonymity,3 but this in itself requires dedicated health data science infrastructure. Ethical concerns have also been raised about granting access to large volumes of anonymized free-text healthcare data without consent, although previous evidence has suggested that this is supported if particular safeguarding structures are in place.4
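
The sketch below illustrates only the general principle behind 'hidden in plain sight' de-identification, and is not the implementation used in the cited work: identifiers that are detected are replaced with realistic surrogates rather than obvious placeholders, so that any identifier the detector misses does not stand out to a reader.

```python
import random
import re

# Illustrative sketch of the surrogate-substitution principle behind
# "hidden in plain sight" de-identification (not the cited implementation).
SURROGATE_SURNAMES = ["Smith", "Brown", "Wilson", "Campbell"]

def hide_in_plain_sight(text, detected_names):
    # Replace each detected name with a realistic surrogate so that any
    # name the detector misses does not stand out as the only real one.
    for name in detected_names:
        surrogate = random.choice(SURROGATE_SURNAMES)
        text = re.sub(rf"\b{re.escape(name)}\b", surrogate, text)
    return text

note = "Reviewed Mr Jones in clinic today; Mr Jones reports improving hip pain."
print(hide_in_plain_sight(note, detected_names=["Jones"]))
```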

Despite these challenges, there has been evidence of successful use of NLP applications within the healthcare setting. Examples include delirium detection in the intensive care unit,5 surveillance of patients at high risk of upper gastrointestinal cancer,6 and predicting outcomes of critical care patients.7

Development of NLP applications has been reported within T&O, such as development of an arthroplasty database8 and fracture identification.9 No study to date has, however, methodically assessed the available NLP literature, including an evaluation of study quality and analysis of reported performance metrics. We therefore set out to perform a systematic review of NLP applications within T&O to better appraise current applications and guide future use.

Methods

This systematic review was performed and reported according to the PRISMA statement.10 Registration prior to study commencement was undertaken on the Prospective Register of Systematic Reviews (PROSPERO) no. CRD42022291714.

Search strategy

Relevant articles were identified through a search of MEDLINE, Allied and Complementary Medicine Database (AMED), Excerpta Medica Database (EMBASE), and Cochrane Central Register of Controlled Trials (CENTRAL). An additional search of the grey literature was also undertaken using OrthoSearch (an orthopaedic-specific database which contains abstracts, articles, and associated media information).11 All electronic searches were undertaken from database inception to 31 December 2023. Full electronic search terms are shown in Supplementary Table i. Reference lists from all extracted studies were reviewed for potentially eligible manuscripts.

Eligibility criteria

All studies that involved research related to the use of NLP in the setting of T&O and associated subspecialities were included. Exclusion criteria were studies involving other surgical or medical specialities, use of other AI techniques that were not specifically identified as NLP, publications in relation to generative AI, and non-English language publications.

Study identification

Two assessors (LF, AR) independently screened search output titles and abstracts for articles which met the eligibility criteria. Full-text review was undertaken to determine eligibility.

Data extraction

Data extraction was undertaken using a prespecified proforma by two independent assessors (LF, AR). Fields included: 1) Design overview: author, year, subspeciality, and NLP domain (e.g. online reviews/social media, or clinical/operation notes); 2) Introduction reporting: study aims; 3) Methods reporting: data source, data quality, data pre-processing, missing data, testing/training/internal validation, external validation, model type, and sample size calculation; 4) Results reporting: sample reporting, performance metrics, model evaluation, and model explanation; 5) Conclusions reporting: clinical practice interpretation, limitations, and future research; and 6) Open science: code and data availability.

Quality assessment

To our knowledge, there are no current globally defined reporting guidelines that relate specifically to NLP. We therefore assessed compliance with the reporting guidelines outlined by Farrow et al,1 with each domain categorized as either complete, incomplete, or absent. Code and data availability were assessed separately. These reporting guidelines were chosen due to their specific relation to AI applications in T&O, with inclusion of reporting quality across several domains for the introduction, methods, results, and conclusions separately. An overall cumulative score (total/34) was derived, with score tertiles used to aid interpretability of the final score. Manuscripts with scores < 11, 11 to 22, and 23 to 34 were deemed poor, average, and good quality, respectively. Any disagreement regarding individual scoring of domains between data extractors was resolved by discussion.
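
As a minimal sketch of the scoring described above (assuming 17 domains each scored 0, 1, or 2, summed to a total out of 34 and banded into tertiles), the example below reproduces the categorization for the per-domain scores reported for Shah et al in Table I.

```python
# Sketch of the scoring and banding described above: 17 reporting domains,
# each scored 0 (absent), 1 (incomplete), or 2 (complete), summed to a
# total out of 34 and banded into tertiles.
def categorize(total):
    if total < 11:
        return "poor"
    if total <= 22:
        return "average"
    return "good"

# Worked example using the per-domain scores reported for Shah et al (Table I).
shah_scores = [1, 2, 0, 2, 0, 2, 0, 2, 0, 0, 2, 0, 2, 2, 2, 0, 0]
total = sum(shah_scores)            # 17
print(total, categorize(total))     # 17 average
```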

Pooled performance metrics

Where feasible, according to study reporting, pooled performance metrics (mean and range) were assessed. These included: model accuracy; sensitivity (recall); specificity; precision (positive predictive value); area under the receiver operating characteristic curve (AUROC); F1 score; and calibration. Where scores for multiple cohorts were reported, the highest-performing model output was chosen for inclusion. All scores were defined by the individual study authors, including decisions around ground-truth labels.
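
For clarity, the worked example below shows how the first five of these metrics are derived from a confusion matrix; the counts are invented rather than taken from any included study, and AUROC and calibration are omitted because they additionally require predicted probabilities.

```python
# Worked example of the pooled binary-classification metrics, computed from
# an invented confusion matrix (not data from any included study). AUROC and
# calibration cannot be derived from these four counts alone.
tp, fp, fn, tn = 90, 5, 10, 95

accuracy = (tp + tn) / (tp + fp + fn + tn)
sensitivity = tp / (tp + fn)              # recall
specificity = tn / (tn + fp)
precision = tp / (tp + fp)                # positive predictive value
f1 = 2 * precision * sensitivity / (precision + sensitivity)

print(f"accuracy={accuracy:.2f} sensitivity={sensitivity:.2f} "
      f"specificity={specificity:.2f} precision={precision:.2f} f1={f1:.2f}")
```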

Statistical analysis

Given the nature of the included data, meta-analysis was not feasible and therefore reporting was performed according to the Synthesis Without Meta-Analysis (SWiM) criteria.12 Studies have been grouped by NLP domain, with assessment of study heterogeneity and evidence certainty determined by the variability of study validity/bias within each domain.

Results

Research results

Using the pre-specified search criteria, 602 potentially eligible records were identified. Following full-text assessment, 36 manuscripts were included.8,9,13-46 Figure 1 depicts the flow diagram of the full search process. The number of articles published per year increased from one between 2012 and 2017 to a peak of 13 in 2021.

Fig. 1 Study selection process. NLP, natural language processing.

Characteristics of included studies

Study characteristics, incorporating the quality assessment scoring for each manuscript, are detailed in Table I.

Table I.

Summary of included studies including reporting assessment.

Design overview Introduction reporting Methods reporting Results reporting Conclusions reporting Open science
First author Year Sub-speciality NLP domain Study aims Data source Data quality Data pre-processing Missing data Test, train, and validation methods External validation ML output Sample size calculation Sample reporting Model evaluation Model explanation Clinical practice interpretation Limitations Future research Code availability Data availability Overall (/34)
Shah 2020 Arthroplasty Clinical notes/operation notes 1 2 0 2 0 2 0 2 0 0 2 0 2 2 2 0 0 17
Mohammadi 2020 Arthroplasty Clinical notes/operation notes 2 2 0 2 1 2 0 2 0 2 2 0 2 2 2 0 1 21
Blaker 2021 Trauma Clinical notes/operation notes 2 2 2 1 1 1 0 1 2 1 1 0 2 1 0 0 0 15
Karhade 2020 Spine Clinical notes/operation notes 2 2 1 2 0 2 0 2 0 1 2 2 2 2 1 0 0 21
Karhade 2020 Spine Clinical notes/operation notes 2 2 2 2 1 2 0 1 0 2 2 2 1 2 2 0 0 23
Sagheb 2021 Arthroplasty Clinical notes/operation notes 1 2 1 2 0 1 0 1 0 2 1 2 1 2 1 0 0 17
Wyles 2019 Arthroplasty Clinical notes/operation notes 2 2 2 2 0 0 2 1 0 2 1 1 2 2 1 2 0 22
Tibbo 2019 Trauma Clinical notes/operation notes 2 2 2 2 0 1 0 2 0 1 1 0 2 2 1 0 0 18
Fu 2021 Arthroplasty Clinical notes/operation notes 2 2 2 2 0 2 0 2 0 1 2 0 2 2 1 0 0 20
Karhade 2021 Spine Clinical notes/operation notes 2 2 0 2 0 0 0 1 0 1 2 2 2 2 2 0 0 17
Thirukumaran 2019 General Clinical notes/operation notes 2 2 2 2 0 2 0 1 0 2 2 2 2 2 2 0 0 23
Borjali 2021 Arthroplasty Clinical notes/operation notes 2 2 2 2 0 2 0 2 0 0 1 1 2 2 1 0 0 19
Karhade 2020 Spine Clinical notes/operation notes 2 2 2 2 0 2 0 1 0 2 2 2 2 2 1 0 0 22
Wyles 2022 Arthroplasty Clinical notes/operation notes 1 2 2 2 1 1 2 2 0 1 1 0 2 2 1 0 0 20
Karhade 2022 Spine Clinical notes/operation notes 1 2 2 2 0 2 2 2 0 1 2 2 2 2 1 0 0 23
Flores-Balado 2023 Arthroplasty Clinical notes/operation notes 1 2 1 1 0 1 1 2 0 1 1 2 1 1 0 0 0 15
Tavabi 2022 Sports Clinical notes/operation notes 1 2 1 1 1 2 1 2 0 0 1 1 1 1 1 0 0 16
Kita 2022 Arthroplasty Clinical notes/operation notes 2 2 1 2 0 2 1 1 0 1 1 1 1 2 1 0 0 18
Langerhuizen 2021 General Online reviews/social media 2 1 0 2 1 1 0 1 0 1 0 0 1 1 0 0 2 13
Bovonratwet 2021 Arthroplasty Online reviews/social media 2 2 1 1 0 0 0 1 0 2 1 0 2 1 0 0 0 13
Menendez 2019 Shoulder and Elbow Online reviews/social media 2 2 0 0 0 0 0 1 0 2 0 0 2 1 0 0 0 10
Dominy 2021 Spine Online reviews/social media 2 1 0 1 0 0 0 1 0 1 1 0 2 1 0 0 0 10
Groot 2020 Tumour Radiology reports for feature detection/classification 1 2 0 2 2 2 0 2 0 2 2 2 2 2 2 0 0 23
dos Santos 2019 Foot and Ankle Radiology reports for feature detection/classification 2 2 0 1 0 0 0 1 0 2 2 0 2 2 0 0 1 16
Wang 2018 Trauma Radiology reports for feature detection/classification 1 2 2 2 0 2 0 1 0 0 2 0 1 1 1 0 0 16
Wagholikar 2013 Trauma Radiology reports for feature detection/classification 2 1 2 1 0 0 0 2 0 0 2 2 2 1 2 0 0 17
Grundmeier 2016 Trauma Radiology reports for feature detection/classification 2 2 2 2 0 2 0 2 0 1 2 2 2 2 1 0 0 22
Do 2012 Trauma Radiology reports for feature detection/classification 2 1 0 2 0 1 0 1 0 0 2 0 2 2 2 0 0 15
Kolanu 2021 Trauma Radiology reports for feature detection/classification 2 2 1 2 0 1 2 2 0 1 2 1 2 2 1 0 0 21
Olthof 2021 Trauma Radiology reports for feature detection/classification 2 2 2 2 0 2 0 2 0 1 2 2 1 1 0 0 0 19
Foufi 2018 Trauma Radiology reports for feature detection/classification 1 1 1 1 0 0 0 1 0 0 1 0 1 0 1 0 0 8
Galbusera 2021 Spine Radiology reports for feature detection/classification 2 2 2 2 0 1 0 2 0 1 2 1 1 2 0 0 0 18
Jungman 2021 Trauma Radiology reports for feature detection/classification 2 2 2 2 0 1 0 2 0 2 2 0 2 2 0 0 0 19
Tan 2018 Spine Radiology reports for feature detection/classification 1 2 2 2 0 2 0 2 0 1 2 2 2 2 1 0 0 21
Huhdanpaa 2017 Spine Radiology reports for feature detection/classification 2 2 2 1 0 2 0 2 0 2 2 1 1 1 0 0 0 14
Buchlak 2021 Arthroplasty Systematic reviews 2 2 1 2 0 0 0 2 0 2 1 2 1 2 2 1 2 23
Note: for the reporting assessment, 0 indicates the domain is absent from the manuscript, 1 indicates partial completion, and 2 indicates full completion. Total score across 17 domains = 34 points. Reporting criteria adapted from Farrow et al.1

Of the included articles, ten related to trauma, ten to arthroplasty, nine to spinal surgery, three to general orthopaedics, one to foot and ankle surgery, one to shoulder and elbow surgery, one to sports surgery, and one to tumour surgery.

With regards to NLP domains, the most commonly used were clinical or operation notes (50%) and radiology reports (36%). Use in assessment of online reviews/social media and systematic reviews was less common (11% and 3%, respectively).

Overall assessment of study reporting

Of the 36 included studies, the median quality score was 18/34 (IQR 16 to 21); 11% were categorized as good quality, 83% average quality, and 6% poor quality. The most common incomplete study reporting fields were evaluation of missing data, external validation, and a sample size calculation. The top three most frequently completed reporting criteria were study aims, data source, and data pre-processing. Figure 2 demonstrates the bar plot of overall study reporting outcomes.

Fig. 2 Summary of overall results.

Reporting domains

Full details of the reporting domains for each individual study are demonstrated in Table I. These are taken from the study by Farrow et al.1

Introduction reporting: 26/36 (72% of included studies) had clear documentation of the study aims, with the remainder having at least partial completion.

Methods reporting: All studies at least partially identified their data source, with only 5/36 (14%) providing no details regarding the quality of the supplied data. A total of 25/36 (69%) studies fully indicated the preprocessing steps undertaken prior to model training and testing, with only one study providing no preprocessing information. Missing data and external validation were, however, poorly documented, with each of these fields absent from 29/36 (81%) of studies. Overall, 9/36 (25%) studies did not provide any indication of their testing, training, and validation methods. All studies at least partially reported the type of NLP algorithm output. Only one study provided any form of sample size calculation for model development.

Results reporting: Reporting regarding the sample population was fully performed in 13/36 (36%), with model evaluation fully performed in 21/36 (58%). In all, 15/36 (42%) cases did not provide any reference to explainability of the developed model.

Conclusions reporting: All studies made some reference to potential clinical practice interpretation, with the vast majority (35/36; 97%) describing the study limitations. A total of 11/36 (31%) did not provide any reference regarding requirements for potential future research in their manuscript.

Open science: Only one study provided the code for algorithm development and testing, with two studies providing the data in an open-source forum.

NLP domain: clinical notes/operation notes

Of the identified studies, 18 related to NLP analysis of clinical or operation notes.8,14,18-20,24,25,28,29,31,35,38,39,42-46 Of these, nine related to arthroplasty, five to spinal surgery, two to trauma, one to general orthopaedics, and one to sports surgery. The most common application was to identify adverse outcomes, for example re-admission or surgical complications. Automated database/registry creation was also featured. The median quality assessment for studies in this domain was 19/34 (IQR 17 to 22); 17% were considered good quality and 83% average quality. No study relating to clinical notes or operation notes was identified as poor quality.

NLP domain: radiology reports for feature detection/classification

Several studies (n = 13) related to application of NLP to radiology reports for feature detection and classification.9,15-17,21-23,30,32,34,37,40,41 Eight related to trauma, three to spinal surgery, one to foot and ankle, and one to tumour. The most common application was the identification of presence or absence of a fracture (± classification). Median quality assessment was 18/34 (IQR 16 to 21); 8% were considered good quality, 84% average quality, and 8% poor quality.
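
As a purely hypothetical sketch of this type of task, the snippet below trains a simple bag-of-words classifier on a handful of invented radiology report sentences; the included studies used far larger labelled corpora and a range of rule-based and machine-learning approaches.

```python
# Hypothetical sketch of fracture present/absent classification from
# radiology report text. Reports and labels are invented for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

reports = [
    "Displaced fracture of the distal radius.",
    "No acute fracture or dislocation identified.",
    "Comminuted intertrochanteric femoral fracture.",
    "Normal alignment. No bony injury seen.",
]
labels = [1, 0, 1, 0]  # 1 = fracture present, 0 = no fracture

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(reports, labels)

print(model.predict(["Transverse fracture of the midshaft of the tibia."]))
```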

NLP domains: online reviews/social media and systematic review

Four online reviews/social media studies were included,13,26,27,33 along with one study concerning the use of NLP to perform a systematic review (evaluating arthroplasty).36 The online reviews/social media studies related to general orthopaedics, arthroplasty, shoulder and elbow surgery, and spinal surgery (one each). The main application of NLP to online reviews/social media was automated assessment of patient experience/feedback using sentiment analysis. Median quality assessment was 12/34 (IQR 10 to 13); 20% were considered good quality, 40% average quality, and 20% poor quality.
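
The snippet below is a deliberately simple, lexicon-based illustration of sentiment scoring applied to invented review comments; it does not reproduce the approach of any particular included study, which typically used more sophisticated sentiment models.

```python
# Lexicon-based sentiment scoring of invented patient review comments;
# illustrative only.
POSITIVE = {"excellent", "caring", "painless", "recommend", "happy"}
NEGATIVE = {"rude", "painful", "delayed", "complication", "unhappy"}

def sentiment_score(comment):
    words = comment.lower().replace(".", " ").replace(",", " ").split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

reviews = [
    "Excellent surgeon, caring team, would recommend.",
    "Appointment was delayed and the staff were rude.",
]
for review in reviews:
    print(sentiment_score(review), review)  # prints 3, then -2
```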

Pooled performance metrics

In all, 20/36 studies (56%) reported at least one performance metric outcome. No single study reported results across all the domains assessed. Only five studies reported model calibration. The mean (range) performance metric outcomes for included studies (where reported) are detailed in Table II.

Table II.

Reported performance metrics.

Study details Accuracy Sensitivity (recall) Specificity Precision (positive predictive value) Area under the receiver operating curve F1 score Calibration
Shah et al8 0.94
Mohammadi et al14 0.79 0.27 0.82
Groot et al15 0.94 0.82 0.97 0.97 0.96 0.73
Dos Santos et al16 0.77 0.63 1.00 1.00 0.85
Wang et al17 0.93 1.00 1.00
Blaker et al18 0.77
Karhade et al39 0.89 0.99 0.89 0.99 0.89 1.17
Karhade et al39 0.86 0.93 0.51 0.92 0.64 0.61
Wagholikar et al21 0.92
Grundmeier et al22 0.95 0.97 0.92 0.95
Do et al23 0.79 0.90 0.95 0.90
Sagheb et al24 0.98 1.00 1.00 1.00
Wyles et al25 0.99
Tibbo et al28 1.00 1.00
Fu et al29 0.89 0.99 1.00 0.91
Kolanu et al9 0.99 1.00 0.97
Olthof et al30 0.96 0.95 0.98 0.99 0.95
Karhade et al20 0.70 1.54
Foufi et al32 0.97
Galbusera et al34 0.98 0.95 0.99 0.95
Thirukumaran et al35 0.97 0.97 0.96 0.97
Buchlak et al36 0.68
Jungman et al37 0.81 0.83 0.82
Borjali et al38 1.00 1.00
Karhade et al39 0.94 1.00 0.83
Tan et al40 0.94 0.95 0.98
Huhdanpaa et al41 0.70 0.99 0.90 0.79
Wyles et al42 1.00
Karhade et al31 0.83 0.98 0.79 0.95 0.81 3.08
Flores-Balado et al44 0.99 0.91 0.19 0.99 0.32
Tavabi et al45 1.00 1.00 1.00 1.00
Kita et al46 1.00 1.00 0.99
Mean values 0.94 0.91 0.97 0.84 0.91 0.86 1.43

Discussion

The application of NLP to T&O represents a significant opportunity to use the vast quantities of unstructured free-text data generated from routine healthcare interactions, for example in providing summaries of electronic health records or automated analysis of radiology reports. We identified three key domains of current NLP use: clinical/operation notes; radiology reports; and social media/online review posts. Reported performance measure outcomes were almost universally positive (average scores > 80% across all domains); however, there were relatively few high-quality studies identified according to the reporting criteria used. The most problematic areas related to reporting of missing data assessment, external validation, and sample size calculation. Many studies also failed to share the code underlying their NLP algorithms or to report data availability, contrary to open science principles. Development and widespread use of specific reporting standards related to the application of NLP to healthcare is essential to the appropriate development and reporting of future work in this area.

Our study is, to our knowledge, the first systematic review to focus on applications of NLP in relation to T&O. The results are consistent with reviews of NLP applications in other fields. For example, Davidson et al47 examined NLP applications in radiology and identified that the key reporting domains that were poorly represented in studies were external validation, data availability, and code availability. The domains of missing data assessment and sample size assessment were not part of the reporting criteria used in that study, but are areas of critical importance to the correct application of NLP techniques for data analysis. It should be noted that, despite high-impact publications governing sample size calculations for other aspects of AI inference,48 there are currently no peer-reviewed published guidelines regarding calculation of the optimum sample size for NLP development. This is likely to depend significantly on the NLP approach (for example, large language model (LLM) development/fine-tuning vs a rule-based algorithm), and should be a key research priority moving forwards.

Other applications of AI to T&O appear to suffer from similar issues when considering study reporting. Dijkstra et al49 evaluated 45 machine learning (ML)-based prediction models and identified that the risk of bias (according to the Prediction model Risk of Bias Assessment Tool (PROBAST))50 was high across the majority of included studies, with documented issues around small sample sizes, inadequate management of missing data, and lack of appropriate study reporting.

It therefore appears that the key methodological issues around study design and reporting are consistent across AI applications within T&O. The importance of model calibration appears to be particularly underappreciated, which is likely impacted by limited understanding of AI terminology and interpretation by orthopaedic surgeons.51 There is a need for a unified and collaborative approach encompassing all key stakeholders (clinicians, data scientists, statisticians, patients, providers) to maximize future applicability. Use of a development and deployment structure is integral to this process and to realizing the potential of NLP applications in the field of T&O.
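
To illustrate what even a basic calibration check involves (using invented predictions and outcomes rather than data from any included study), predicted probabilities can be binned and compared against observed event rates:

```python
# Invented example of a basic calibration check: group predicted
# probabilities into bins and compare the mean prediction in each bin
# with the observed event rate.
preds  = [0.10, 0.15, 0.20, 0.40, 0.45, 0.50, 0.70, 0.80, 0.85, 0.90]
actual = [0,    0,    0,    1,    0,    1,    1,    1,    1,    1]

bins = [(0.0, 1 / 3), (1 / 3, 2 / 3), (2 / 3, 1.01)]
for lo, hi in bins:
    idx = [i for i, p in enumerate(preds) if lo <= p < hi]
    mean_pred = sum(preds[i] for i in idx) / len(idx)
    obs_rate = sum(actual[i] for i in idx) / len(idx)
    print(f"bin {lo:.2f}-{hi:.2f}: mean predicted {mean_pred:.2f}, observed {obs_rate:.2f}")
```

A well-calibrated model would show close agreement between the predicted and observed columns across bins.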

Limitations of our study include the wide spectrum of different NLP approaches ranging from simple rule-based methods to LLMs. This makes a focused assessment challenging due to the heterogeneity of how these methods are typically applied and reported. Given the lack of currently available validated reporting criteria related to NLP, we used a published non-specific checklist that may be limited in some methodological domains and categorization accuracy. This study does, however, provide the first structured assessment of current applications of NLP within the T&O literature, which provides an understanding of some of the current limitations and subsequent lack of progress towards real-world implementation. It also highlights typical key deficiencies in reporting that can guide improvements in future NLP research.

In conclusion, NLP techniques have significant potential to revolutionize current approaches to data analysis, allowing use and assessment of vast quantities of unstructured free-text data that were previously a largely untapped resource. There are, however, several issues with study design and reporting that must be addressed to realize the potential for clinical practice integration. Appreciation of the importance of model calibration remains low. Sharing of code and data (where feasible) should become part of routine practice in order to maximize transparency in keeping with open science principles.


Correspondence should be sent to Luke Farrow. E-mail:

References

1. Farrow L, Zhong M, Ashcroft GP, Anderson L, Meek RMD. Interpretation and reporting of predictive or diagnostic machine-learning research in Trauma & Orthopaedics. Bone Joint J. 2021;103-B(12):1754-1758.

2. Bitran H. From free text to FHIR: text analytics for health launches new feature to boost interoperability. 2022. https://techcommunity.microsoft.com/t5/ai-azure-ai-services-blog/from-free-text-to-fhir-text-analytics-for-health-launches-new/ba-p/3257066 (date last accessed 10 February 2025).

3. Farrow L, Wilde K, Dymiter J, et al. Use of “hidden in plain sight” de-identification methodology in electronic healthcare data provides minimal risk of misidentification: results from the iCAIRD Safe Haven artificial intelligence platform. Int J Popul Data Sci. 2022;25(3):2023.

4. Ford E, Oswald M, Hassan L, Bozentko K, Nenadic G, Cassell J. Should free-text data in electronic medical records be shared for research? A citizens’ jury study in the UK. J Med Ethics. 2020;46(6):367-377.

5. Young M, Holmes NE, Kishore K, et al. Natural language processing diagnosed behavioural disturbance phenotypes in the intensive care unit: characteristics, prevalence, trajectory, treatment, and outcomes. Crit Care. 2023;27(1):425.

6. Li J, Hu S, Shi C, et al. A deep learning and natural language processing-based system for automatic identification and surveillance of high-risk patients undergoing upper endoscopy: a multicenter study. EClinMed. 2022;53:101704.

7. Marafino BJ, Park M, Davies JM, et al. Validation of prediction models for critical care outcomes using natural language processing of electronic health record data. JAMA Netw Open. 2018;1(8):e185097.

8. Shah RF, Bini S, Vail T. Data for registry and quality review can be retrospectively collected using natural language processing from unstructured charts of arthroplasty patients. Bone Joint J. 2020;102-B(7_Supple_B):99-104.

9. Kolanu N, Brown AS, Beech A, Center JR, White CP. Natural language processing of radiology reports for the identification of patients with fracture. Arch Osteoporos. 2021;16(1):6.

10. Moher D, Liberati A, Tetzlaff J, Altman DG, PRISMA Group. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. PLoS Med. 2009;6(7):e1000097.

11. No authors listed. OrthoSearch. https://orthosearch.org.uk (date last accessed 10 February 2025).

12. Campbell M, McKenzie JE, Sowden A, et al. Synthesis without meta-analysis (SWiM) in systematic reviews: reporting guideline. BMJ. 2020;368:l6890.

13. Langerhuizen DWG, Brown LE, Doornberg JN, Ring D, Kerkhoffs GMMJ, Janssen SJ. Analysis of online reviews of orthopaedic surgeons and orthopaedic practices using natural language processing. J Am Acad Orthop Surg. 2021;29(8):337-344.

14. Mohammadi R, Jain S, Namin AT, et al. Predicting unplanned readmissions following a hip or knee arthroplasty: retrospective observational study. JMIR Med Inform. 2020;8(11):e19761.

15. Groot OQ, Bongers MER, Karhade AV, et al. Natural language processing for automated quantification of bone metastases reported in free-text bone scintigraphy reports. Acta Oncol. 2020;59(12):1455-1460.

16. Pinto Dos Santos D, Brodehl S, Baeßler B, et al. Structured report data can be used to develop deep learning algorithms: a proof of concept in ankle radiographs. Insights Imaging. 2019;10(1):93.

17. Wang Y, Mehrabi S, Sohn S, Atkinson EJ, Amin S, Liu H. Natural language processing of radiology reports for identification of skeletal site-specific fractures. BMC Med Inform Decis Mak. 2019;19(Suppl 3):73.

18. Blaker K, Wijewardene A, White E, et al. Electronic search programs are effective in identifying patients with minimal trauma fractures. Osteoporos Int. 2022;33(2):435-441.

19. Karhade AV, Bongers MER, Groot OQ, et al. Natural language processing for automated detection of incidental durotomy. Spine J. 2020;20(5):695-700.

20. Karhade AV, Bongers MER, Groot OQ, et al. Development of machine learning and natural language processing algorithms for preoperative prediction and automated identification of intraoperative vascular injury in anterior lumbar spine surgery. Spine J. 2021;21(10):1635-1642.

21. Wagholikar A, Zuccon G, Nguyen A, et al. Automated classification of limb fractures from free-text radiology reports using a clinician-informed gazetteer methodology. Australas Med J. 2013;6(5):301-307.

22. Grundmeier RW, Masino AJ, Casper TC, et al. Identification of long bone fractures in radiology reports using natural language processing to support healthcare quality improvement. Appl Clin Inform. 2016;7(4):1051-1068.

23. Do BH, Wu AS, Maley J, Biswal S. Automatic retrieval of bone fracture knowledge using natural language processing. J Digit Imaging. 2013;26(4):709-713.

24. Sagheb E, Ramazanian T, Tafti AP, et al. Use of natural language processing algorithms to identify common data elements in operative notes for knee arthroplasty. J Arthroplasty. 2021;36(3):922-926.

25. Wyles CC, Tibbo ME, Fu S, et al. Use of natural language processing algorithms to identify common data elements in operative notes for total hip arthroplasty. J Bone Joint Surg Am. 2019;101-A(21):1931-1938.

26. Bovonratwet P, Shen TS, Islam W, Ast MP, Haas SB, Su EP. Natural language processing of patient-experience comments after primary total knee arthroplasty. J Arthroplasty. 2021;36(3):927-934.

27. Menendez ME, Shaker J, Lawler SM, Ring D, Jawa A. Negative patient-experience comments after total shoulder arthroplasty. J Bone Joint Surg Am. 2019;101-A(4):330-337.

28. Tibbo ME, Wyles CC, Fu S, et al. Use of natural language processing tools to identify and classify periprosthetic femur fractures. J Arthroplasty. 2019;34(10):2216-2219.

29. Fu S, Wyles CC, Osmon DR, et al. Automated detection of periprosthetic joint infections and data elements using natural language processing. J Arthroplasty. 2021;36(2):688-692.

30. Olthof AW, Shouche P, Fennema EM, et al. Machine learning based natural language processing of radiology reports in orthopaedic trauma. Comput Methods Programs Biomed. 2021;208:106304.

31. Karhade AV, Lavoie-Gagne O, Agaronnik N, et al. Natural language processing for prediction of readmission in posterior lumbar fusion patients: which free-text notes have the most utility? Spine J. 2022;22(2):272-277.

32. Foufi V, Lanteri S, Gaudet-Blavignac C, Remy P, Montet X, Lovis C. Automatic annotation tool to support supervised machine learning for scaphoid fracture detection. Stud Health Technol Inform. 2018;255:210-214.

33. Dominy CL, Arvind V, Tang JE, et al. Scoliosis surgery in social media: a natural language processing approach to analyzing the online patient perspective. Spine Deform. 2022;10(2):239-246.

34. Galbusera F, Cina A, Bassani T, Panico M, Sconfienza LM. Automatic diagnosis of spinal disorders on radiographic images: leveraging existing unstructured datasets with natural language processing. Glob Spine J. 2023;13(5):1257-1266.

35. Thirukumaran CP, Zaman A, Rubery PT, et al. Natural language processing for the identification of surgical site infections in orthopaedics. J Bone Joint Surg Am. 2019;101-A(24):2167-2174.

36. Buchlak QD, Clair J, Esmaili N, Barmare A, Chandrasekaran S. Clinical outcomes associated with robotic and computer-navigated total knee arthroplasty: a machine learning-augmented systematic review. Eur J Orthop Surg Traumatol. 2022;32(5):915-931.

37. Jungmann F, Kämpgen B, Hahn F, et al. Natural language processing of radiology reports to investigate the effects of the COVID-19 pandemic on the incidence and age distribution of fractures. Skeletal Radiol. 2022;51(2):375-380.

38. Borjali A, Magnéli M, Shin D, Malchau H, Muratoglu OK, Varadarajan KM. Natural language processing with deep learning for medical adverse event detection from free-text medical narratives: a case study of detecting total hip replacement dislocation. Comput Biol Med. 2021;129:104140.

39. Karhade AV, Bongers MER, Groot OQ, et al. Can natural language processing provide accurate, automated reporting of wound infection requiring reoperation after lumbar discectomy? Spine J. 2020;20(10):1602-1609.

40. Tan WK, Hassanpour S, Heagerty PJ, et al. Comparison of natural language processing rules-based and machine-learning systems to identify lumbar spine imaging findings related to low back pain. Acad Radiol. 2018;25(11):1422-1432.

41. Huhdanpaa HT, Tan WK, Rundell SD, et al. Using natural language processing of free-text radiology reports to identify type 1 Modic endplate changes. J Digit Imaging. 2018;31(1):84-90.

42. Wyles CC, Fu S, Odum SL, et al. External validation of natural language processing algorithms to extract common data elements in THA operative notes. J Arthroplasty. 2023;38(10):2081-2084.

43. Karhade AV, Oosterhoff JHF, Groot OQ, et al. Can we geographically validate a natural language processing algorithm for automated detection of incidental durotomy across three independent cohorts from two continents? Clin Orthop Relat Res. 2022;480(9):1766-1775.

44. Flores-Balado Á, Castresana Méndez C, Herrero González A, et al. Using artificial intelligence to reduce orthopedic surgical site infection surveillance workload: algorithm design, validation, and implementation in 4 Spanish hospitals. Am J Infect Control. 2023;51(11):1225-1229.

45. Tavabi N, Pruneski J, Golchin S, et al. Building large-scale registries from unstructured clinical notes using a low-resource natural language processing pipeline. Health Informatics. 2022.

46. Kita K, Uemura K, Takao M, et al. Use of artificial intelligence to identify data elements for The Japanese Orthopaedic Association National Registry from operative records. J Orthop Sci. 2023;28(6):1392-1399.

47. Davidson EM, Poon MTC, Casey A, et al. The reporting quality of natural language processing studies: systematic review of studies of radiology reports. BMC Med Imaging. 2021;21:142.

48. Riley RD, Ensor J, Snell KIE, et al. Calculating the sample size required for developing a clinical prediction model. BMJ. 2020;368:m441.

49. Dijkstra H, van de Kuit A, de Groot T, et al. Systematic review of machine-learning models in orthopaedic trauma. Bone Jt Open. 2024;5(1):9-19.

50. Moons KGM, Wolff RF, Riley RD, et al. PROBAST: a tool to assess risk of bias and applicability of prediction model studies: explanation and elaboration. Ann Intern Med. 2019;170(1):W1-W33.

51. Ormond MJ, Clement ND, Harder BG, Farrow L, Glester A. Acceptance and understanding of artificial intelligence in medical research among orthopaedic surgeons. Bone Jt Open. 2023;4(9):696-703.

Author contributions

L. Farrow: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Resources, Software, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing. He is the guarantor.

A. Raja: Conceptualization, Data curation, Investigation, Project administration, Writing – review & editing

M. Zhong: Conceptualization, Supervision, Writing – review & editing

L. Anderson: Conceptualization, Methodology, Supervision, Writing – review & editing

Funding statement

The author(s) disclose receipt of the following financial or material support for the research, authorship, and/or publication of this article: funding for open access publication is provided by the Chief Scientist Office.

ICMJE COI statement

The author(s) disclose receipt of the following financial or material support for the research, authorship, and/or publication of this article: L. Farrow is currently in receipt of a Chief Scientist Office Scotland Clinical Academic Fellowship, which is focused on the use of artificial intelligence techniques (including natural language processing) to improve the clinical care pathway in those referred for hip and knee arthroplasty. L. Farrow is also the guarantor and confirms that all listed authors meet the authorship criteria.

Data sharing

The data that support the findings for this study are available to other researchers from the corresponding author upon reasonable request.

Ethical review statement

Ethical review was not required due to the nature of the study.

Open access funding

The open access funding for this paper was provided by the Chief Scientist Office (ref. CAF 21/06).

Supplementary material

Table showing the example search strategy, and the PRISMA checklist.

© 2025 Farrow et al. This is an open-access article distributed under the terms of the Creative Commons Attribution Non-Commercial No Derivatives (CC BY-NC-ND 4.0) licence, which permits the copying and redistribution of the work only, and provided the original author and source are credited. See https://creativecommons.org/licenses/by-nc-nd/4.0/