The Bone & Joint Journal
Vol. 106-B, Issue 7 | Pages 688 - 695
1 Jul 2024
Farrow L, Zhong M, Anderson L

Aims

To examine whether natural language processing (NLP) using a clinically based large language model (LLM) could be used to predict patient selection for total hip or total knee arthroplasty (THA/TKA) from routinely available free-text radiology reports.

Methods

Data pre-processing and analyses were conducted according to the Artificial intelligence to Revolutionize the patient Care pathway in Hip and knEe aRthroplastY (ARCHERY) project protocol. This included use of de-identified Scottish regional clinical data of patients referred for consideration of THA/TKA, held in a secure data environment designed for artificial intelligence (AI) inference. Only preoperative radiology reports were included. NLP algorithms were based on the freely available GatorTron model, an LLM trained on over 82 billion words of de-identified clinical text. Two inference tasks were performed: assessment after model fine-tuning (50 epochs and three cycles of k-fold cross-validation), and external validation.

Results

For THA, 5,558 patient radiology reports were included, of which 4,137 were used for model training and testing, and 1,421 for external validation. Following training, model performance demonstrated average (mean across three folds) accuracy, F1 score, and area under the receiver operating characteristic curve (AUROC) values of 0.850 (95% confidence interval (CI) 0.833 to 0.867), 0.813 (95% CI 0.785 to 0.841), and 0.847 (95% CI 0.822 to 0.872), respectively. For TKA, 7,457 patient radiology reports were included, with 3,478 used for model training and testing, and 3,152 for external validation. Performance metrics included accuracy, F1 score, and AUROC values of 0.757 (95% CI 0.702 to 0.811), 0.543 (95% CI 0.479 to 0.607), and 0.717 (95% CI 0.657 to 0.778), respectively. There was a notable deterioration in performance on external validation in both cohorts.

Conclusion

The use of routinely available preoperative radiology reports provides promising potential to help screen suitable candidates for THA, but not for TKA. The external validation results demonstrate the importance of further model testing and training when confronted with new clinical cohorts.

Cite this article: Bone Joint J 2024;106-B(7):688–695.
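The mean-across-folds figures above come from standard k-fold cross-validation: the data are split into k folds, each fold is held out once for testing, and the metric is averaged across folds. A minimal standard-library sketch of the splitting step (illustrative only, not the ARCHERY pipeline):

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Split sample indices into k roughly equal folds for cross-validation.

    Each fold serves once as the held-out test set; a metric such as
    accuracy or AUROC is computed on each held-out fold and then
    averaged across the k folds.
    """
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)          # shuffle once, reproducibly
    folds = [idx[i::k] for i in range(k)]     # round-robin assignment
    return [(sorted(set(idx) - set(fold)), sorted(fold)) for fold in folds]

# Each of the 5 folds holds out 2 of the 10 samples:
splits = k_fold_indices(n_samples=10, k=5)
print([len(test) for _, test in splits])  # → [2, 2, 2, 2, 2]
```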


The Bone & Joint Journal
Vol. 102-B, Issue 7 Supple B | Pages 99 - 104
1 Jul 2020
Shah RF, Bini S, Vail T

Aims

Natural language processing (NLP) offers an automated method to extract data from unstructured free-text fields for arthroplasty registry participation. Our objective was to investigate how accurately NLP can be used to extract structured clinical data from unstructured clinical notes when compared with manual data extraction.

Methods

A group of 1,000 randomly selected clinical and hospital notes from eight different surgeons were collected for patients undergoing primary arthroplasty between 2012 and 2018. In all, 19 preoperative, 17 operative, and two postoperative variables of interest were manually extracted from these notes. An NLP algorithm was created to automatically extract these variables from a training sample of these notes, and the algorithm was tested on a random test sample of notes. Performance of the NLP algorithm was measured in Statistical Analysis System (SAS) by calculating the accuracy of the variables collected, the ability of the algorithm to collect the correct information when it was indeed in the note (sensitivity), and the ability of the algorithm not to collect a certain data element when it was not in the note (specificity).

Results

The NLP algorithm performed well at extracting variables from unstructured data in our random test dataset (accuracy = 96.3%, sensitivity = 95.2%, and specificity = 97.4%). It performed better at extracting data that were in a structured, templated format, such as range of movement (ROM) (accuracy = 98%) and implant brand (accuracy = 98%), than data that were entered with variation depending on the author of the note, such as the presence of deep-vein thrombosis (DVT) (accuracy = 90%).

Conclusion

The NLP algorithm used in this study was able to identify a subset of variables from randomly selected unstructured notes in arthroplasty with an accuracy above 90%. For some variables, such as objective examination data, the accuracy was very high. Our findings suggest that automated algorithms using NLP can help orthopaedic practices retrospectively collect information for registries and quality improvement (QI) efforts.

Cite this article: Bone Joint J 2020;102-B(7 Supple B):99–104.
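The three metrics reported above follow directly from a confusion matrix over the extracted variables. A minimal sketch; the counts are illustrative values chosen to reproduce the headline percentages, not the study's actual tallies:

```python
def extraction_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts.

    tp: variable correctly extracted when present in the note
    fn: variable missed although present (lowers sensitivity)
    tn: variable correctly left blank when absent from the note
    fp: variable "extracted" although absent (lowers specificity)
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # correct collection when the data are in the note
    specificity = tn / (tn + fp)   # correct non-collection when they are not
    return accuracy, sensitivity, specificity

# Illustrative counts only (not taken from the study's dataset):
acc, sens, spec = extraction_metrics(tp=476, fp=13, tn=487, fn=24)
print(f"accuracy={acc:.1%} sensitivity={sens:.1%} specificity={spec:.1%}")
# → accuracy=96.3% sensitivity=95.2% specificity=97.4%
```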


Aims

To examine whether natural language processing (NLP) using a state-of-the-art clinically based large language model (LLM) could predict patient selection for total hip arthroplasty (THA) across a range of routinely available clinical text sources.

Methods

Data pre-processing and analyses were conducted according to the AI to Revolutionise the patient Care pathway in Hip and Knee arthroplasty (ARCHERY) project protocol (https://www.researchprotocols.org/2022/5/e37092/). Three types of de-identified Scottish regional clinical free-text data were assessed: referral letters, radiology reports, and clinic letters. NLP algorithms were based on the GatorTron model, a Bidirectional Encoder Representations from Transformers (BERT)-based LLM trained on 82 billion words of de-identified clinical text. Three specific inference tasks were performed: assessment of the base GatorTron model, assessment after model fine-tuning, and external validation.

Results

There were 3,911, 1,621, and 1,503 patient text documents included from the sources of referral letters, radiology reports, and clinic letters, respectively. All letter sources displayed significant class imbalance, with only 15.8%, 24.9%, and 5.9% of patients linked to the respective text source documentation having undergone surgery. Untrained model performance was poor, with F1 scores (harmonic mean of precision and recall) of 0.02, 0.38, and 0.09, respectively. This did, however, improve with model training, with mean scores (range) of 0.39 (0.31 to 0.47), 0.57 (0.48 to 0.63), and 0.32 (0.28 to 0.39) across the five folds of cross-validation. Performance deteriorated on external validation across all three groups, but remained highest for the radiology report cohort.

Conclusion

Even with further training on a large cohort of routinely collected free-text data, a clinical LLM fails to adequately perform clinical inference in NLP tasks regarding identification of those selected to undergo THA. This likely relates to the complexity and heterogeneity of free-text information and the way that patients are determined to be surgical candidates.
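The F1 scores quoted above are the harmonic mean of precision and recall, which is why heavy class imbalance drags them down even when overall accuracy looks acceptable. A small sketch with illustrative values (not taken from the study):

```python
def f1_score(precision, recall):
    """F1 is the harmonic mean of precision and recall on the positive class."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Under heavy class imbalance (e.g. only ~6% of clinic-letter patients
# underwent surgery), a model can look accurate overall while scoring a
# low F1 on the minority "selected for surgery" class.
# Illustrative precision/recall values, in the rough range of the
# clinic-letter mean F1 of 0.32 reported above:
print(round(f1_score(precision=0.30, recall=0.35), 2))  # → 0.32
```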


Orthopaedic Proceedings
Vol. 101-B, Issue SUPP_12 | Pages 25 - 25
1 Oct 2019
Vail TP, Shah R, Bini S

Background

80% of health data are recorded as free text and are not easily accessible for use in research and quality improvement (QI). Natural language processing (NLP) could be used as a method to abstract data more easily than manual methods. Our objective was to investigate whether NLP can be used to abstract structured clinical data from notes for total joint arthroplasty (TJA).

Methods

Clinical and hospital notes were collected for every patient undergoing a primary TJA. Human annotators reviewed a random training sample (n = 400) and test sample (n = 600) of notes from six different surgeons and manually abstracted historical, physical examination, operative, and outcomes data to create a gold standard dataset. Historical data collected included pain information and the various treatments tried (medications, injections, physical therapy). Physical examination information collected included range of movement (ROM) and the presence of deformity. Operative information included the angle of tibial slope, the angles of the tibial and femoral cuts, and patellar tracking for TKAs, and the approach and repair of external rotators for THAs. In addition, information on implant brand, type, and size, sutures, and drains was collected for all TJAs. Finally, the occurrence of complications was collected. We then trained and tested our NLP system to automatically collect the respective variables, and assessed our automated approach by comparing system-generated findings against the gold standard.

Results

Overall, the NLP algorithm performed well at abstracting all variables in our random test dataset (accuracy = 96.3%, sensitivity = 95.2%, specificity = 97.4%). It performed better at abstracting historical information (accuracy = 97.0%), physical examination information (accuracy = 98.8%), and information on complications (accuracy = 96.8%) than operative information (accuracy = 94.8%), but it performed well, with a sensitivity and specificity > 90.0%, for all variables.

Discussion

The NLP system achieved good performance on a subset of randomly selected notes when querying information about TJA patients. Automated algorithms like the one developed here can help orthopedic practices collect information for registries and help guide QI without an increased time burden. For any tables or figures, please contact the authors directly.


Orthopaedic Proceedings
Vol. 100-B, Issue SUPP_13 | Pages 55 - 55
1 Oct 2018
Tibbo ME, Wyles CC, Maradit-Kremers H, Fu S, Wang Y, Sohn S, Berry DJ, Lewallen DG

Introduction

Manual chart review is labor-intensive and requires specialized knowledge possessed by highly trained medical professionals. The cost and infrastructure required to implement it are prohibitive for most hospitals. Natural language processing (NLP) tools are distinctive in their ability to extract critical information from raw text in the electronic health record (EHR). As a simple proof-of-concept for the potential application of this technology, we examined its ability to perform a binary classification (periprosthetic fracture [PPFFx] vs no PPFFx) followed by a more complex classification of the same problem (the Vancouver classification).

Methods

PPFFx were identified among all THAs performed at a single academic institution between 1977 and 2015. A training cohort (n = 90 PPFFx), selected randomly by an electronic program, was used to develop a prototype NLP algorithm, and an additional randomly selected 86 PPFFx were used to further validate the algorithm. Keywords to identify, and subsequently classify, Vancouver-type PPFFx about THA were defined. The algorithm was applied to consult and operative notes to evaluate the language used by surgeons as a means to predict the correct pathology in the absence of a listed, precise diagnosis (e.g. Vancouver B2). Validation statistics were calculated using manual chart review as the gold standard.

Results

In distinguishing between 2,983 cases of PPFFx, 2,898 cases of no PPFFx, and 85 cases of index THA performed for fracture, the NLP algorithm demonstrated an accuracy of 99.8%. Among 73 PPFFx test cases, the algorithm demonstrated a sensitivity of 87.1%, specificity of 78.6%, positive predictive value (PPV) of 75.0%, and negative predictive value (NPV) of 89.1% in determining the correct Vancouver classification. Overall Vancouver classification accuracy was moderate, at 82.2%.

Conclusion

NLP-enabled algorithms are a promising alternative to the current gold standard of manual chart review for evaluating outcomes of large datasets in orthopedics. Despite their immaturity with respect to orthopedic applications, NLP algorithms applied to surgeon notes demonstrated excellent accuracy (99.8%) in delineating a simple binary outcome, in this case the presence or absence of PPFFx. However, the accuracy of the algorithm was attenuated when trying to predict a Vancouver classification subtype, given the wide variability in surgeon dictation styles and precision of language. Nevertheless, this study provides a proof-of-concept for the use of this technology in clinical research and registry development endeavors, as it can reliably extract certain select data of interest in an expeditious and cost-effective manner.

Summary

NLP-enabled algorithms are a promising alternative to the current gold standard of manual chart review for the extraction and evaluation of large datasets in orthopedics.


Bone & Joint Research
Vol. 12, Issue 7 | Pages 447 - 454
10 Jul 2023
Lisacek-Kiosoglous AB, Powling AS, Fontalis A, Gabr A, Mazomenos E, Haddad FS

The use of artificial intelligence (AI) is rapidly growing across many domains, of which the medical field is no exception. AI is an umbrella term for the practical application of algorithms to generate useful output without the need for human cognition. Owing to the expanding volume of patient information collected, known as ‘big data’, AI is showing promise as a useful tool in healthcare research and across all aspects of patient care pathways. Practical applications in orthopaedic surgery include: diagnostics, such as fracture recognition and tumour detection; predictive models of clinical and patient-reported outcome measures, such as calculating mortality rates and length of hospital stay; and real-time rehabilitation monitoring and surgical training. However, clinicians should remain cognizant of AI’s limitations, as the development of robust reporting and validation frameworks is of paramount importance to prevent avoidable errors and biases. The aim of this review article is to provide a comprehensive understanding of AI and its subfields, as well as to delineate its existing clinical applications in trauma and orthopaedic surgery. Furthermore, this narrative review expands upon the limitations of AI and future directions.

Cite this article: Bone Joint Res 2023;12(7):447–454.


The Bone & Joint Journal
Vol. 104-B, Issue 12 | Pages 1292 - 1303
1 Dec 2022
Polisetty TS, Jain S, Pang M, Karnuta JM, Vigdorchik JM, Nawabi DH, Wyles CC, Ramkumar PN

Literature surrounding artificial intelligence (AI)-related applications for hip and knee arthroplasty has proliferated. However, meaningful advances that fundamentally transform the practice and delivery of joint arthroplasty are yet to be realized, despite the broad range of applications, as we continue to search for meaningful and appropriate uses of AI. AI literature in hip and knee arthroplasty between 2018 and 2021 regarding image-based analyses, value-based care, remote patient monitoring, and augmented reality was reviewed. Concerns surrounding meaningful use and appropriate methodological approaches of AI in joint arthroplasty research are summarized. Of the 233 AI-related orthopaedic articles published, 178 (76%) constituted original research, while the rest consisted of editorials or reviews. In total, 52% of the original AI-related research concerned hip and knee arthroplasty (n = 92), and a narrative review of these studies is presented. Three studies were externally validated. Pitfalls in present-day research include conflating vernacular (“AI/machine learning”), repackaging limited registry data, prematurely releasing internally validated prediction models, appraising model architecture instead of inputted data, withholding code, and evaluating studies using antiquated regression-based guidelines. While AI has been applied to a variety of hip and knee arthroplasty applications with limited clinical impact, the future remains promising if the question is meaningful, the methodology is rigorous and transparent, the data are rich, and the model is externally validated. Simple checkpoints for meaningful AI adoption include ensuring that applications focus on: administrative support over clinical evaluation and management; the necessity of the advanced model; and the novelty of the question being answered.

Cite this article: Bone Joint J 2022;104-B(12):1292–1303.


Bone & Joint Open
Vol. 5, Issue 1 | Pages 9 - 19
16 Jan 2024
Dijkstra H, van de Kuit A, de Groot TM, Canta O, Groot OQ, Oosterhoff JH, Doornberg JN

Aims

Machine-learning (ML) prediction models in orthopaedic trauma hold great promise in assisting clinicians in various tasks, such as personalized risk stratification. However, an overview of current applications and critical appraisal to peer-reviewed guidelines is lacking. The objectives of this study are to 1) provide an overview of current ML prediction models in orthopaedic trauma; 2) evaluate the completeness of reporting following the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement; and 3) assess the risk of bias following the Prediction model Risk Of Bias Assessment Tool (PROBAST) tool.

Methods

A systematic search screening 3,252 studies identified 45 ML-based prediction models in orthopaedic trauma up to January 2023. Completeness of reporting was assessed with the TRIPOD statement, and risk of bias with the PROBAST tool.


Bone & Joint Research
Vol. 12, Issue 8 | Pages 494 - 496
9 Aug 2023
Clement ND, Simpson AHRW

Cite this article: Bone Joint Res 2023;12(8):494–496.


Bone & Joint Research
Vol. 9, Issue 10 | Pages 635 - 644
1 Oct 2020
Lemaignen A, Grammatico-Guillon L, Astagneau P, Marmor S, Ferry T, Jolivet-Gougeon A, Senneville E, Bernard L

Aims

The French registry for complex bone and joint infections (C-BJIs) was created in 2012 in order to facilitate homogeneous management of patients presented for multidisciplinary advice in referral centres for C-BJI, to monitor their activity, and to produce epidemiological data. Our aim here was to present the genesis and characteristics of this national registry and to provide an analysis of its data quality.

Methods

A centralized, secure online database gathering the electronic case report forms (eCRFs) was completed for every patient presented at multidisciplinary meetings (MMs) across the 24 French referral centres. Metrics of this registry were described for the period 2012 to 2016. Data quality was assessed by comparing essential items from the registry with a controlled dataset extracted from the medical charts of a random sample of patients from each centre. Internal completeness and consistency were calculated.