The Bone & Joint Journal
Vol. 106-B, Issue 7 | Pages 688 - 695
1 Jul 2024
Farrow L Zhong M Anderson L

Aims. To examine whether natural language processing (NLP) using a clinically based large language model (LLM) could be used to predict patient selection for total hip or total knee arthroplasty (THA/TKA) from routinely available free-text radiology reports. Methods. Data pre-processing and analyses were conducted according to the Artificial intelligence to Revolutionize the patient Care pathway in Hip and knEe aRthroplastY (ARCHERY) project protocol. This included use of de-identified Scottish regional clinical data of patients referred for consideration of THA/TKA, held in a secure data environment designed for artificial intelligence (AI) inference. Only preoperative radiology reports were included. NLP algorithms were based on the freely available GatorTron model, an LLM trained on over 82 billion words of de-identified clinical text. Two inference tasks were performed: assessment after model fine-tuning (50 epochs and three cycles of k-fold cross-validation), and external validation. Results. For THA, there were 5,558 patient radiology reports included, of which 4,137 were used for model training and testing, and 1,421 for external validation. Following training, model performance demonstrated average (mean across three folds) accuracy, F1 score, and area under the receiver operating characteristic curve (AUROC) values of 0.850 (95% confidence interval (CI) 0.833 to 0.867), 0.813 (95% CI 0.785 to 0.841), and 0.847 (95% CI 0.822 to 0.872), respectively. For TKA, 7,457 patient radiology reports were included, with 3,478 used for model training and testing, and 3,152 for external validation. Performance metrics included accuracy, F1 score, and AUROC values of 0.757 (95% CI 0.702 to 0.811), 0.543 (95% CI 0.479 to 0.607), and 0.717 (95% CI 0.657 to 0.778), respectively. There was a notable deterioration in performance on external validation in both cohorts. Conclusion. The use of routinely available preoperative radiology reports provides promising potential to help screen suitable candidates for THA, but not for TKA. The external validation results demonstrate the importance of further model testing and training when confronted with new clinical cohorts. Cite this article: Bone Joint J 2024;106-B(7):688–695.


Bone & Joint 360
Vol. 13, Issue 3 | Pages 5 - 6
3 Jun 2024
Ollivere B


To examine whether Natural Language Processing (NLP) using a state-of-the-art clinically based Large Language Model (LLM) could predict patient selection for Total Hip Arthroplasty (THA), across a range of routinely available clinical text sources. Data pre-processing and analyses were conducted according to the AI to Revolutionise the patient Care pathway in Hip and Knee arthroplasty (ARCHERY) project protocol (https://www.researchprotocols.org/2022/5/e37092/). Three types of de-identified Scottish regional clinical free-text data were assessed: referral letters, radiology reports and clinic letters. NLP algorithms were based on the GatorTron model, a Bidirectional Encoder Representations from Transformers (BERT) based LLM trained on 82 billion words of de-identified clinical text. Three specific inference tasks were performed: assessment of the base GatorTron model, assessment after model fine-tuning, and external validation. There were 3911, 1621 and 1503 patient text documents included from the sources of referral letters, radiology reports and clinic letters, respectively. All letter sources displayed significant class imbalance, with only 15.8%, 24.9%, and 5.9% of patients linked to the respective text source documentation having undergone surgery. Untrained model performance was poor, with F1 scores (harmonic mean of precision and recall) of 0.02, 0.38 and 0.09, respectively. This did, however, improve with model training, with mean scores (range) of 0.39 (0.31–0.47), 0.57 (0.48–0.63) and 0.32 (0.28–0.39) across the 5 folds of cross-validation. Performance deteriorated on external validation across all three groups but remained highest for the radiology report cohort. Even with further training on a large cohort of routinely collected free-text data, a clinical LLM fails to adequately perform clinical inference in NLP tasks regarding identification of those selected to undergo THA. This likely relates to the complexity and heterogeneity of free-text information and the way that patients are determined to be surgical candidates.
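As a reading aid for the F1 scores quoted above, the following minimal Python sketch shows how F1 (the harmonic mean of precision and recall) can stay low on an imbalanced cohort even when accuracy looks high. The confusion-matrix counts are hypothetical and chosen only to mirror the 5.9% surgery rate reported for clinic letters; they are not taken from the study, and the function names are illustrative.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 as the harmonic mean of precision and recall, from confusion counts."""
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined), so report 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Fraction of all cases classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


# Hypothetical cohort (assumption, not study data): 1,000 patients,
# 59 of whom (5.9%) underwent surgery. The model flags 10 patients
# as surgical candidates and is right about 5 of them.
tp, fp, fn, tn = 5, 5, 54, 936

print(f"accuracy = {accuracy(tp, fp, fn, tn):.3f}")
print(f"F1       = {f1_score(tp, fp, fn):.3f}")
```

With counts like these, accuracy is dominated by the large non-surgical majority while F1 reflects how few surgical patients are actually identified, which is one reason F1 is a common headline metric for imbalanced tasks of this kind.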


Orthopaedic Proceedings
Vol. 85-B, Issue SUPP_II | Pages 177 - 177
1 Feb 2003
Hinsley D Softley I Garrick S

Anti-personnel (AP) mines pose a serious threat to mine clearance personnel and developing effective foot/leg protection is of benefit. In order to evaluate the effectiveness of a protective system it is necessary to have a physical model of the human leg and foot that replicates bony injury from AP mines. The purpose of this study was to develop and assess a lower limb model (LLM) that reflects human bony injury from AP mines. The LLM comprised a red deer tibia, calcaneum, talus, tarsus and metatarsal encased in 20% gelatine. A British Army combat boot was fitted onto the LLM. Two types of simulated AP mine were used, comprising 29g and 50g of plastic explosive (PE). Mines were surface laid and the heel of the boot was placed directly over the top of the mine. Firings with both mine types were performed with the heel in contact with the mine. Further firings with the 50g PE mine included a variable stand-off (i.e. distance of the sole of the boot from the mine) of 25–100mm. The LLM was assessed for bony injury using the International Committee of the Red Cross (ICRC) mine injury system and a mine fracture score (MFS). The pattern of injury resulting from the two mine types, with no stand-off, was different. The 50g mine produced traumatic amputations in four out of five firings, with fractures occurring at 3–11 cm from the ankle joint line (pattern 1 injury – ICRC classification). The 29g mine produced hindfoot injuries with comminuted fractures of the calcaneum and talus in all five firings. These are similar to the bony injuries seen in AP mine casualties in Croatia. Use of the MFS allowed comparison with previous cadaver experiments and demonstrated a graded response to increasing stand-off. The LLM replicated the pattern of some bony injuries seen in landmine casualties and could be used to assess the effectiveness of mine protective foot/leg wear.