Orthopaedic Proceedings
Vol. 106-B, Issue SUPP_16 | Pages 71 - 71
19 Aug 2024
Nonnenmacher L Fischer M Kaderali L Wassilew GI
Full Access

Periacetabular osteotomy (PAO) has become the most important surgical procedure for patients with hip dysplasia, offering significant pain relief and improved joint function. This study focuses on recovery after PAO, specifically the return to sports (RTS) timeline, with the objective of identifying preoperative predictors to optimize patient outcomes. Our prospective, monocentric study from 2019 to 2023 included 698 hips from 606 patients undergoing PAO. Comprehensive preoperative data were collected, including demographic information, clinical assessments (Modified Harris Hip Score (mHHS), International Hip Outcome Tool-12 (iHOT-12), Hip Disability and Osteoarthritis Outcome Score (HOOS), UCLA Activity Score) and psychological evaluations (Brief Symptom Inventory (BSI) and SF-36 Health Survey). Advanced logistic regression and machine learning techniques (R Core Team, 2016) were employed to develop a predictive model. Multivariate regression analysis revealed that several preoperative factors significantly influenced the RTS timeline. These included gender, invasiveness of the surgical approach, preoperative UCLA Score, preoperative sports activity level, mHHS, and various HOOS subscales (Sport/Recreation, Symptoms, Pain), as well as psychological factors (BSI and SF-36). The subsequent model, using a decision tree approach, showed that the combination of a UCLA score greater than 3 (p<0.001), non-female gender (p=0.003), preoperative sports frequency not less than twice per week (p<0.001), participation in high-impact sports preoperatively (p=0.008), and a BSI anxiety score less than 2 (p<0.001) had the highest likelihood of early RTS, with a probability of 71.4% at three months. This model provides a nuanced prediction of RTS after PAO, highlighting the synergy of physical, psychological, and lifestyle influences. By quantifying the impact of these variables, it provides clinicians with a valuable tool for predicting individual patient recovery trajectories, aiding in tailored rehabilitation planning and predicting postoperative satisfaction.


Background. The magnetic resonance imaging (MRI) algorithm identifies an end-stage, severely degenerated disc as ‘black’, and a moderately degenerated to non-degenerated disc as ‘white’. MRI is based on signal intensity changes that identify loss of proteoglycans and water and general radial bulging, but it lacks association with microscopic features such as fissures, endplate damage, and the persistent inflammatory catabolism that facilitates proteoglycan loss, leading to ultimate collapse of the annulus with neo-innervation and vascularization, as an indicator of pain. Thus, we propose a novel machine learning-based imaging tool that combines quantifiable microscopic histopathological features with macroscopic signal intensity changes for hybrid assessment of disc degeneration. Methods. A total of 100 disc tissue samples were collected from patients undergoing surgery and from cadaveric controls (age range 35–75 years). MRI Pfirrmann grades were collected in each case, and each disc specimen was processed to identify 1) the region of interest; 2) the analytical imaging vector; 3) the data assimilation, grading, and scoring pattern; 4) the machine learning algorithm; and 5) the predictive learning parameters, to form an interface between the hardware and the software operating system. Results. A kernel algorithm defines the non-linear data in an x-y histogram. The x and y values are scored histological spatial variables that signify loss of proteoglycans, blood vessel ingrowth, and occurrence of tears or fissures in the inner and outer annulus regions, mapped against the dampening and graded series of signal intensity changes. Conclusion. To our knowledge, this study is the first to propose a machine learning method linking microscopic spatial tissue changes and macroscopic signal intensity grades in the intervertebral disc. No conflict of interest declared. Sources of Funding. ICMR/5/4-5/3/42/Neuro/2022-NCD-1, Dr TMA PAI SMU/ 131/ REG/ TMA PURK/ 164/2020.
A part of the above study was presented as an oral paper at the International Society for the Study of Lumbar Spine (ISSLS) meeting held on 1–5 May 2023, Melbourne, Australia.


The Bone & Joint Journal
Vol. 106-B, Issue 8 | Pages 760 - 763
1 Aug 2024
Mancino F Fontalis A Haddad FS


The Bone & Joint Journal
Vol. 106-B, Issue 7 | Pages 656 - 661
1 Jul 2024
Bolbocean C Hattab Z O'Neill S Costa ML

Aims

Cemented hemiarthroplasty is an effective form of treatment for most patients with an intracapsular fracture of the hip. However, it remains unclear whether there are subgroups of patients who may benefit from the alternative operation of a modern uncemented hemiarthroplasty – the aim of this study was to investigate this issue. Knowledge about the heterogeneity of treatment effects is important for surgeons in order to target operations towards specific subgroups who would benefit the most.

Methods

We used causal forest analysis to compare subgroup- and individual-level treatment effects between cemented and modern uncemented hemiarthroplasty in patients aged > 60 years with an intracapsular fracture of the hip, using data from the World Hip Trauma Evaluation 5 (WHiTE 5) multicentre randomized clinical trial. EuroQol five-dimension index scores were used to measure health-related quality of life at one, four, and 12 months postoperatively.


The Bone & Joint Journal
Vol. 106-B, Issue 7 | Pages 688 - 695
1 Jul 2024
Farrow L Zhong M Anderson L

Aims

To examine whether natural language processing (NLP) using a clinically based large language model (LLM) could be used to predict patient selection for total hip or total knee arthroplasty (THA/TKA) from routinely available free-text radiology reports.

Methods

Data pre-processing and analyses were conducted according to the Artificial intelligence to Revolutionize the patient Care pathway in Hip and knEe aRthroplastY (ARCHERY) project protocol. This included use of de-identified Scottish regional clinical data of patients referred for consideration of THA/TKA, held in a secure data environment designed for artificial intelligence (AI) inference. Only preoperative radiology reports were included. NLP algorithms were based on the freely available GatorTron model, an LLM trained on over 82 billion words of de-identified clinical text. Two inference tasks were performed: assessment after model fine-tuning (50 epochs and three cycles of k-fold cross-validation), and external validation.


Bone & Joint 360
Vol. 13, Issue 3 | Pages 28 - 31
3 Jun 2024

The June 2024 Wrist & Hand Roundup360 looks at: One-year outcomes of the anatomical front and back reconstruction for scapholunate dissociation; Limited intercarpal fusion versus proximal row carpectomy in the treatment of SLAC or SNAC wrist: results after 3.5 years; Prognostic factors for clinical outcomes after arthroscopic treatment of traumatic central tears of the triangular fibrocartilage complex; The rate of nonunion in the MRI-detected occult scaphoid fracture: a multicentre cohort study; Does correction of carpal malalignment influence the union rate of scaphoid nonunion surgery?; Provision of a home-based video-assisted therapy programme in thumb carpometacarpal arthroplasty; Is replantation associated with better hand function after traumatic hand amputation than after revision amputation?; Diagnostic performance of artificial intelligence for detection of scaphoid and distal radius fractures: a systematic review.


Bone & Joint 360
Vol. 13, Issue 3 | Pages 18 - 20
3 Jun 2024

The June 2024 Hip & Pelvis Roundup360 looks at: Machine learning did not outperform conventional competing risk modelling to predict revision arthroplasty; Unravelling the risks: incidence and reoperation rates for femoral fractures post-total hip arthroplasty; Spinal versus general anaesthesia for hip arthroscopy: a COVID-19 pandemic- and opioid epidemic-driven study; Development and validation of a deep-learning model to predict total hip arthroplasty on radiographs; Ambulatory centres lead in same-day hip and knee arthroplasty success; Exploring the impact of smokeless tobacco on total hip arthroplasty outcomes: a deeper dive into postoperative complications.


Orthopaedic Proceedings
Vol. 106-B, Issue SUPP_6 | Pages 59 - 59
2 May 2024
Adla SR Ameer A Silva MD Unnithan A
Full Access

Arthroplasties are widely performed to improve mobility and quality of life for patients with symptomatic knee or hip osteoarthritis. With increasing rates of total joint replacement in the United Kingdom, predicting length of stay (LOS) is vital for hospitals to control costs, manage resources, and prevent postoperative complications. A longer LOS has been shown to negatively affect the quality of care, outcomes, and patient satisfaction. Thus, predicting LOS enables us to make full use of medical resources. Clinical characteristics were retrospectively collected from 1,303 patients who received TKA and THR. A total of 21 variables were included to develop predictive models for LOS using multiple machine learning (ML) algorithms, including Random Forest Classifier (RFC), K-Nearest Neighbour (KNN), Extreme Gradient Boost (XGBoost), and Naïve Bayes (NB). These models were evaluated by the receiver operating characteristic (ROC) curve for predictive performance. A feature selection approach was used to identify optimal predictive factors. Based on the ROC results on the training set, the XGBoost algorithm was selected to be applied to the test set. The areas under the ROC curve (AUCs) of the four models ranged from 0.730 to 0.966, where higher AUC values generally indicate better predictive performance. All the ML-based models performed better than conventional statistical methods in ROC curves. The XGBoost algorithm with 21 variables was identified as the best predictive model. The feature selection indicated the top six predictors: age, operation duration, primary procedure, BMI, creatinine, and month of surgery. By analysing clinical characteristics, it is feasible to develop ML-based models for the preoperative prediction of LOS for patients who received TKA and THR, and the XGBoost algorithm performed the best in terms of accuracy of predictive performance. As this model was originally crafted at Ashford and St. Peters Hospital, we have naturally named it THE ASHFORD OUTCOME.


Orthopaedic Proceedings
Vol. 106-B, Issue SUPP_6 | Pages 49 - 49
2 May 2024
Green J Khanduja V Malviya A
Full Access

Femoroacetabular impingement (FAI) syndrome, characterised by abnormal hip contact causing symptoms and osteoarthritis, is measured using the International Hip Outcome Tool (iHOT). This study uses machine learning to predict patient outcomes post-treatment for FAI, focusing on achieving a minimal clinically important difference (MCID) at 52 weeks. A retrospective analysis of 6133 patients from the NAHR who underwent hip arthroscopic treatment for FAI between November 2013 and March 2022 was conducted. MCID was defined as half a standard deviation (13.61) from the mean change in iHOT score at 12 months. The scikit-learn (SKLearn) maximum absolute scaler and logistic regression were applied to predict achieving MCID, using baseline and 6-month follow-up data. The model's performance was evaluated by accuracy, area under the curve (AUC), and recall, using preoperative and up to 6-month postoperative variables. A total of 23.1% (1422) of patients completed both baseline and 1-year follow-up iHOT surveys. The best results were obtained using both preoperative and postoperative variables. The machine learning model achieved 88.1% balanced accuracy, 89.6% recall, and 92.3% AUC. Sensitivity was 83.7% and specificity 93.5%. Key variables determining outcomes included MCID achievement at 6 months, baseline iHOT score, 6-month iHOT scores for pain, and difficulty in walking or using stairs. The study confirmed the utility of machine learning in predicting long-term outcomes following arthroscopic treatment for FAI. MCID, based on the iHOT-12 tool, indicates meaningful clinical changes. Machine learning demonstrated high accuracy and recall in distinguishing between patients achieving MCID and those who did not. This approach could help early identification of patients at risk of not meeting the MCID threshold one year after treatment.
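
The half-standard-deviation MCID definition above can be illustrated with a minimal sketch. It assumes the threshold is half the sample standard deviation of the change scores and that a patient "achieves MCID" when their improvement meets or exceeds it; the function names and the five example scores are hypothetical and are not study data.

```python
from statistics import stdev

def mcid_threshold(changes):
    """Distribution-based MCID: half the sample standard deviation of the change scores."""
    return 0.5 * stdev(changes)

def achieved_mcid(baseline, followup, threshold):
    """Flag each patient whose improvement from baseline meets or exceeds the threshold."""
    return [(f - b) >= threshold for b, f in zip(baseline, followup)]

# Hypothetical iHOT-12 scores (0-100) for five patients; illustrative only.
baseline = [30.0, 45.0, 50.0, 25.0, 60.0]
followup = [70.0, 50.0, 48.0, 65.0, 75.0]

changes = [f - b for b, f in zip(baseline, followup)]
threshold = mcid_threshold(changes)
labels = achieved_mcid(baseline, followup, threshold)
```

The resulting boolean labels are the kind of binary target a classifier such as logistic regression would then be trained to predict.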


Bone & Joint Research
Vol. 13, Issue 4 | Pages 184 - 192
18 Apr 2024
Morita A Iida Y Inaba Y Tezuka T Kobayashi N Choe H Ike H Kawakami E

Aims

This study was designed to develop a model for predicting bone mineral density (BMD) loss of the femur after total hip arthroplasty (THA) using artificial intelligence (AI), and to identify factors that influence the prediction. Additionally, we virtually examined the efficacy of administration of bisphosphonate for cases with severe BMD loss based on the predictive model.

Methods

The study included 538 joints that underwent primary THA. The patients were divided into groups using unsupervised time series clustering for five-year BMD loss of Gruen zone 7 postoperatively, and a machine-learning model to predict the BMD loss was developed. Additionally, the predictor for BMD loss was extracted using SHapley Additive exPlanations (SHAP). The patient-specific efficacy of bisphosphonate, which is the most important categorical predictor for BMD loss, was examined by calculating the change in predictive probability when hypothetically switching between the inclusion and exclusion of bisphosphonate.
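
The idea of hypothetically switching bisphosphonate on and off can be sketched as follows. This is only an illustration under stated assumptions: a small logistic model with made-up coefficients stands in for the study's machine-learning model, and the feature names, weights, and patient values are all hypothetical.

```python
import math

def predict_proba(features, weights, bias):
    """Stand-in logistic model: probability of severe BMD loss (hypothetical)."""
    z = bias + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

def toggle_effect(features, weights, bias, name):
    """Change in predicted probability when a binary feature is switched from 0 to 1."""
    without = predict_proba({**features, name: 0}, weights, bias)
    with_feature = predict_proba({**features, name: 1}, weights, bias)
    return with_feature - without

# Made-up coefficients and patient; purely illustrative.
weights = {"age_decades": 0.3, "bisphosphonate": -1.2}
bias = -1.5
patient = {"age_decades": 7.0, "bisphosphonate": 0}

delta = toggle_effect(patient, weights, bias, "bisphosphonate")
```

A negative `delta` means the stand-in model predicts a lower probability of severe BMD loss when the drug is included for that patient.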


Bone & Joint Open
Vol. 5, Issue 3 | Pages 243 - 251
25 Mar 2024
Wan HS Wong DLL To CS Meng N Zhang T Cheung JPY

Aims

This systematic review aims to identify 3D predictors derived from biplanar reconstruction, and to describe current methods for improving curve prediction in patients with mild adolescent idiopathic scoliosis.

Methods

A comprehensive search was conducted by three independent investigators on MEDLINE, PubMed, Web of Science, and Cochrane Library. Search terms included “adolescent idiopathic scoliosis”, “3D”, and “progression”. The inclusion and exclusion criteria were carefully defined to include clinical studies. Risk of bias was assessed with the Quality in Prognostic Studies tool (QUIPS) and Appraisal tool for Cross-Sectional Studies (AXIS), and level of evidence for each predictor was rated with the Grading of Recommendations, Assessment, Development, and Evaluations (GRADE) approach. In all, 915 publications were identified, with 377 articles subjected to full-text screening; overall, 31 articles were included.


Bone & Joint Open
Vol. 5, Issue 2 | Pages 139 - 146
15 Feb 2024
Wright BM Bodnar MS Moore AD Maseda MC Kucharik MP Diaz CC Schmidt CM Mir HR

Aims

While internet search engines have been the primary information source for patients’ questions, artificial intelligence large language models like ChatGPT are trending towards becoming the new primary source. The purpose of this study was to determine if ChatGPT can answer patient questions about total hip (THA) and knee arthroplasty (TKA) with consistent accuracy, comprehensiveness, and easy readability.

Methods

We posed the 20 most Google-searched questions about THA and TKA, plus ten additional postoperative questions, to ChatGPT. Each question was asked twice to evaluate for consistency in quality. Following each response, we responded with, “Please explain so it is easier to understand,” to evaluate ChatGPT’s ability to reduce response reading grade level, measured as Flesch-Kincaid Grade Level (FKGL). Five resident physicians rated the 120 responses on 1 to 5 accuracy and comprehensiveness scales. Additionally, they answered a “yes” or “no” question regarding acceptability. Mean scores were calculated for each question, and responses were deemed acceptable if ≥ four raters answered “yes.”
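
For readers unfamiliar with the readability measure, the Flesch-Kincaid Grade Level is computed from average sentence length and average syllables per word. The sketch below uses the standard formula; the vowel-group syllable counter is a crude heuristic assumed here for illustration, and the two example sentences are invented, not ChatGPT output from the study.

```python
import re

def count_syllables(word):
    """Crude heuristic: count groups of consecutive vowels, at least one per word."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def fkgl(text):
    """FKGL = 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59

# Invented example responses; illustrative only.
original = "Postoperative rehabilitation typically necessitates progressive mobilisation."
simplified = "You will start to walk a bit more each day."

grade_original = fkgl(original)
grade_simplified = fkgl(simplified)
```

Comparing the two grades mirrors the study's check of whether a follow-up prompt lowers the reading level of a response.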


Bone & Joint Research
Vol. 13, Issue 2 | Pages 66 - 82
5 Feb 2024
Zhao D Zeng L Liang G Luo M Pan J Dou Y Lin F Huang H Yang W Liu J

Aims

This study aimed to explore the biological and clinical importance of dysregulated key genes in osteoarthritis (OA) patients at the cartilage level to find potential biomarkers and targets for diagnosing and treating OA.

Methods

Six sets of gene expression profiles were obtained from the Gene Expression Omnibus database. Differential expression analysis, weighted gene coexpression network analysis (WGCNA), and multiple machine-learning algorithms were used to screen crucial genes in osteoarthritic cartilage, and genome enrichment and functional annotation analyses were used to decipher the related categories of gene function. Single-sample gene set enrichment analysis was performed to analyze immune cell infiltration. Correlation analysis was used to explore the relationship among the hub genes and immune cells, as well as markers related to articular cartilage degradation and bone mineralization.


The Bone & Joint Journal
Vol. 106-B, Issue 2 | Pages 203 - 211
1 Feb 2024
Park JH Won J Kim H Kim Y Kim S Han I

Aims

This study aimed to compare the performance of survival prediction models for bone metastases of the extremities (BM-E) with pathological fractures in an Asian cohort, and investigate patient characteristics associated with survival.

Methods

This retrospective cohort study included 469 patients, who underwent surgery for BM-E between January 2009 and March 2022 at a tertiary hospital in South Korea. Postoperative survival was calculated using the PATHFx3.0, SPRING13, OPTIModel, SORG, and IOR models. Model performance was assessed with area under the curve (AUC), calibration curve, Brier score, and decision curve analysis. Cox regression analyses were performed to evaluate the factors contributing to survival.
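
Of the performance measures listed, the Brier score is the simplest to state: it is the mean squared difference between predicted probabilities and observed binary outcomes, so lower is better. A minimal sketch, with hypothetical predictions and outcomes rather than study data:

```python
def brier_score(predicted, observed):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    if len(predicted) != len(observed):
        raise ValueError("inputs must have equal length")
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(predicted)

# Hypothetical predicted survival probabilities and outcomes (1 = alive); illustrative only.
predicted = [0.9, 0.7, 0.4, 0.2]
observed = [1, 1, 0, 0]

score = brier_score(predicted, observed)

# A model that always predicts 0.5 scores 0.25, a common reference point.
uninformative = brier_score([0.5] * 4, observed)
```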


Bone & Joint Open
Vol. 5, Issue 1 | Pages 9 - 19
16 Jan 2024
Dijkstra H van de Kuit A de Groot TM Canta O Groot OQ Oosterhoff JH Doornberg JN

Aims

Machine-learning (ML) prediction models in orthopaedic trauma hold great promise in assisting clinicians in various tasks, such as personalized risk stratification. However, an overview of current applications and critical appraisal against peer-reviewed guidelines is lacking. The objectives of this study are to 1) provide an overview of current ML prediction models in orthopaedic trauma; 2) evaluate the completeness of reporting following the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement; and 3) assess the risk of bias following the Prediction model Risk Of Bias Assessment Tool (PROBAST).

Methods

A systematic search screening 3,252 studies identified 45 ML-based prediction models in orthopaedic trauma up to January 2023. The TRIPOD statement assessed transparent reporting and the PROBAST tool the risk of bias.


Orthopaedic Proceedings
Vol. 106-B, Issue SUPP_1 | Pages 78 - 78
2 Jan 2024
Ponniah H Edwards T Lex J Davidson R Al-Zubaidy M Afzal I Field R Liddle A Cobb J Logishetty K
Full Access

Anterior approach total hip arthroplasty (AA-THA) has a steep learning curve, with higher complication rates in initial cases. Proper surgical case selection during the learning curve can reduce early risk. This study aims to identify patient and radiographic factors associated with AA-THA difficulty using machine learning (ML). Consecutive primary AA-THA patients from two centres, operated on by two expert surgeons, were enrolled (excluding patients with prior hip surgery and the first 100 cases per surgeon). K-means prototype clustering, an unsupervised ML algorithm, was used with two variables (operative duration and surgical complications within six weeks) to cluster operations into difficult or standard groups. Radiographic measurements (neck-shaft angle, offset, LCEA, inter-teardrop distance, Tönnis grade) were made by two independent observers. These factors, alongside patient factors (BMI, age, sex, laterality), were employed in a multivariate logistic regression analysis and used for k-means clustering. Significant continuous variables were investigated for predictive accuracy using receiver operating characteristic (ROC) analysis. Out of 328 THAs analyzed, 130 (40%) were classified as difficult and 198 (60%) as standard. The difficult group had a mean operative time of 106 min (range 99–116) with two complications, while the standard group had a mean operative time of 77 min (range 69–86) with no complications. Decreasing inter-teardrop distance (odds ratio [OR] 0.97, 95% confidence interval [CI] 0.95–0.99, p = 0.03) and right-sided operations (OR 1.73, 95% CI 1.10–2.72, p = 0.02) were associated with operative difficulty. However, ROC analysis showed poor predictive accuracy for these factors alone, with an area under the curve of 0.56. Inter-observer reliability was reported as excellent (ICC > 0.7). Right-sided hips (for right-hand dominant surgeons) and decreasing inter-teardrop distance were associated with case difficulty in AA-THA.
These data could guide case selection during the learning phase. A larger dataset with more complications may reveal further factors.
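
To make the reported area under the curve easier to interpret: the AUC equals the probability that a randomly chosen difficult case receives a higher risk score than a randomly chosen standard case, so 0.5 is chance level. The sketch below computes it by direct pairwise comparison; the scores are hypothetical and the function name is an assumption for illustration.

```python
def auc(scores_pos, scores_neg):
    """Probability that a random positive outscores a random negative (ties count half)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical risk scores; a higher score means the case is predicted to be more difficult.
difficult = [0.62, 0.48, 0.40]
standard = [0.55, 0.45, 0.30]

area = auc(difficult, standard)
```

On these made-up scores, six of the nine pairs are ordered correctly, so the value sits between chance (0.5) and perfect discrimination (1.0).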


Orthopaedic Proceedings
Vol. 106-B, Issue SUPP_2 | Pages 19 - 19
2 Jan 2024
Castagno S Birch M van der Schaar M McCaskie A
Full Access

Precision health aims to develop personalised and proactive strategies for predicting, preventing, and treating complex diseases such as osteoarthritis (OA). Due to OA heterogeneity, which makes developing effective treatments challenging, identifying patients at risk for accelerated disease progression is essential for efficient clinical trial design and new treatment target discovery and development. We aimed to create a reliable and interpretable precision health tool that predicts rapid knee OA progression over a 2-year period from baseline patient characteristics using an advanced automated machine learning (autoML) framework, “Autoprognosis 2.0”. All available 2-year follow-up periods of 600 patients from the FNIH OA Biomarker Consortium were analysed using “Autoprognosis 2.0” in two separate approaches, with distinct definitions of clinical outcomes: multi-class predictions (categorising disease progression into pain and/or radiographic progression) and binary predictions. Models were developed using a training set of 1352 instances and all available variables (including clinical, X-ray, MRI, and biochemical features), and validated through both stratified 10-fold cross-validation and hold-out validation on a testing set of 339 instances. Model performance was assessed using multiple evaluation metrics. Interpretability analyses were carried out to identify important predictors of progression. Our final models yielded higher accuracy scores for multi-class predictions (AUC-ROC: 0.858, 95% CI: 0.856-0.860) compared to binary predictions (AUC-ROC: 0.717, 95% CI: 0.712-0.722). Important predictors of rapid disease progression included WOMAC scores and MRI features. Additionally, accurate ML models were developed for predicting OA progression in a subgroup of patients aged 65 or younger. This study presents a reliable and interpretable precision health tool for predicting rapid knee OA progression. Our models provide accurate predictions and, importantly, allow specific predictors of rapid disease progression to be identified. Furthermore, the transparency and explainability of our methods may facilitate their acceptance by clinicians and patients, enabling effective translation to clinical practice.


Orthopaedic Proceedings
Vol. 106-B, Issue SUPP_1 | Pages 140 - 140
2 Jan 2024
van der Weegen W Warren T Agricola R Das D Siebelt M
Full Access

Artificial Intelligence (AI) is becoming more powerful but is barely used to counter the growth in health care burden. AI applications to increase efficiency in orthopedics are rare. We questioned if (1) we could train machine learning (ML) algorithms, based on answers from digitalized history taking questionnaires, to predict treatment of hip osteoarthritis (either conservative or surgical); (2) such an algorithm could streamline clinical consultation. Multiple ML models were trained on 600 annotated (80% training, 20% test) digital history taking questionnaires, acquired before consultation. The best performing models, based on balanced accuracy and optimized automated hyperparameter tuning, were built into our daily clinical orthopedic practice. Fifty patients with hip complaints (>45 years) were prospectively predicted and planned (partly blinded, partly unblinded) for consultation with the physician assistant (conservative) or orthopedic surgeon (operative). Tailored patient information based on the prediction was automatically sent to a smartphone app. Level of evidence: IV. Random Forest and BernoulliNB were the most accurate ML models (0.75 balanced accuracy). Treatment prediction was correct in 45 out of 50 consultations (90%), p<0.0001 (sign and binomial test). Specialized consultations, where conservatively predicted patients were seen by the physician assistant and surgical patients by the orthopedic surgeon, were highly appreciated and effective. Treatment strategy of hip osteoarthritis based on answers from digital history taking questionnaires was accurately predicted before patients entered the hospital. This can make outpatient consultation scheduling more efficient and tailor pre-consultation patient education.
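
The reported result of 45 correct predictions in 50 consultations can be checked against an exact binomial test. The sketch below assumes a null hypothesis of 50% correct (chance between two treatment options) and a one-sided alternative; the abstract does not state either, so both are assumptions made for illustration.

```python
from math import comb

def binomial_tail(successes, trials, p0=0.5):
    """One-sided exact P(X >= successes) for X ~ Binomial(trials, p0)."""
    return sum(
        comb(trials, k) * p0**k * (1 - p0) ** (trials - k)
        for k in range(successes, trials + 1)
    )

# 45 correct out of 50, taken from the abstract; p0 = 0.5 is an assumed null.
p_value = binomial_tail(45, 50)
```

Under these assumptions the tail probability is far below the quoted threshold of 0.0001, which is consistent with the reported p-value.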


Orthopaedic Proceedings
Vol. 106-B, Issue SUPP_1 | Pages 141 - 141
2 Jan 2024
Wendlandt R Volpert T Schroeter J Schulz A Paech A
Full Access

Gait analysis is an indispensable tool for scientific assessment and treatment of individuals whose ability to walk is impaired. The high cost of installation and operation is a major limitation for widespread use in clinical routine. Advances in Artificial Intelligence (AI) could significantly reduce the required instrumentation. A mobile phone could be all the equipment necessary for 3D gait analysis. MediaPipe Pose, provided by Google Research, is such a machine learning approach for human body tracking from monocular RGB video frames that detects 3D landmarks of the human body. The aim of this study was to analyze the accuracy of gait phase detection based on the joint landmarks identified by the AI system. Motion data from 10 healthy volunteers walking on a treadmill at a fixed speed of 4.5 km/h (Callis, Sprintex, Germany) were sampled with a mobile phone (iPhone SE 2nd Generation, Apple). The video was processed with MediaPipe Pose (Version 0.9.1.0) using custom Python software. Gait phases (Initial Contact - IC and Toe Off - TO) were detected from the angular velocities of the lower legs. For the determination of ground truth, the movement was simultaneously recorded with the AS-200 System (LaiTronic GmbH, Innsbruck, Austria). The number of detected strides, the error in IC detection, and the error in stance phase duration were calculated. In total, 1692 strides were detected by the reference system during the trials, of which the AI system identified 679 strides. The absolute mean error (AME) in IC detection was 39.3 ± 36.6 ms, while the AME for stance duration was 187.6 ± 140 ms. Landmark detection is a challenging task for the AI system, as can clearly be seen by the rate of only 40% detected strides. As mentioned by Fadillioglu et al., error in TO detection is higher than in IC detection.
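
The two summary figures above, the stride detection rate and the absolute mean error in event timing, can be reproduced with a few lines. The stride counts come from the abstract; the matched initial-contact timestamps are hypothetical, and the function names are assumptions for illustration rather than the authors' software.

```python
from statistics import mean

def detection_rate(n_detected, n_reference):
    """Fraction of reference strides that the AI system also detected."""
    return n_detected / n_reference

def absolute_errors_ms(detected_s, reference_s):
    """Absolute timing error in milliseconds for matched gait events given in seconds."""
    return [abs(d - r) * 1000.0 for d, r in zip(detected_s, reference_s)]

# Stride counts reported in the abstract.
rate = detection_rate(679, 1692)

# Hypothetical matched initial-contact times in seconds; illustrative only.
detected = [0.52, 1.61, 2.70]
reference = [0.50, 1.60, 2.75]

errors = absolute_errors_ms(detected, reference)
ame = mean(errors)
```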


Orthopaedic Proceedings
Vol. 106-B, Issue SUPP_1 | Pages 47 - 47
2 Jan 2024
Grammens J Pereira LF Danckaers F Vanlommel J Van Haver A Verdonk P Sijbers J
Full Access

Currently implemented accuracy metrics in open-source libraries for segmentation by supervised machine learning are typically one-dimensional scores [1]. While extremely relevant to evaluate applicability in clinics, the anatomical location of segmentation errors is often neglected. This study aims to include the three-dimensional (3D) spatial information in the development of a novel framework for segmentation accuracy evaluation and comparison between different methods. Predicted and ground truth (manually segmented) segmentation masks are meshed into 3D surfaces. A template mesh of the same anatomical structure is then registered to all ground truth 3D surfaces. This ensures that all surface points on the ground truth meshes are in the same anatomically homologous order. Next, point-wise surface deviations between the registered ground truth mesh and the meshed segmentation prediction are calculated and allow for color plotting of point-wise descriptive statistics. Statistical parametric mapping includes point-wise false discovery rate (FDR) adjusted p-values (also referred to as q-values). The framework reads volumetric image data containing the segmentation masks of both ground truth and segmentation prediction. 3D color plots containing descriptive statistics (mean absolute value, maximal value, …) on point-wise segmentation errors are rendered. As an example, we compared segmentation results of nnUNet [2], UNet++ [3] and UNETR [4] by visualizing the mean absolute error (surface deviation from ground truth) as a color plot on the 3D model of bone and cartilage of the mean distal femur. A novel framework to evaluate segmentation accuracy is presented. Output includes anatomical information on the segmentation errors, as well as point-wise comparative statistics on different segmentation algorithms. This allows for a better informed decision-making process when selecting the best algorithm for a specific clinical application.
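
The FDR-adjusted p-values mentioned above can be illustrated with the Benjamini-Hochberg procedure. The abstract does not say which FDR method the framework uses, so Benjamini-Hochberg is an assumption here, and the four p-values are hypothetical stand-ins for point-wise test results.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q

# Hypothetical point-wise p-values for four surface points; illustrative only.
pvals = [0.01, 0.04, 0.03, 0.20]
qvals = bh_qvalues(pvals)
```

In a point-wise setting the same adjustment would be applied across all surface points of the mesh, and points with q-values below the chosen FDR level would be highlighted.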