Abstract
Bone age is a radiographic assessment used in pediatric medicine because it is a relatively objective measure of biological maturity compared with chronological age and size.1 Currently, Greulich and Pyle (GP) is one of the most common methods used to determine bone age from hand radiographs.2–4 In recent years, new methods have been developed to make bone age analysis more efficient, including the shorthand bone age (SBA) method and automated artificial intelligence algorithms. The purpose of this study is to evaluate the accuracy and reliability of these two methods and to examine whether the reduction in analysis time compromises their accuracy.
Two hundred thirteen males and 213 females were selected. Each participant's bone age was determined by two separate raters using the GP (M1) and SBA (M2) methods. Three weeks later, the two raters repeated the analysis of the radiographs. The raters timed themselves with an online stopwatch while analyzing each radiograph on a computer screen. De-identified radiographs were securely uploaded to an automated algorithm developed by a group of radiologists in Toronto. The gold standard was the radiology report attached to each radiograph, written by experienced radiologists using GP (M1). For intra-rater variability, intraclass correlation analysis was performed between trial 1 (T1) and trial 2 (T2) for each rater and method. For inter-rater variability, intraclass correlation analysis was performed between rater 1 (R1) and rater 2 (R2) for each method and trial.
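The intraclass correlation described above can be illustrated with a minimal sketch. The abstract does not state which ICC form was used, so the two-way random-effects, absolute-agreement, single-measure form, ICC(2,1), is an assumption here, and the function name `icc_2_1` is hypothetical. Each row is one radiograph and each column is one rater (inter-rater) or one trial (intra-rater).

```python
def icc_2_1(ratings):
    """ICC(2,1) from a table of ratings (assumed form, see lead-in).

    ratings: list of rows, one per radiograph; each row holds one
    bone-age value per rater (or per trial for intra-rater analysis).
    """
    n = len(ratings)       # number of radiographs
    k = len(ratings[0])    # number of raters or trials

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares without replication
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)              # between-radiograph mean square
    msc = ss_cols / (k - 1)              # between-rater mean square
    mse = ss_err / ((n - 1) * (k - 1))   # residual mean square

    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Illustrative values only (bone ages in years), not data from this study.
trial_1 = [8.0, 10.5, 12.0, 13.5, 15.0]
trial_2 = [8.5, 10.0, 12.0, 14.0, 15.0]
pairs = [list(p) for p in zip(trial_1, trial_2)]
print(f"ICC(2,1) = {icc_2_1(pairs):.3f}")
```

Because this form measures absolute agreement, a constant offset between two raters lowers the coefficient even when their rankings are identical. A consistency-type ICC would not penalize such an offset.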
Intraclass correlation between each method and the gold standard fell within the 0.8–0.9 range, indicating strong agreement. Most comparisons showed a statistically significant difference between the two new methods and the gold standard; however, these differences ranged from 0.25 to 0.5 years and may not be clinically significant. A bone age is considered clinically abnormal if it falls more than 2 standard deviations from the chronological age; the standard deviations are provided in the GP atlas.6–8 For a 10-year-old female, 2 standard deviations constitute 21.6 months (1.8 years), which far exceeds the differences reported here between SBA, the automated algorithm, and the gold standard. The median time for completion using the GP method was 21.83 seconds for rater 1 and 9.30 seconds for rater 2. In comparison, SBA required a median time of 7 seconds for rater 1 and 5 seconds for rater 2. The automated method required no rater analysis time, as bone age was returned immediately upon radiograph upload. The correlation between the two trials for each method and rater (e.g., R1M1T1 vs R1M1T2) was excellent (κ = 0.9–1), confirming the reliability of the two new methods. Similarly, the correlation between the two raters for each method and trial (e.g., R1M1T1 vs R2M1T1) fell within the 0.9–1 range, indicating limited variability between raters using these methods.
The shorthand bone age method and an automated artificial intelligence algorithm produced values in agreement with the Greulich and Pyle gold standard while reducing analysis time and maintaining high inter-rater and intra-rater reliability.