Abstract
Ultrasonography of the hip was performed sequentially by two different examiners in 75 infants. The ultrasound strips were reviewed twice by three paediatric orthopaedic surgeons and classified by the Graf method. Intraobserver and interobserver agreement between the interpretations was analysed using simple and weighted kappa coefficients, calculated both for the Graf classification and for a grouping into normal (types 1A to 2A) and abnormal requiring treatment (types 2B to 4).
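For readers less familiar with the statistic, the following is a minimal sketch of how a simple and a weighted kappa coefficient can be computed for two sets of readings of the same hips. It is not the analysis code used in this study: the function name `cohen_kappa`, the choice of linear weights and the ordering of the categories passed in are assumptions made for illustration only.

```python
def cohen_kappa(ratings_a, ratings_b, categories, weighted=False):
    """Kappa for two readings of the same items (illustrative sketch).

    `categories` lists the possible classes in increasing order of severity.
    With weighted=True, disagreements are penalised linearly by how many
    categories apart the two readings are (an assumed weighting scheme).
    """
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("both readings must cover the same, non-empty set of items")
    k = len(categories)
    if k < 2:
        raise ValueError("at least two categories are required")
    n = len(ratings_a)
    index = {category: i for i, category in enumerate(categories)}

    # Observed joint proportions: observed[i][j] is the share of items
    # placed in category i by the first reading and j by the second.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[index[a]][index[b]] += 1.0 / n

    # Marginal proportions for each reading.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def penalty(i, j):
        # Simple kappa: any disagreement counts fully.
        # Weighted kappa: penalty grows with the distance between categories.
        if weighted:
            return abs(i - j) / (k - 1)
        return 0.0 if i == j else 1.0

    observed_disagreement = sum(
        penalty(i, j) * observed[i][j] for i in range(k) for j in range(k)
    )
    expected_disagreement = sum(
        penalty(i, j) * row[i] * col[j] for i in range(k) for j in range(k)
    )
    if expected_disagreement == 0.0:
        # Both readings used one identical category throughout: kappa is undefined.
        return float("nan")
    return 1.0 - observed_disagreement / expected_disagreement
```

For the two-group analysis, the same function could be called with `categories=["normal", "abnormal"]`; with only two categories the simple and the linearly weighted coefficients coincide.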
When the same ultrasound strip was examined, intraobserver agreement for the Graf classification was substantial (mean kappa 0.61), but interobserver agreement was only moderate (kappa 0.50).
For the grouping into normal and abnormal, the mean kappa value for intraobserver agreement was 0.67 and for interobserver agreement 0.57. Because of the significant differences in agreement between normal and abnormal hips, we analysed a subgroup of those with at least one abnormal interpretation. Intraobserver agreement within this subgroup showed moderate reliability (kappa 0.41), but interobserver agreement was only fair (kappa 0.28).
Interpretations of two different strips obtained sequentially showed significantly lower agreement, with an intraobserver kappa value of 0.29 and an interobserver value of 0.28. In the subgroup with at least one abnormal reading, the intraobserver kappa was 0.09 and the interobserver kappa 0.10.
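The descriptive terms used for the kappa values (fair, moderate, substantial) match the widely used Landis and Koch bands. The sketch below maps a kappa value to such a label, assuming those bands apply here; the abstract does not name the scale, and the function name `agreement_label` is hypothetical.

```python
def agreement_label(kappa):
    """Map a kappa value to a Landis and Koch descriptor (assumed scale)."""
    if kappa > 1.0:
        raise ValueError("kappa cannot exceed 1")
    if kappa < 0.0:
        return "poor"
    # Upper bound of each band, in increasing order.
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
        (1.00, "almost perfect"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label


# Values reported for the same strip, then for two different strips.
for value in (0.61, 0.50, 0.41, 0.28, 0.29, 0.09):
    print(value, agreement_label(value))
```

Under this assumed scale, the kappa values for two different strips in the subgroup with at least one abnormal reading fall in the lowest positive band.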
Our findings suggest that both the technique of performing ultrasonography and the interpretation of the image may influence the result.