We studied 19 videotaped knee arthroscopies in 19 patients with mild to moderate osteoarthritis (OA) of the knee to compare intraobserver and interobserver reliability, and the patterns of disagreement, among four orthopaedic surgeons. The OA classifications of Collins, Outerbridge and the French Society of Arthroscopy were used. Kappa values were 0.42 to 0.66 for intraobserver agreement and 0.43 to 0.49 for interobserver agreement. Only 6% to 8% of paired intraobserver classifications differed by more than one category. Observer-specific disagreement was evident both within and between observers. A small but significant occasional variation was also seen. Although reliability may be improved by analysing disagreement, it appears that the arthroscopic grading of early osteoarthritic lesions is inexact.
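For readers unfamiliar with the kappa statistic, the following is a minimal sketch of how chance-corrected agreement between two sets of gradings can be computed. It assumes unweighted Cohen's kappa for a pair of ratings; the abstract does not state which kappa variant was used, so this may differ from the study's method. The function names `cohen_kappa` and `fraction_beyond_one_category`, and the integer coding of grades, are illustrative assumptions and do not come from the study.

```python
import math
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two equal-length lists of categorical grades.

    Assumption: this is the plain two-rater kappa; the study may have used
    a weighted or multi-rater variant.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)

    # Observed proportion of cases on which the two ratings agree exactly.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Agreement expected by chance, from each rating's marginal frequencies.
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)

    if expected == 1.0:
        # Both ratings used one identical category throughout: kappa is undefined.
        return math.nan
    return (observed - expected) / (1.0 - expected)


def fraction_beyond_one_category(ratings_a, ratings_b):
    """Share of paired gradings that differ by more than one ordinal category.

    Assumption: grades are coded as consecutive integers (for example 0 to 3).
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    far_apart = sum(abs(a - b) > 1 for a, b in zip(ratings_a, ratings_b))
    return far_apart / len(ratings_a)
```

As an illustration with made-up gradings (not study data), `cohen_kappa([0, 1, 2, 3, 1, 2, 0, 3, 2, 1], [0, 1, 2, 2, 1, 3, 0, 3, 0, 1])` gives a kappa of about 0.6 (observed agreement 0.70, chance agreement 0.25), and `fraction_beyond_one_category` on the same lists gives 0.1.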