Background
An estimated 80% of health data is recorded as free text and is not easily accessible for research and quality improvement (QI). Natural language processing (NLP) could offer a way to abstract these data more easily than manual review. Our objective was to investigate whether NLP can be used to abstract structured clinical data from notes for total joint arthroplasty (TJA).
Methods
Clinical and hospital notes were collected for every patient undergoing a primary TJA. Human annotators reviewed a random training sample (n=400) and test sample (n=600) of notes from 6 different surgeons and manually abstracted historical, physical exam, operative, and outcomes data to create a gold standard dataset. Historical data included pain information and the treatments tried (medications, injections, physical therapy). Physical exam data included range of motion (ROM) and the presence of deformity. Operative data included the angle of tibial slope, the angles of the tibial and femoral cuts, and patellar tracking for total knee arthroplasties (TKAs), and the approach and repair of external rotators for total hip arthroplasties (THAs). In addition, information on implant brand, type, and size, sutures, and drains was collected for all TJAs, as was the occurrence of complications. We then trained and tested our NLP system to automatically collect the respective variables. Finally, we assessed our automated approach by comparing system-generated findings against the gold standard.
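The comparison of system-generated findings against the gold standard can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the abstract does not state which metrics were used, so precision, recall, and F1 are assumed here, and the function name `score_variable`, the note IDs, and the example values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Scores:
    precision: float
    recall: float
    f1: float


def score_variable(gold: Dict[str, Optional[str]],
                   predicted: Dict[str, Optional[str]]) -> Scores:
    """Score one abstracted variable (e.g. surgical approach) across notes.

    Both arguments map a note ID to the abstracted value, or None when
    the variable is absent from that note. Assumed metric definitions:
    a true positive is an exact match on a non-empty value.
    """
    tp = fp = fn = 0
    for note_id, gold_value in gold.items():
        pred_value = predicted.get(note_id)
        if pred_value is not None and pred_value == gold_value:
            tp += 1
        else:
            if pred_value is not None:
                fp += 1  # system emitted a wrong or spurious value
            if gold_value is not None:
                fn += 1  # system missed the annotated value
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return Scores(precision, recall, f1)


# Hypothetical example: THA approach abstracted from four notes.
gold = {"n1": "posterior", "n2": "anterior", "n3": None, "n4": "posterior"}
pred = {"n1": "posterior", "n2": "posterior", "n3": None, "n4": None}
scores = score_variable(gold, pred)
```

Scoring each variable separately, as sketched here, would make it possible to see which data elements (for example implant size versus complications) the system abstracts reliably and which still need manual review.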