Natural Language Processing (NLP) offers an automated method for extracting data from unstructured free-text fields for arthroplasty registry participation. Our objective was to investigate how accurately NLP can extract structured clinical data from unstructured clinical notes compared with manual data extraction. A total of 1,000 randomly selected clinical and hospital notes from eight surgeons were collected for patients undergoing primary arthroplasty between 2012 and 2018. In all, 19 preoperative, 17 operative, and two postoperative variables of interest were manually extracted from these notes. An NLP algorithm was developed on a training sample of these notes to extract these variables automatically, and it was then tested on a random test sample of notes. Performance of the NLP algorithm was measured in Statistical Analysis System (SAS) by calculating the accuracy of the variables collected, the ability of the algorithm to collect the correct information when it was present in the note (sensitivity), and its ability not to collect a data element when it was absent from the note (specificity).
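The three performance measures can be made concrete with a small sketch. It is written in Python rather than SAS, and it assumes that each variable is scored note by note: a gold-standard value of `None` means the element is absent from the note, and a system value of `None` means nothing was extracted. It also assumes that extracting a wrong value when the element is present counts as a miss. The function name `evaluate` and the example values are hypothetical and are not taken from the study.

```python
from typing import Optional, Sequence


def evaluate(gold: Sequence[Optional[str]], system: Sequence[Optional[str]]) -> dict:
    """Score one variable by comparing system-extracted values with gold-standard values.

    Each position is one note. None in `gold` means the element is not documented;
    None in `system` means the algorithm extracted nothing for that note.
    """
    if len(gold) != len(system):
        raise ValueError("gold and system must cover the same notes")
    tp = fn = tn = fp = 0
    for g, s in zip(gold, system):
        if g is not None:
            # Element documented: correct only if the same value was extracted.
            if s == g:
                tp += 1
            else:
                fn += 1  # missed, or a wrong value was extracted
        else:
            # Element not documented: correct only if nothing was extracted.
            if s is None:
                tn += 1
            else:
                fp += 1
    n = tp + fn + tn + fp
    nan = float("nan")
    return {
        "accuracy": (tp + tn) / n if n else nan,
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        "specificity": tn / (tn + fp) if tn + fp else nan,
    }


# Hypothetical THA approach variable scored over five notes.
gold = ["posterior", None, "anterior", None, "posterior"]
system = ["posterior", None, None, "lateral", "posterior"]
print(evaluate(gold, system))
```

Here two notes are true positives, one is a true negative, one documented approach is missed, and one approach is extracted from a note that does not document it. That gives an accuracy of 3/5, a sensitivity of 2/3, and a specificity of 1/2.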
Eighty percent of health data is recorded as free text and is not easily accessible for use in research and quality improvement (QI). Natural Language Processing (NLP) could be used to abstract data more easily than manual methods. Our objective was to investigate whether NLP can be used to abstract structured clinical data from notes for total joint arthroplasty (TJA). Clinical and hospital notes were collected for every patient undergoing a primary TJA. Human annotators reviewed a random training sample (n = 400) and test sample (n = 600) of notes from six surgeons and manually abstracted historical, physical examination, operative, and outcomes data to create a gold-standard dataset. Historical data included pain information and the treatments tried (medications, injections, physical therapy). Physical examination data included range of motion (ROM) and the presence of deformity. Operative data included the tibial slope angle, the angles of the tibial and femoral cuts, and patellar tracking for total knee arthroplasty (TKA), and the surgical approach and repair of the external rotators for total hip arthroplasty (THA). In addition, implant brand, type, and size, sutures, and drains were recorded for all TJAs. The occurrence of complications was also collected. We then trained and tested our NLP system to collect these variables automatically. Finally, we assessed the automated approach by comparing system-generated findings against the gold standard.
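The abstract does not describe how the NLP system extracts each variable. One common starting point for short, formulaic operative details is rule-based pattern matching, and the sketch below assumes that approach. It shows a seeded random split of notes into training and test sets and two hypothetical extractors, one for the tibial slope angle and one for drain use. The function names, regular expressions, and example sentences are illustrative assumptions only and do not represent the study's actual system.

```python
import random
import re
from typing import Optional, Sequence

# Hypothetical patterns: "slope of 3 degrees" or "3 degrees of posterior slope".
SLOPE_RE = re.compile(
    r"slope\D{0,20}?(\d+(?:\.\d+)?)\s*(?:degrees?|°)"
    r"|(\d+(?:\.\d+)?)\s*(?:degrees?|°)\s+(?:of\s+)?(?:posterior\s+)?slope",
    re.IGNORECASE,
)
NO_DRAIN_RE = re.compile(r"\bno\s+drains?\b", re.IGNORECASE)
DRAIN_RE = re.compile(r"\bdrains?\b", re.IGNORECASE)


def split_notes(note_ids: Sequence[int], n_train: int, seed: int = 0):
    """Randomly split note IDs into a training set and a test set (reproducible via seed)."""
    rng = random.Random(seed)
    shuffled = list(note_ids)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


def extract_slope(note: str) -> Optional[float]:
    """Return the first tibial slope angle mentioned, or None if no pattern matches."""
    m = SLOPE_RE.search(note)
    if m is None:
        return None
    return float(m.group(1) or m.group(2))


def extract_drain(note: str) -> Optional[str]:
    """Return "no", "yes", or None; the negation check is deliberately naive."""
    if NO_DRAIN_RE.search(note):
        return "no"
    if DRAIN_RE.search(note):
        return "yes"
    return None


train_ids, test_ids = split_notes(range(1000), n_train=400)
print(len(train_ids), len(test_ids))
print(extract_slope("Tibial slope of 3 degrees was cut."))
print(extract_drain("No drain was placed."))
```

Real operative notes vary far more in wording. Phrases such as "a drain was not used" would defeat this negation check, so rules like these would need to be refined against the training sample before being scored on the held-out test notes.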