Artificial intelligence learning semantics via external resources for classifying diagnosis codes in discharge notes
Authors
|
Lin C., Hsu C.-J., Lou Y.-S., Yeh S.-J., Lee C.-C., Su S.-L., Chen H.-C.
|
Source
|
Journal of Medical Internet Research - 2017 - Volume: 19 - Issue: 11
|
Abstract
|
Background: Automated disease code classification using free-text medical information is important for public health surveillance. However, traditional natural language processing (NLP) pipelines are limited, so we propose a method combining word embedding with a convolutional neural network (CNN).
Objective: Our objective was to compare the performance of traditional pipelines (NLP plus supervised machine learning models) with that of word embedding combined with a CNN in conducting a classification task identifying International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM) diagnosis codes in discharge notes.
Methods: We used 2 classification methods: (1) extracting from discharge notes some features (terms, n-gram phrases, and SNOMED CT categories) that we used to train a set of supervised machine learning models (support vector machine, random forests, and gradient boosting machine), and (2) building a feature matrix, by a pretrained word embedding model, that we used to train a CNN. We used these methods to identify the chapter-level ICD-10-CM diagnosis codes in a set of discharge notes. We conducted the evaluation using 103,390 discharge notes covering patients hospitalized from June 1, 2015 to January 31, 2017 in the Tri-Service General Hospital in Taipei, Taiwan. We used the receiver operating characteristic curve as an evaluation measure, and calculated the area under the curve (AUC) and F-measure as the global measure of effectiveness.
Results: In 5-fold cross-validation tests, our method had a higher testing accuracy (mean AUC 0.9696; mean F-measure 0.9086) than traditional NLP-based approaches (mean AUC range 0.8183-0.9571; mean F-measure range 0.5050-0.8739). A real-world simulation that split the training sample and the testing sample by date verified this result (mean AUC 0.9645; mean F-measure 0.9003 using the proposed method). Further analysis showed that the convolutional layers of the CNN effectively identified a large number of keywords and automatically extracted enough concepts to predict the diagnosis codes.
Conclusions: Word embedding combined with a CNN showed outstanding performance compared with traditional methods, needing very little data preprocessing. This shows that future studies will not be limited by incomplete dictionaries. A large amount of unstructured information from free-text medical writing will be extracted by automated approaches in the future, and we believe that the health care field is about to enter the age of big data.
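The two evaluation measures named in the abstract, AUC and F-measure, can be illustrated with a minimal sketch for a single binary label (for example, one ICD-10-CM chapter versus the rest). This is not the authors' evaluation code: the function names `auc` and `f_measure`, the 0.5 decision threshold, and the toy data are assumptions made for illustration, and the abstract does not say how the per-chapter values were averaged into the reported means.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve for one binary label.

    Uses the pairwise (Mann-Whitney) formulation: the fraction of
    positive/negative pairs in which the positive example receives the
    higher score, counting ties as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def f_measure(labels: Sequence[int], scores: Sequence[float],
              threshold: float = 0.5) -> float:
    """F-measure (harmonic mean of precision and recall).

    The threshold of 0.5 is an assumption; the abstract does not state
    which cut-off was used.
    """
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy data (hypothetical): 1 = note carries the chapter code, 0 = it does not.
    toy_labels = [1, 1, 0, 0, 1, 0]
    toy_scores = [0.9, 0.8, 0.7, 0.3, 0.4, 0.2]
    print("AUC:", auc(toy_labels, toy_scores))
    print("F-measure:", f_measure(toy_labels, toy_scores))
```

In a multi-label setting such as chapter-level coding, one plausible reading of "mean AUC" is to compute these values once per chapter and average them, but that reading is an assumption rather than something the abstract states.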
|
Keywords
|
Convolutional neural network; Data mining; Electronic health records; Electronic medical records; Machine learning; Natural language processing; Neural networks (computer); Text mining; Word embedding
|
Address
|
School of Public Health, National Defense Medical Center, Taipei, Taiwan; Department of Research and Development, National Defense Medical Center, Taipei, Taiwan; Planning and Management Office, Tri-Service General Hospital, National Defense Medical Center, Taipei, Taiwan; School of Public Health, National Defense Medical Center, Taipei, Taiwan; Da-Yeh University, Changhua, Taiwan; Planning and Management Office, Tri-Service General Hospital, National Defense Medical Center, Taipei, Taiwan; School of Public Health, National Defense Medical Center, Taipei, Taiwan; Division of Rheumatology/Immunology/Allergy, Department of Internal Medicine, Tri-Service General Hospital, National Defense Medical Center, No. 161, Min-Chun E. Rd., Sec. 6, Neihu, Taipei, 114, Taiwan