1. Coorevits P, Sundgren M, Klein GO, Bahr A, Claerhout B, Daniel C, et al. Electronic health records: new opportunities for clinical research. J Intern Med 2013;274:547-560.
2. Jeurgens C. Threats of the data-flood: an accountability perspective in the era of ubiquitous computing. In: Smit F, Glaudemans A, Jonker R, eds. Archives in liquid times. 's-Gravenhage: Stichting Archiefpublicaties, 2017;196-210.
3. Adnan K, Akbar R, Khor SW, Ali ABA. Role and challenges of unstructured big data in healthcare. In: Sharma N, Chakrabarti A, Balas V, eds. Data management, analytics and innovation. Advances in Intelligent Systems and Computing. Vol 1042. Singapore: Springer, 2020;301-323.
4. Sun W, Cai Z, Li Y, Liu F, Fang S, Wang G. Data processing and text mining technologies on electronic medical records: a review. J Healthc Eng 2018;2018:4302425.
5. Young T, Hazarika D, Poria S, Cambria E. Recent trends in deep learning based natural language processing. IEEE Comput Intell Mag 2018;13:55-75.
6. Nadkarni PM, Ohno-Machado L, Chapman WW. Natural language processing: an introduction. J Am Med Inform Assoc 2011;18:544-551.
7. Rajman M, Besançon R. Text mining: natural language techniques and text mining applications. In: Spaccapietra S, Maryanski F, eds. Data mining and reverse engineering. Boston: Springer, 1998;50-64.
8. Moreno A, Redondo T. Text analytics: the convergence of big data and artificial intelligence. IJIMAI 2016;3:57-64.
9. Nikiforou A, Ponirou P, Diomidous M. Medical data analysis and coding using natural language processing techniques in order to derive structured data information. In: Mantas J, Hasman A, eds. Informatics, management and technology in healthcare. Amsterdam: IOS Press, 2013;53-55.
10. Buchlak QD, Esmaili N, Leveque JC, Farrokhi F, Bennett C, Piccardi M, et al. Machine learning applications to clinical decision support in neurosurgery: an artificial intelligence augmented systematic review. Neurosurg Rev 2020;43:1235-1253.
11. Tan AH. Text mining: the state of the art and the challenges. Proceedings of the PAKDD Workshop on Knowledge Discovery from Advanced Databases. 1999;Beijing, China: 65-70.
12. Brants T. Natural language processing in information retrieval. Proceedings of CLIN 2003. 2003;Antwerp, Belgium. Schloss Dagstuhl - Leibniz Center for Informatics, 2004;1-13.
13. Vijayarani S, Ilamathi MJ, Nithya M. Preprocessing techniques for text mining-an overview. Int J Comput Sci Commun Netw 2015;5:7-16.
14. HaCohen-Kerner Y, Miller D, Yigal Y. The influence of preprocessing on text classification using a bag-of-words representation. PLoS One 2020;15:e0232525.
15. Willett P. The Porter stemming algorithm: then and now. Program 2006;40:219-223.
16. Ferilli S, Esposito F, Grieco D. Automatic learning of linguistic resources for stopword removal and stemming from text. Procedia Comput Sci 2014;38:116-123.
17. Gehrmann S, Dernoncourt F, Li Y, Carlson ET, Wu JT, Welt J, et al. Comparing rule-based and deep learning models for patient phenotyping. [online] [cited 2021 Mar 24]. Available from: URL: https://arxiv.org/abs/1703.08705.
18. Eftimov T, Koroušić Seljak B, Korošec P. A rule-based named-entity recognition method for knowledge extraction of evidence-based dietary recommendations. PLoS One 2017;12:e0179488.
20. Chiang JH, Lin JW, Yang CW. Automated evaluation of electronic discharge notes to assess quality of care for cardiovascular diseases using Medical Language Extraction and Encoding System (MedLEE). J Am Med Inform Assoc 2010;17:245-252.
21. Savova GK, Masanz JJ, Ogren PV, Zheng J, Sohn S, Kipper-Schuler KC, et al. Mayo clinical text analysis and knowledge extraction system (cTAKES): architecture, component evaluation and applications. J Am Med Inform Assoc 2010;17:507-513.
22. Aronson AR. Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program. Proceedings of the AMIA Symposium. 2001;Washington, DC. Oxford: Oxford University Press, 2014;17.
23. Jing L, Ng MK, Huang JZ. Knowledge-based vector space model for text clustering. Knowl Inf Syst 2010;25:35-55.
24. Mihalcea R, Corley C, Strapparava C. Corpus-based and knowledge-based measures of text semantic similarity. AAAI 2006;6:775-780.
25. Zhang Y, Jin R, Zhou ZH. Understanding bag-of-words model: a statistical framework. Int J Mach Learn Cybernet 2010;1:43-52.
26. Voorhees EM. Natural language processing and information retrieval. In: Pazienza MT, ed. Information extraction. Heidelberg: Springer, 1999;32-48.
27. Cavnar WB, Trenkle JM. N-gram-based text categorization. Proceedings of SDAIR-94, 3rd annual symposium on document analysis and information retrieval. 1994;Las Vegas, NV: University of Nevada, 1994;161-175.
28. Goldberg Y, Levy O. word2vec explained: deriving Mikolov et al.'s negative-sampling word-embedding method. [online] [cited 2021 Mar 24]. Available from: URL: https://arxiv.org/abs/1402.3722.
29. Bianchi B, Monzón GB, Ferrer L, Slezak DF, Shalom DE, Kamienkowski JE. Human and computer estimations of predictability of words in written language. Sci Rep 2020;10:1-11.
30. Liu Q, Huang HY, Gao Y, Wei X, Tian Y, Liu L. Task-oriented word embedding for text classification. Proceedings of the 27th international conference on computational linguistics. 2018;Santa Fe, NM. Cambridge: Massachusetts Institute of Technology Press, 2018;2023-2032.
31. Cer D, Yang Y, Kong SY, Hua N, Limtiaco N, St John R, et al. Universal sentence encoder. [online] [cited 2021 Mar 24]. Available from: URL: https://arxiv.org/abs/1803.11175.
32. Kang N, Singh B, Afzal Z, van Mulligen EM, Kors JA. Using rule-based natural language processing to improve disease normalization in biomedical text. J Am Med Inform Assoc 2013;20:876-881.
33. Razno M. Machine learning text classification model with NLP approach. Comput Ling Intellig Syst 2019;2:71-73.
34. Kanakaraj M, Guddeti RMR. Performance analysis of ensemble methods on twitter sentiment analysis using NLP techniques. Proceedings of the 2015 IEEE 9th International Conference on Semantic Computing (IEEE ICSC 2015). 2015;Anaheim, CA. New York, NY: Institute of Electrical and Electronics Engineers, 2015;169-170.
35. Li J, Chen X, Hovy E, Jurafsky D. Visualizing and understanding neural models in NLP. [online] [cited 2021 Mar 23]. Available from: URL: https://arxiv.org/abs/1506.01066.
36. Li H. Deep learning for natural language processing: advantages and challenges. Natl Sci Rev 2017;5:24-26.
37. Vosoughi S, Vijayaraghavan P, Roy D. Tweet2vec: learning tweet embeddings using character-level CNN-LSTM encoder-decoder. Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval. 2016;Pisa, Italy. New York, NY: Association for Computing Machinery, 2016;1041-1044.
38. Elkins JS, Friedman C, Boden-Albala B, Sacco RL, Hripcsak G. Coding neuroradiology reports for the Northern Manhattan Stroke Study: a comparison of natural language processing and manual review. Comput Biomed Res 2000;33:1-10.
39. Pons E, Braun LM, Hunink MM, Kors JA. Natural language processing in radiology: a systematic review. Radiology 2016;279:329-343.
40. Mowery DL, Chapman BE, Conway M, South BR, Madden E, Keyhani S, et al. Extracting a stroke phenotype risk factor from Veteran Health Administration clinical reports: an information content analysis. J Biomed Semantics 2016;7:1-12.
41. Kim C, Zhu V, Obeid J, Lenert L. Natural language processing and machine learning algorithm to identify brain MRI reports with acute ischemic stroke. PLoS One 2019;14:e0212778.
42. Li MD, Lang M, Deng F, Chang K, Buch K, Rincon S, et al. Analysis of stroke detection during the COVID-19 pandemic using natural language processing of radiology reports. AJNR Am J Neuroradiol 2021;42:429-434.
43. Ong CJ, Orfanoudaki A, Zhang R, Caprasse FP, Hutch M, Ma L, et al. Machine learning and natural language processing methods to identify ischemic stroke, acuity and location from radiology reports. PLoS One 2020;15:e0234908.
44. Alex B, Grover C, Tobin R, Sudlow C, Mair G, Whiteley W. Text mining brain imaging reports. J Biomed Semantics 2019;10:1-11.
45. Wheater E, Mair G, Sudlow C, Alex B, Grover C, Whiteley W. A validated natural language processing algorithm for brain imaging phenotypes from radiology reports in UK electronic health records. BMC Med Inform Decis Mak 2019;19:1-11.
46. Fu S, Leung LY, Wang Y, Raulli AO, Kallmes DF, Kinsman KA, et al. Natural language processing for the identification of silent brain infarcts from neuroimaging reports. JMIR Med Inform 2019;7:e12109.
47. Bacchi S, Oakden-Rayner L, Zerner T, Kleinig T, Patel S, Jannes J. Deep learning natural language processing successfully predicts the cerebrovascular cause of transient ischemic attack-like presentations. Stroke 2019;50:758-760.
48. Garg R, Oh E, Naidech A, Kording K, Prabhakaran S. Automating ischemic stroke subtype classification using machine learning and natural language processing. J Stroke Cerebrovasc Dis 2019;28:2045-2051.
49. Sung SF, Lin CY, Hu YH. EMR-based phenotyping of ischemic stroke using supervised machine learning and text mining techniques. IEEE J Biomed Health Inform 2020;24:2922-2931.
50. Sung SF, Chen KC, Wu DP, Hung LC, Su YH, Hu YH. Applying natural language processing techniques to develop a task-specific EMR interface for timely stroke thrombolysis: a feasibility study. Int J Med Inform 2018;112:149-157.
51. Kogan E, Twyman K, Heap J, Milentijevic D, Lin JH, Alberts M. Assessing stroke severity using electronic health record data: a machine learning approach. BMC Med Inform Decis Mak 2020;20:8.
52. Heo TS, Kim YS, Choi JM, Jeong YS, Seo SY, Lee JH, et al. Prediction of stroke outcome using natural language processing-based machine learning of radiology report of brain MRI. J Pers Med 2020;10:286.
53. Cawley GC, Talbot NL. On over-fitting in model selection and subsequent selection bias in performance evaluation. J Mach Learn Res 2010;11:2079-2107.
54. Guan W, Ko D, Khurshid S, Trisini Lipsanopoulos AT, Ashburner JM, Harrington LX, et al. Automated electronic phenotyping of cardioembolic stroke. Stroke 2021;52:181-189.