Human language is relatively complex for machine learning and deep learning algorithms to understand. In order to use it for various applications, the human language must be made comprehensible for the machine. Annotation is a critical part of this, where we associate labels with these texts.
- Ankan Ghosh, Data Scientist, Ideas2IT
Healthcare is one of the fields that uses large amounts of text data to develop domain-specific use cases and applications. Functionalities such as extracting clinical entities from text, establishing relations between them, and mapping ICD codes, to name a few, are essential for modern healthcare applications.
Machine learning in healthcare has been helping build applications that fulfill these requirements efficiently. These texts exist in unstructured form in physicians' clinical notes, discharge summaries, and similar documents. Much of the vital information in these notes is sometimes neglected and is only sometimes available in the EHR.
Clinical data annotation plays an important role in extracting these texts and converting them into meaningful information. Text mining (TM) techniques can support curators by automatically detecting complex information, such as interactions between drugs, diseases, and adverse effects. This semantic information supports the quick identification of documents containing information of interest.
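To make the idea of annotation concrete, here is a minimal sketch of associating labels with spans of a clinical note. It is a toy dictionary lookup, not how Spark NLP for Healthcare or scispaCy work internally. The lexicon, the label names, the `Annotation` type, and the `annotate` function are all hypothetical and exist only for illustration.

```python
import re
from typing import Dict, List, NamedTuple


class Annotation(NamedTuple):
    start: int   # character offset where the labeled span begins
    end: int     # character offset just past the labeled span
    text: str    # the exact surface text that was matched
    label: str   # the label associated with the span


# Hypothetical toy lexicon (an assumption for this sketch, not a real resource).
LEXICON: Dict[str, str] = {
    "metformin": "DRUG",
    "warfarin": "DRUG",
    "type 2 diabetes": "DISEASE",
    "nausea": "ADVERSE_EFFECT",
}


def annotate(text: str, lexicon: Dict[str, str] = LEXICON) -> List[Annotation]:
    """Label every case-insensitive, whole-word lexicon match in `text`."""
    annotations = []
    for term, label in lexicon.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            annotations.append(
                Annotation(match.start(), match.end(), match.group(0), label)
            )
    # Tuples sort by their first field, so this orders spans by start offset.
    return sorted(annotations)


note = "Patient with type 2 diabetes started on Metformin; reports nausea."
for ann in annotate(note):
    print(ann.start, ann.end, ann.label, repr(ann.text))
```

A lookup like this only finds terms it already knows, in the exact form it knows them. It misses abbreviations, misspellings, and unseen jargon, which is the gap that models trained on medical corpora are meant to close.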
General-purpose annotation models trained on widely available text corpora fail to identify and understand much of the technical, domain-specific jargon in these clinical texts. Spark NLP for Healthcare by John Snow Labs addresses this problem with several state-of-the-art techniques: its annotation models are trained on large medical corpora and used for standard NLP tasks such as NER, question answering, and entity resolution. scispaCy, on the other hand, uses a full spaCy pipeline and models for annotating scientific and biomedical documents. Let’s look at them in detail.
Read the full article on pages 34-38 of PCQuest May 2021 Edition.