Dominika Tkaczyk

Domain-Independent Semantic Annotation of the Text

Surrounded by a huge and exponentially growing volume of information of various kinds, we face daily challenges in keeping track of the latest news and finding good-quality answers to specific questions. Modern information systems and search engines address the problem of information overload, but their capabilities are strongly limited because information is still exchanged and stored in textual formats, which machines understand poorly.

During my EDGE fellowship, I plan to develop a comprehensive framework for the semantic annotation of textual documents from arbitrary domains, such as scientific papers, legal documents, customer reviews or clinical trial reports. Using state-of-the-art natural language processing and machine learning techniques, the resulting system will make it possible to discover the entity and relation types in a given domain and to train annotation tools tailored to that domain.

The DISCANT project will help release important knowledge locked in textual documents, support the creation of new knowledge repositories, and enable more effective solutions for semantic search and personal recommendations. As a consequence, readers of textual documents will be equipped with better tools for overcoming information overload and making higher-quality, data-driven decisions.