2 | Exploring Construction of a Company Domain-Specific Knowledge Graph from Financial Texts Using Hybrid Information Extraction
Jen, Chun-Heng. - KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021

3 | Analyzing Non-Textual Content Elements to Detect Academic Plagiarism

4 | Multi-Word Terminology Extraction and Its Role in Document Embedding
In: Electronic Theses and Dissertations (2021)

5 | Linguistic Analysis and Automatic Information Extraction of Semantic Relations in Arabic
In: https://hal.archives-ouvertes.fr/tel-03572307 ; Linguistics. Université Bourgogne Franche-Comté, 2020. In French. (2020)

6 | Prerequisites for Extracting Entity Relations from Swedish Texts
Lenas, Erik. - KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020

7 | Cold-Start Universal Information Extraction

Abstract:
Who? What? When? Where? Why? These are the fundamental questions we ask when gathering knowledge about and understanding a concept, topic, or event. The answers to these questions underpin the key information conveyed in the overwhelming majority, if not all, of language-based communication. At the core of my research in Information Extraction (IE) is the desire to endow machines with the ability to automatically extract, assess, and understand text in order to answer these fundamental questions. IE serves as one of the most important components for many downstream natural language processing (NLP) tasks, such as knowledge base completion, machine reading comprehension, and machine translation. The proliferation of the Web also intensifies the need to deal with enormous amounts of unstructured data across languages, genres, and domains.

When building an IE system, the conventional pipeline is to (1) ask expert linguists to rigorously define a target set of knowledge types we wish to extract by examining a large data set, (2) collect resources and human annotations for each type, and (3) design features and train machine learning models to extract knowledge elements. In practice, this process is very expensive, because each step involves extensive human effort that is not always available: to specify the knowledge types for a particular scenario, both consumers and expert linguists must examine a great deal of data from that domain and write detailed annotation guidelines for each type. Hand-crafted schemas, which define the types and complex templates of the expected knowledge elements, often provide low coverage and fail to generalize to new domains. For example, none of the traditional event extraction programs, such as ACE (Automatic Content Extraction) and TAC-KBP, include "donation" and "evacuation" in their schemas, despite their potential relevance to natural disaster management users. Moreover, these approaches depend heavily on linguistic resources and human-labeled data tuned to pre-defined types, so they suffer from poor scalability and portability when moving to a new language, domain, or genre.

The focus of this thesis is to develop effective theories and algorithms for IE that not only yield satisfactory quality by incorporating prior linguistic and semantic knowledge, but also achieve greater portability and scalability by moving away from the high cost and narrow focus of large-scale manual annotation. This thesis opens up a new research direction called Cold-Start Universal Information Extraction, where the full extraction and analysis starts from scratch and requires little or no prior manual annotation or pre-defined type schema. In addition to this new research paradigm, we also contribute effective algorithms and models toward resolving the following three challenges.

How can machines extract knowledge without any pre-defined types or any human-annotated data? We develop an effective bottom-up, unsupervised Liberal Information Extraction framework based on the hypothesis that the meaning and underlying knowledge conveyed by a linguistic expression is usually embodied by its usage in language. This makes it possible to automatically induce a type schema from rich contextual representations of all knowledge elements, combining their symbolic and distributional semantics through unsupervised hierarchical clustering.

How can machines benefit from available resources, e.g., large-scale ontologies or existing human annotations? My research has shown that pre-defined types can also be encoded by rich contextual or structured representations, through which knowledge elements can be mapped to their appropriate types. We therefore design a weakly supervised Zero-Shot Learning approach and a Semi-Supervised Vector-Quantized Variational Auto-Encoder approach that frame IE as a grounding problem rather than classification: knowledge elements are grounded into types from an extensible, large-scale target ontology, or induced from the corpora, with annotations available for only a few types.

How can IE approaches be extended to low-resource languages without any extra human effort? There are more than 6,000 living languages, while public gold-standard annotations are available for only a few dominant languages. To facilitate the adaptation of these IE frameworks to other languages, especially low-resource ones, a Multilingual Common Semantic Space is further proposed to serve as a bridge for transferring existing resources and annotated data from dominant languages to more than 300 low-resource languages. Moreover, a Multi-Level Adversarial Transfer framework is also designed to learn language-agnostic features across various languages.
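The bottom-up schema induction described above can be sketched in a few lines: cluster contextual representations of extracted mentions and treat each resulting cluster as a candidate type. This is only a minimal illustration of the idea; the mention strings, toy vectors, and similarity threshold below are invented stand-ins, not the thesis's actual representations or parameters.

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def induce_schema(vectors, labels, threshold=0.9):
    """Average-linkage agglomerative clustering: keep merging the most
    similar pair of clusters while their similarity exceeds `threshold`.
    Each surviving cluster plays the role of one induced type."""
    clusters = [[i] for i in range(len(vectors))]

    def sim(c1, c2):
        total = sum(cosine(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        i, j = max(pairs, key=lambda p: sim(clusters[p[0]], clusters[p[1]]))
        if sim(clusters[i], clusters[j]) < threshold:
            break
        clusters[i].extend(clusters[j])
        del clusters[j]
    return [[labels[k] for k in c] for c in clusters]

# Toy "contextual representations" for five event triggers (made-up values):
mentions = ["donation", "fundraiser", "evacuation", "rescue", "flood"]
vectors = [
    [0.9, 0.1, 0.0],    # donation
    [0.85, 0.15, 0.0],  # fundraiser
    [0.1, 0.9, 0.1],    # evacuation
    [0.15, 0.85, 0.1],  # rescue
    [0.0, 0.2, 0.9],    # flood
]
print(induce_schema(vectors, mentions))
# Each cluster is a candidate induced type, e.g. a donation/fundraiser type
# emerges without "donation" ever appearing in a hand-crafted schema.
```

With a real system the vectors would come from symbolic plus distributional semantics of each mention in context; here the threshold simply stops merging once clusters are no longer similar.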
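The grounding view of typing mentioned above (as opposed to closed-set classification) can likewise be sketched: a mention representation is grounded into the nearest type in an extensible ontology, so adding a type only means adding an entry, with no retraining. The type names and vectors here are hypothetical illustrations, not the thesis's ontology.

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Hypothetical ontology: type name -> representative type vector.
# Because typing is grounding (nearest type in the space), extending the
# ontology with a new row immediately makes that type available.
ontology = {
    "TransferMoney": [0.9, 0.1, 0.0],
    "Evacuation":    [0.1, 0.9, 0.1],
    "Disaster":      [0.0, 0.2, 0.9],
}

def ground(mention_vec, type_space):
    """Ground a mention representation into its most similar ontology type."""
    return max(type_space, key=lambda t: cosine(mention_vec, type_space[t]))

print(ground([0.8, 0.2, 0.0], ontology))  # a "donation"-like mention
```

The same mechanism supports the zero-shot setting: types never seen with annotated examples are still reachable, because grounding only needs a type representation, not a trained per-type classifier.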
Keywords: Common Semantic Space; Event Extraction; Information Extraction; Zero-Shot Learning
URL: http://hdl.handle.net/2142/109351

8 | Focus Particles and Extraction – An Experimental Investigation of German and English Focus Particles in Constructions with Leftward Association

9 | Extracting Global Entities Information from News
Xia, Chen. - eScholarship, University of California, 2019
In: Xia, Chen. (2019). Extracting Global Entities Information from News. UCLA: Computer Science 0201. Retrieved from: http://www.escholarship.org/uc/item/0bv836gm (2019)

10 | NormCo: Deep Disease Normalization for Biomedical Knowledge Base Construction
In: Wright, Dustin. (2019). NormCo: Deep Disease Normalization for Biomedical Knowledge Base Construction. UC San Diego: Computer Science and Engineering. Retrieved from: http://www.escholarship.org/uc/item/3410q7zk (2019)

11 | Biomedical Literature Mining and Knowledge Discovery of Phenotyping Definitions

12 | Contextual citation recommendation using scientific discourse annotation schemes

13 | Information Extraction from Business News on Corporate Mergers Using Local Grammars (Informationsextraktion aus Wirtschaftsnachrichten über Unternehmenszusammenschlüsse mit lokalen Grammatiken)
IDS Mannheim

14 | Personalization and Enrichment of Data Access Methods (Personnalisation et enrichissement des méthodes d’accès aux données)
In: https://hal.inria.fr/tel-01739707 ; Databases [cs.DB]. Université Rennes 1, 2018 (2018)

15 | SMS Communication: Natural Language Processing and Information Extraction
In: https://tel.archives-ouvertes.fr/tel-01968698 ; Linguistics. Université Grenoble Alpes, 2018. In French. ⟨NNT : 2018GREAL012⟩ (2018)

16 | Deep Text Mining of Instagram Data Without Strong Supervision
Hammar, Kim. - KTH, Programvaruteknik och datorsystem, SCS, 2018

17 | A Step Toward GDPR Compliance: Processing of Personal Data in Email

19 | Building effective representations for domain adaptation in coreference resolution

20 | Analysis of social health media to assess the quality of life of breast cancer patients
In: https://tel.archives-ouvertes.fr/tel-01919773 ; Other [stat.ML]. Université Montpellier, 2017. In French. ⟨NNT : 2017MONTS039⟩ (2017)