1 |
Towards Parallel Algorithms for Abstract Dialectical Frameworks ...
|
|
|
|
BASE
|
|
Show details
|
|
2 |
Detecting Signal Corruptions in Voice Recordings for Speech Therapy ; Igenkänning av Signalproblem i Röstinspelningar för Logopedi
|
|
Nylén, Helmer. - : KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021
|
|
BASE
|
|
Show details
|
|
3 |
Noise-skipping Earley parsing and in-order tree extraction from shared packed parse forests
|
|
|
|
Abstract:
In this thesis, I identify 3 shortcomings of state of the art Earley parsing and offer a unified, end-to-end Earley parsing algorithm which remedies these shortcomings. In particular I address the following issues: 1. The Earley algorithm traditionally returns parses without any rank-ordering associated with them. In addition to not ranking the parses during parse time, the data structure used to represent parse results, the shared-packed parse forest (SPPF), doesn’t have any intrinsic way to extract trees in a particular order. This is a major shortcoming because, for all but the most trivial grammars and inputs, the number of potential parses for an input is intractably large, making the parser effectively useless barring some external mechanism to filter results. 2. The Earley algorithm requires that parses explain contiguous spans of tokens without any unexplained out-of-vocabulary items or tokens interceding. In other words, traditional Earley parsing parses only whole strings of tokens, as opposed to permitting parsing only discontinuous subsequences of the input tokens. This is a shortcoming in applications where the input contains spurious tokens or where the grammar is only intended to describe a subset of the input. 3. The Earley algorithm’s run time is proportional to grammar size. Current state of the art methods substantially restrict the size of grammars which can be practically parsed with reasonable memory and time constraints. This is a major shortcoming because it can be difficult to represent sufficiently expressive languages for many applications using anything short of a massive grammar. The narrative arc of this thesis can be understood as an attempt to redress the three issues above by extending the Earley algorithm to enable parse ranking and quick extraction, make it robust to noise skipping, and make parsing with massive grammars tractable. In chapter two I address issue 1 by introducing a framework for ranking parses as a function of their intrinsic attributes. I present this approach as an extension of the work done in [41] and [21]. [41] introduces parsing as a form of deductive reasoning on a logical system defined by a context-free grammar, while [21] unifies a number of concepts from deductive parsing (e.g. recognition, assigning probabilities to parses, etc.) as performing calculations on a suitably defined semiring. Thus chapter two introduces a semiring I refer to as the ‘attribute semiring’ and show how Earley parsing can be used to derive the attributes of the best derivations for Earley states - where “best” is evaluated by applying a utility function f (·) to a state’s attributes. I use the semiring formalism to show which types of attributes and utility functions can be used for this purpose. In chapters three and four, I address the first two issues above by introducing extensions to the standard Earley parser and standard SPPF-construction algorithm to handle skipping tokens in noisy inputs as well innovations to associate each Earley state and SPPF node with the attributes of its best derivation. In chapter 3, I prove that the modified noise-skipping parsing algorithm runs in O(w · n 2 ) time where w is the number of tokens allowed to be skipped between any explained tokens, also referred to as the ‘skip width’. Chapter four details the construction of a novel variant of the SPPF called an ordered-SPPF which, by design, permits us to associate the best derivation attributes with SPPF nodes and makes extracting the best tree from a forest trivially easy. In that chapter, I show that construction of the ordered-SPPF can be done in O(w 2 n 3 · log(wn)) time In chapter five, I address issue two, by showing that the ordered SPPF from chapter four can be used to not only efficiently extract the best tree in the forest but also the top k parses ranked by applying the utility function to their attributes. I show my efficient top-k algorithm can extract the top k parses in O(kn 2 ·log(kn) time. In appendix A, I address the third issue, by showing an algorithm to filter grammars at run time in order to make large-grammar parsing more efficient. Finally, in appendix B, I introduce an extension that allows the Earley parser to run continuously and return ranked parses in a streamed manner. This extension requires nontrivial modifications to the core logic in order to maintain memory efficiency while not losing any valid parses. Thus, this thesis details a unified approach permitting the Earley algorithm to parse valid subsequences of noisy inputs while using massive grammars. The system detailed here is also capable of ranking the parses by customized utility functions over user-defined attributes with the constraints governing such functions detailed in chapter 2.
|
|
Keyword:
algorithms; Computer science; data structures; Earley parsing; Linguistics; parse ranking; semiring parsing; shared packed parse forest
|
|
URL: https://nrs.harvard.edu/URN-3:HUL.INSTREPOS:37367424
|
|
BASE
|
|
Hide details
|
|
4 |
Temporal Social Network Analysis using Harvard Caselaw Access Project
|
|
|
|
BASE
|
|
Show details
|
|
5 |
Going beyond our means: A proposal for improving psycholinguistic methods
|
|
|
|
BASE
|
|
Show details
|
|
6 |
Extracting Human Behaviour and Personality Traits from Social Media
|
|
|
|
BASE
|
|
Show details
|
|
7 |
Towards provably efficient algorithms for learning neural networks ...
|
|
|
|
BASE
|
|
Show details
|
|
8 |
Factions: acts of worldbuilding on social media platforms ...
|
|
|
|
BASE
|
|
Show details
|
|
10 |
Understanding and generating language with abstract meaning representation
|
|
|
|
BASE
|
|
Show details
|
|
11 |
Detección de linguaxe misóxino e xenófobo en redes sociais mediante aprendizaxe máquina
|
|
|
|
BASE
|
|
Show details
|
|
13 |
Towards provably efficient algorithms for learning neural networks
|
|
|
|
BASE
|
|
Show details
|
|
14 |
Emoción, percepción, producción: un estudio psicolingüístico para detectar emociones en el habla
|
|
|
|
BASE
|
|
Show details
|
|
15 |
Language Recognition in the Sliding Window Model ... : Formale Sprachen im Sliding-Window-Modell ...
|
|
|
|
BASE
|
|
Show details
|
|
17 |
Seguridad del paciente: estudio de factores para su consecución
|
|
|
|
In: TDX (Tesis Doctorals en Xarxa) (2019)
|
|
BASE
|
|
Show details
|
|
18 |
Fast machine translation on parallel and massively parallel hardware
|
|
|
|
BASE
|
|
Show details
|
|
19 |
Preference inference based on lexicographic and Pareto models
|
|
|
|
BASE
|
|
Show details
|
|
20 |
Seguridad del paciente: estudio de factores para su consecución
|
|
|
|
BASE
|
|
Show details
|
|
|
|