
Search in the Catalogues and Directories

Hits 1 – 15 of 15

1
MarsaTag, a tagger for French written texts and speech transcriptions
In: Second Asian Pacific Corpus linguistics Conference ; https://hal.archives-ouvertes.fr/hal-01500736 ; Second Asian Pacific Corpus linguistics Conference, Mar 2014, Hong Kong, China. pp.220-220 (2014)
BASE
2
Phrase extraction and rescoring in statistical machine translation
Srivastava, Ankit Kumar. - : Dublin City University. Centre for Next Generation Localisation (CNGL), 2014. : Dublin City University. School of Computing, 2014
In: Srivastava, Ankit Kumar (2014) Phrase extraction and rescoring in statistical machine translation. PhD thesis, Dublin City University. (2014)
BASE
3
Deep Syntax Annotation of the Sequoia French Treebank
In: International Conference on Language Resources and Evaluation (LREC) ; https://hal.inria.fr/hal-00969191 ; International Conference on Language Resources and Evaluation (LREC), May 2014, Reykjavik, Iceland (2014)
BASE
4
Rhapsodie: a Prosodic-Syntactic Treebank for Spoken French
In: Language Resources and Evaluation Conference ; https://hal.sorbonne-universite.fr/hal-00968959 ; Language Resources and Evaluation Conference, May 2014, Reykjavik, Iceland (2014)
BASE
5
Correcting and Validating Syntactic Dependency in the Spoken French Treebank Rhapsodie
In: Proceedings of the 9th Language Resources and Evaluation Conference (LREC) ; https://halshs.archives-ouvertes.fr/halshs-01011059 ; Proceedings of the 9th Language Resources and Evaluation Conference (LREC), 2014, Iceland. pp.1-6 (2014)
BASE
6
Tamil Dependency Treebank v0.1
Ramasamy, Loganathan; Žabokrtský, Zdeněk. - : Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL), 2014
BASE
7
Copenhagen Dependency Treebanks versions 1-3
Buch-Kromann, Matthias. - : Copenhagen Business School, 2014
BASE
8
Czech-English Parallel Corpus 1.0 (CzEng 1.0)
Bojar, Ondřej; Žabokrtský, Zdeněk; Dušek, Ondřej. - : Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL), 2014
BASE
9
Prague Dependency Treebank 3.0
Bejček, Eduard; Hajičová, Eva; Hajič, Jan. - : Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL), 2014
BASE
10
HamleDT 2.0
Zeman, Daniel; Mareček, David; Mašek, Jan. - : Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL), 2014
BASE
11
Visual and Linguistic Treebank ...
Unkn Unknown. - : My University, 2014
BASE
12
Building Computational Resources : The URDU.KON-TB Treebank and the Urdu Parser
Abbas, Qaiser. - 2014
Abstract: This work presents the development of the URDU.KON-TB treebank, its annotation evaluation and guidelines, and the construction of the Urdu parser for the South Asian language Urdu. Urdu is a comparatively under-resourced language, and the development of a reliable treebank and parser will have a significant impact on the state of the art in automatic Urdu language processing. The work includes the construction of a raw corpus of 1400 sentences collected from Urdu Wikipedia and the Jang newspaper. The corpus contains text on local and international news, social stories, sports, culture, finance, religion, traveling, etc. The hierarchical annotation scheme adopted combines phrase structure and hyper dependency structure. A semi-semantic part of speech tag set, a semi-semantic syntactic tag set and a functional tag set are proposed, which were further revised during the annotation of the raw corpus. The annotation of the sentences was performed manually. Owing to the addition of morphological, part of speech, syntactic, semantic, clausal, grammatical and miscellaneous features, the annotation scheme is linguistically rich. The annotation resulted in a treebank for Urdu, called the URDU.KON-TB. This is presented in Chapter 3. For the evaluation of the annotation scheme, Krippendorff's Alpha coefficient was selected, a statistical measure of inter-annotator agreement. One hundred sentences randomly selected from the URDU.KON-TB treebank were given to five trained annotators for annotation. The annotated sentences were then evaluated using Krippendorff's Alpha coefficient. The alpha values of inter-annotator agreement obtained for part of speech, syntactic and functional annotation are 0.964, 0.817 and 0.806, respectively. All three values lie in the range of perfect agreement. The evaluation is presented in Chapter 4.
The annotation guidelines devised in the development of the URDU.KON-TB treebank were revised during and after this annotation evaluation. The updated version is presented in Chapter 2. For the development of the Urdu parser, the 1400 annotated sentences in the URDU.KON-TB treebank are divided into 80% training data and 20% test data. A context-free grammar is extracted from the training data and is given to the Urdu parser once the parser has been developed. The 20% test portion is further divided into 10% held-out data and 10% test data, so the final test set contains 140 sentences with an average length of 13.73 words per sentence. The held-out data is used during the development of the Urdu parser. The Urdu parser is an extended version of the dynamic programming algorithm known as the Earley parsing algorithm. The extensions made are discussed in Chapter 5 along with the issues faced during development. All items that can occur in normal text are considered, e.g., punctuation, null elements, diacritics, headings, regard titles, Hadees (the statements of prophets), anaphora within a sentence, and others. The PARSEVAL measures are used to evaluate the results of the Urdu parser. By applying a sufficiently rich grammar along with the extended parsing model, the parser achieves an f-score of 87% and outperforms the multi-path-shift-reduce parser for Urdu, a two-stage Hindi dependency parser and a simple Hindi dependency parser, with increases in recall of 4.8%, 12.48% and 22%, respectively. The URDU.KON-TB treebank and the Urdu parser are a contribution to the overall computational resources of Urdu. By-products of this work are a semi-semantic part of speech tagset, a semi-semantic syntactic tagset, a functional tagset, annotation guidelines, a grammar with sufficient encoded information for parsing the morphologically rich language Urdu, and a part of speech tagged corpus, which can be used for training part of speech taggers.
These resources will be enhanced further and can be used for natural language processing tasks such as probabilistic parsing, training of POS taggers, disambiguation of spoken sentences, grammar development, language identification, sources for linguistic inquiry and psychological modeling, or pattern matching.
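The 80% / 10% / 10% partition described in the abstract can be checked with a short sketch. This is only an illustration of the arithmetic: the abstract does not say how sentences were assigned to each portion, so the seeded random shuffle, the function name `split_corpus` and the placeholder sentence IDs are all assumptions, not the author's procedure.

```python
import random


def split_corpus(sentences, seed=0):
    """Split into 80% training, 10% held-out and remaining test data.

    Hypothetical helper: the shuffle is an assumption; the abstract
    only states the proportions, not the assignment method.
    """
    shuffled = list(sentences)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n = len(shuffled)
    n_train = n * 80 // 100
    n_heldout = n * 10 // 100
    train = shuffled[:n_train]
    heldout = shuffled[n_train:n_train + n_heldout]
    test = shuffled[n_train + n_heldout:]
    return train, heldout, test


# Placeholder IDs standing in for the 1400 treebank sentences.
corpus = [f"sentence_{i}" for i in range(1400)]
train, heldout, test = split_corpus(corpus)
print(len(train), len(heldout), len(test))  # 1120 140 140
```

With 1400 sentences this gives 1120 training, 140 held-out and 140 test sentences, which matches the test-set size of 140 reported in the abstract.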
Keyword: ddc:004; Functional Tagset; Semi-Semantic Part of Speech Tagset; Semi-Semantic Syntactic Tagset; Urdu Parser; Urdu Treebank Statistical Evaluation; URDU.KON.TB Treebank
URL: http://nbn-resolving.de/urn:nbn:de:bsz:352-290530
BASE
13
From Syntax to Semantics. First Steps Towards Tectogrammatical Annotation of Latin
Passarotti, Marco Carlo (orcid:0000-0002-9806-7187). - : The Association for Computational Linguistics, 2014. : country:SWE, 2014. : place:Gothenburg, 2014
BASE
14
Reflexões sobre anotação sintática e ferramentas de busca - Uso da linguagem XML para anotação sintática no corpus digital DOViC
In: Letras & Letras; v. 30, n. 2 (2014): Linguística de Corpus: abordagem e metodologia em pesquisas linguísticas de base empírica; 82-103 ; 1981-5239 (2014)
BASE
15
Challenges in Enhancing the Index Thomisticus Treebank with Semantic and Pragmatic Annotation
Gonzalez Saavedra, Berta; Passarotti, Marco Carlo (orcid:0000-0002-9806-7187). - : Department of Linguistics, University of Tübingen, 2014. : country:DEU, 2014. : place:Tubingen, 2014
BASE
