Search in the Catalogues and Directories

Hits 1 – 15 of 15

1
MarsaTag, a tagger for French written texts and speech transcriptions
In: Second Asian Pacific Corpus linguistics Conference ; https://hal.archives-ouvertes.fr/hal-01500736 ; Second Asian Pacific Corpus linguistics Conference, Mar 2014, Hong Kong, China. pp.220-220 (2014)
Abstract: We present a new system, MarsaTag, for segmenting, tagging and chunking French input. Beyond its efficiency, the tool's originality lies in its ability to process written texts as well as speech transcriptions. The tagger performs three operations. First, a rule-based tokenizer splits the raw textual input into a sequence of tokens. Second, a broad-coverage morphosyntactic lexicon associates each token form with a tag distribution. The last step disambiguates the tagging by selecting the POS tag sequence with the highest probability. The probability of a tag sequence is computed with a stochastic model based on Hidden Markov Model machinery. The states or patterns of our model are extracted from the GraceLPL resource (700,000 tokens with morphosyntactic annotation). The tagger reaches an F-measure of 0.974 on written material. The tagger has also been adapted to spontaneous speech transcriptions: the system was trained on a large spoken French corpus (CID, see Bertrand et al. 2008). Phenomena specific to speech (filled pauses, disfluencies, truncations, etc.) were identified and included in a model specific to speech transcription input. On speech, the tagger reaches an F-measure of 0.948, evaluated against the manually corrected tags of the CID corpus. MarsaTag is distributed with a software interface that allows a choice of input and output formats (see hdl:11041/sldr000841). Because the technique is generic, extension to other languages for which annotated treebanks are available (e.g. Chinese Penn Treebank) is currently in progress.
Keyword: [INFO.INFO-TT]Computer Science [cs]/Document and Text Processing; [SHS.LANGUE]Humanities and Social Sciences/Linguistics; resource; syntax; tagging; treebank
URL: https://hal.archives-ouvertes.fr/hal-01500736
BASE
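The three steps named in the MarsaTag abstract (tokenize, look up a tag distribution per token, pick the most probable tag sequence under an HMM) can be illustrated with a minimal Viterbi sketch. This is not MarsaTag's code or model: the lexicon entries, the tag set, the transition probabilities, the FLOOR smoothing constant and the function names are all hypothetical toy values chosen for illustration, and the whitespace tokenizer stands in for the rule-based one the abstract mentions.

```python
import math

# Hypothetical toy lexicon: token form -> {tag: weight}. Not taken from MarsaTag.
LEXICON = {
    "la": {"DET": 0.9, "PRO": 0.1},
    "porte": {"NOUN": 0.6, "VERB": 0.4},
    "ferme": {"VERB": 0.5, "ADJ": 0.3, "NOUN": 0.2},
}

# Hypothetical tag-bigram transition probabilities P(tag | previous tag).
TRANSITIONS = {
    ("<s>", "DET"): 0.7, ("<s>", "PRO"): 0.3,
    ("DET", "NOUN"): 0.8, ("DET", "ADJ"): 0.2,
    ("PRO", "VERB"): 0.9, ("PRO", "NOUN"): 0.1,
    ("NOUN", "VERB"): 0.6, ("NOUN", "ADJ"): 0.3, ("NOUN", "NOUN"): 0.1,
    ("VERB", "ADJ"): 0.4, ("VERB", "NOUN"): 0.4, ("VERB", "VERB"): 0.2,
}

FLOOR = 1e-6  # assumed smoothing value for unseen transitions


def tokenize(text):
    """Step 1 (simplified): split on whitespace instead of using tokenizer rules."""
    return text.lower().split()


def viterbi(tokens):
    """Steps 2 and 3: look up candidate tags, then keep the best path per tag."""
    # best maps a tag to (log-probability, tag path) of the best sequence ending in it.
    best = {"<s>": (0.0, [])}
    for token in tokens:
        candidates = LEXICON.get(token, {"UNK": 1.0})  # unknown words get one tag
        new_best = {}
        for tag, weight in candidates.items():
            scored = []
            for prev, (logp, path) in best.items():
                trans = TRANSITIONS.get((prev, tag), FLOOR)
                scored.append((logp + math.log(trans) + math.log(weight), path + [tag]))
            new_best[tag] = max(scored, key=lambda item: item[0])
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]


def tag(text):
    """Run the whole toy pipeline on a raw string."""
    return viterbi(tokenize(text))


print(tag("la porte ferme"))
```

With these toy numbers, the ambiguous forms "porte" and "ferme" are resolved by the transition scores rather than by the lexicon alone, which is the role the abstract assigns to the disambiguation step.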
2
Phrase extraction and rescoring in statistical machine translation
Srivastava, Ankit Kumar. - : Dublin City University. Centre for Next Generation Localisation (CNGL), 2014. : Dublin City University. School of Computing, 2014
In: Srivastava, Ankit Kumar (2014) Phrase extraction and rescoring in statistical machine translation. PhD thesis, Dublin City University. (2014)
BASE
3
Deep Syntax Annotation of the Sequoia French Treebank
In: International Conference on Language Resources and Evaluation (LREC) ; https://hal.inria.fr/hal-00969191 ; International Conference on Language Resources and Evaluation (LREC), May 2014, Reykjavik, Iceland (2014)
BASE
4
Rhapsodie: a Prosodic-Syntactic Treebank for Spoken French
In: Language Resources and Evaluation Conference ; https://hal.sorbonne-universite.fr/hal-00968959 ; Language Resources and Evaluation Conference, May 2014, Reykjavik, Iceland (2014)
BASE
5
Correcting and Validating Syntactic Dependency in the Spoken French Treebank Rhapsodie
In: Proceedings of the 9th Language Resources and Evaluation Conference (LREC) ; https://halshs.archives-ouvertes.fr/halshs-01011059 ; Proceedings of the 9th Language Resources and Evaluation Conference (LREC), 2014, Iceland. pp.1-6 (2014)
BASE
6
Tamil Dependency Treebank v0.1
Ramasamy, Loganathan; Žabokrtský, Zdeněk. - : Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL), 2014
BASE
7
Copenhagen Dependency Treebanks versions 1-3
Buch-Kromann, Matthias. - : Copenhagen Business School, 2014
BASE
8
Czech-English Parallel Corpus 1.0 (CzEng 1.0)
Bojar, Ondřej; Žabokrtský, Zdeněk; Dušek, Ondřej. - : Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL), 2014
BASE
9
Prague Dependency Treebank 3.0
Bejček, Eduard; Hajičová, Eva; Hajič, Jan. - : Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL), 2014
BASE
10
HamleDT 2.0
Zeman, Daniel; Mareček, David; Mašek, Jan. - : Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL), 2014
BASE
11
Visual and Linguistic Treebank ...
Unkn Unknown. - : My University, 2014
BASE
12
Building Computational Resources : The URDU.KON-TB Treebank and the Urdu Parser
Abbas, Qaiser. - 2014
BASE
13
From Syntax to Semantics. First Steps Towards Tectogrammatical Annotation of Latin
Passarotti, Marco Carlo (orcid:0000-0002-9806-7187). - : The Association for Computational Linguistics, Gothenburg, Sweden, 2014
BASE
14
Reflexões sobre anotação sintática e ferramentas de busca - Uso da linguagem XML para anotação sintática no corpus digital DOViC
In: Letras & Letras; v. 30, n. 2 (2014): Linguística de Corpus: abordagem e metodologia em pesquisas linguísticas de base empírica; 82-103 ; 1981-5239 (2014)
BASE
15
Challenges in Enhancing the Index Thomisticus Treebank with Semantic and Pragmatic Annotation
Gonzalez Saavedra, Berta; Passarotti, Marco Carlo (orcid:0000-0002-9806-7187). - : Department of Linguistics, University of Tübingen, Tübingen, Germany, 2014
BASE