Hits 1 – 12 of 12

1	Political analytics on election candidates and their parties in context of the US Presidential elections 2020
	Sorathiya, Rakshit. - : Laurentian University of Sudbury, 2021
	BASE

2	Twitter sentiment analysis of the 2019 Indian election
	Motisariya, Jaydeep. - : Laurentian University of Sudbury, 2020
	BASE

3	MiNgMatch—A Fast N-gram Model for Word Segmentation of the Ainu Language
	Karol Nowakowski; Michal Ptaszynski; Fumito Masui
	In: Information ; Volume 10 ; Issue 10 (2019)
	Abstract: Word segmentation is an essential task in automatic language processing for languages where there are no explicit word boundary markers, or where space-delimited orthographic words are too coarse-grained. In this paper we introduce the MiNgMatch Segmenter, a fast word segmentation algorithm which reduces the problem of identifying word boundaries to finding the shortest sequence of lexical n-grams matching the input text. In order to validate our method in a low-resource scenario involving extremely sparse data, we tested it with a small corpus of text in the critically endangered language of the Ainu people living in northern parts of Japan. Furthermore, we performed a series of experiments comparing our algorithm with systems utilizing state-of-the-art lexical n-gram-based language modelling techniques (namely, Stupid Backoff model and a model with modified Kneser-Ney smoothing), as well as a neural model performing word segmentation as character sequence labelling. The experimental results we obtained demonstrate the high performance of our algorithm, comparable with the other best-performing models. Given its low computational cost and competitive results, we believe that the proposed approach could be extended to other languages, and possibly also to other Natural Language Processing tasks, such as speech recognition.
	Keyword: Ainu language; endangered languages; language modelling; n-gram models; tokenization; under-resourced languages; word segmentation
	URL: https://doi.org/10.3390/info10100317
	BASE

4	Multilingual Word Segmentation: Training Many Language-Specific Tokenizers Smoothly Thanks to the Universal Dependencies Corpus
	Moreau, Erwan; Vogel, Carl
	In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), May 2018, Miyazaki, Japan ; https://hal.archives-ouvertes.fr/hal-01822151
	BASE

5	CoNLL 2017 and 2018 Shared Task Blind and Preprocessed Test Data
	Zeman, Daniel; Straka, Milan. - : Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL), 2018
	BASE

6	Sentiment analysis on Twitter data using machine learning
	Patel, Ravikumar. - : Laurentian University of Sudbury, 2017
	BASE

7	Multi-word tokenization for natural language processing ... : Mehrworttokenisierung für maschinelle Sprachverarbeitung ...
	Michelbacher, Lukas. - : Universität Stuttgart, 2013
	BASE

8	Multi-word tokenization for natural language processing ; Mehrworttokenisierung für maschinelle Sprachverarbeitung
	Michelbacher, Lukas. - 2013
	BASE

9	Unsupervised learning of word separators with MDL (JADT 2010: 10th International Conference on Statistical Analysis of Textual Data)
	Aris Xanthos; François Bavaud
	In: http://lexicometrica.univ-paris3.fr/jadt/jadt2010/allegati/JADT-2010-1123-1134_007-Xanthos.pdf
	BASE

10	An Enhancement of Thai Text Retrieval Efficiency by Automatic Backward Transliteration
	Navapat Khantonthong; Asanee Kawtrakul; Yuen Poovarawan
	In: http://naist.cpe.ku.ac.th/downloads/publications/2000/An_Enhancement_of_Thai_Text_Retrieval_Efficiency_by_Automatic_Backward_Transliteration.pdf
	BASE

11	A Word-Finding Automaton for Chinese Sentence Tokenization
	Hong-I Ng; Kim-teng Lua
	In: http://cslp.comp.nus.edu.sg/luakt/paper/SUB01.ps
	BASE

12	An Empirical Study of Tokenization Strategies for Biomedical Information Retrieval
	Jing Jiang; Chengxiang Zhai
	In: http://sifaka.cs.uiuc.edu/czhai/pub/ir-tok.pdf
	BASE
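
Note on hit 3: the abstract describes reducing word segmentation to finding the shortest sequence of lexical n-grams that matches the input text. The sketch below illustrates that idea only; it is not the authors' MiNgMatch implementation. It assumes the input has all spaces removed, uses a toy English n-gram list in place of real Ainu data, and leaves out tie-breaking between equally short sequences and any handling of unmatched characters. The names `build_index`, `shortest_ngram_cover`, and `segment` are hypothetical.

```python
from typing import Dict, List, Optional


def build_index(ngrams: List[str]) -> Dict[str, str]:
    """Map each n-gram with its spaces removed to its segmented form."""
    return {ng.replace(" ", ""): ng for ng in ngrams}


def shortest_ngram_cover(text: str, index: Dict[str, str]) -> Optional[List[str]]:
    """Return the fewest n-grams whose unspaced forms concatenate to `text`.

    Returns None when no sequence of known n-grams matches the whole text.
    """
    n = len(text)
    max_len = max((len(key) for key in index), default=0)
    # best[i]: minimal number of n-grams covering text[:i] (None = unreachable).
    best: List[Optional[int]] = [None] * (n + 1)
    # back[i]: start offset of the last n-gram in the best cover of text[:i].
    back: List[Optional[int]] = [None] * (n + 1)
    best[0] = 0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            if best[start] is None or text[start:end] not in index:
                continue
            candidate = best[start] + 1
            if best[end] is None or candidate < best[end]:
                best[end] = candidate
                back[end] = start
    if best[n] is None:
        return None
    # Walk the back-pointers from the end to recover the chosen n-grams.
    pieces: List[str] = []
    pos = n
    while pos > 0:
        start = back[pos]
        pieces.append(index[text[start:pos]])
        pos = start
    return pieces[::-1]


def segment(text: str, index: Dict[str, str]) -> Optional[str]:
    """Join the shortest n-gram cover into a space-delimited string."""
    cover = shortest_ngram_cover(text, index)
    return None if cover is None else " ".join(cover)


if __name__ == "__main__":
    # Toy n-gram list (an assumption for illustration, not data from the paper).
    toy_index = build_index(["the", "cat", "sat", "down", "the cat", "sat down"])
    print(segment("thecatsatdown", toy_index))
```

With the toy list, two bigrams cover the input, so the sketch prefers them over four unigrams. This is the sense in which a shorter n-gram sequence reflects more attested context.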
