3 |
WALS Online Resources for Aynu
|
|
Max Planck Institute for Evolutionary Anthropology, 2021
|
|
BASE
|
|
4 |
WALS Online Resources for Ainu
|
|
Max Planck Institute for Evolutionary Anthropology, 2021
|
|
BASE
|
|
5 |
Glottolog 4.4 Resources for Hokkaido Ainu
|
|
Max Planck Institute for Evolutionary Anthropology, 2021
|
|
BASE
|
|
6 |
Glottolog 4.4 Resources for Ainu (China)
|
|
Max Planck Institute for Evolutionary Anthropology, 2021
|
|
BASE
|
|
8 |
PHOIBLE 2.0 phonemic inventories for Hokkaido Ainu
|
|
Max Planck Institute for the Science of Human History, 2019
|
|
BASE
|
|
9 |
MiNgMatch—A Fast N-gram Model for Word Segmentation of the Ainu Language
|
|
|
|
In: Information ; Volume 10 ; Issue 10 (2019)
|
|
Abstract:
Word segmentation is an essential task in automatic language processing for languages where there are no explicit word boundary markers, or where space-delimited orthographic words are too coarse-grained. In this paper we introduce the MiNgMatch Segmenter, a fast word segmentation algorithm that reduces the problem of identifying word boundaries to finding the shortest sequence of lexical n-grams matching the input text. In order to validate our method in a low-resource scenario involving extremely sparse data, we tested it with a small corpus of text in the critically endangered language of the Ainu people living in northern parts of Japan. Furthermore, we performed a series of experiments comparing our algorithm with systems utilizing state-of-the-art lexical n-gram-based language modelling techniques (namely, Stupid Backoff model and a model with modified Kneser-Ney smoothing), as well as a neural model performing word segmentation as character sequence labelling. The experimental results we obtained demonstrate the high performance of our algorithm, comparable with the other best-performing models. Given its low computational cost and competitive results, we believe that the proposed approach could be extended to other languages, and possibly also to other Natural Language Processing tasks, such as speech recognition.
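Illustration (not from the paper): the idea of covering the input with the shortest sequence of known n-grams can be sketched as a small dynamic program. This is a minimal sketch under stated assumptions, not the authors' MiNgMatch implementation: the function name `segment`, the dictionary layout (unspaced string mapped to its segmented form), and the toy entries are all hypothetical.

```python
def segment(text, ngrams):
    """Cover `text` with the fewest n-grams found in `ngrams`.

    `ngrams` is a hypothetical lookup table mapping an unspaced
    string to its segmented form, e.g. {"xyz": "x yz"}.
    Returns the segmented text, or None if no full cover exists.
    """
    n = len(text)
    max_len = max((len(key) for key in ngrams), default=0)
    # best[i] holds (number of n-grams used, start of the last n-gram)
    # for the prefix text[:i], or None if that prefix cannot be covered.
    best = [None] * (n + 1)
    best[0] = (0, None)
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            if best[j] is not None and text[j:i] in ngrams:
                candidate = (best[j][0] + 1, j)
                if best[i] is None or candidate[0] < best[i][0]:
                    best[i] = candidate
    if best[n] is None:
        return None
    # Walk back through the stored start positions to recover the cover.
    pieces = []
    i = n
    while i > 0:
        j = best[i][1]
        pieces.append(ngrams[text[j:i]])
        i = j
    return " ".join(reversed(pieces))


# Toy table with made-up tokens (not real Ainu data).
toy_ngrams = {"xyz": "x yz", "x": "x", "y": "y", "z": "z"}
result = segment("zxyz", toy_ngrams)
```

Here a single trigram entry is preferred over three unigram entries because it yields a shorter cover; how ties and uncovered input are handled is left open in this sketch.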
|
|
Keyword:
Ainu language; endangered languages; language modelling; n-gram models; tokenization; under-resourced languages; word segmentation
|
|
URL: https://doi.org/10.3390/info10100317
|
|
BASE
|
|
10 |
Improving Basic Natural Language Processing Tools for the Ainu Language
|
|
|
|
In: Information ; Volume 10 ; Issue 11 (2019)
|
|
BASE
|
|
15 |
Four Poems from To Young Utari by Yaeko Batchelor
|
|
|
|
In: Transference (2018)
|
|
BASE
|
|
18 |
Adnominal clauses and the 'Mermaid construction' : grammaticalization of nouns
|
|
Tsunoda, Tasaku. - [Tokyo] : Natl. Inst. for Japanese Language and Linguistics, 2013
|
|
MPI-SHH Linguistik
|
|
19 |
Sustaining indigenous knowledge : learning tools and community initiatives for preserving endangered languages and local cultural heritage
|
|
Kasten, Erich (ed.). - [Fürstenberg] : Kulturstiftung Sibirien, SEC Publ., 2013
|
|
BLLDB
|
|
UB Frankfurt Linguistik
|
|
20 |
WOLD Resources for Ainu
|
|
Max Planck Institute for Evolutionary Anthropology, 2013
|
|
BASE
|
|