Hits 1 – 14 of 14

1	One Country, 700+ Languages: NLP Challenges for Underrepresented Languages and Dialects in Indonesia ...
	Aji, Alham Fikri; Winata, Genta Indra; Koto, Fajri. - : arXiv, 2022
	BASE

2	Evaluating the Efficacy of Summarization Evaluation across Languages ...
	Koto, Fajri; Lau, Jey Han; Baldwin, Timothy. - : arXiv, 2021
	BASE

3	Rumour Detection via Zero-shot Cross-lingual Transfer Learning ...
	Tian, Lin; Zhang, Xiuzhen; Lau, Jey Han. - : arXiv, 2021
	BASE

4	IndoBERTweet: A Pretrained Language Model for Indonesian Twitter with Effective Domain-Specific Vocabulary Initialization ...
	The 2021 Conference on Empirical Methods in Natural Language Processing 2021; Baldwin, Timothy; Koto, Fajri. - : Underline Science Inc., 2021
	BASE

5	Evaluating the Efficacy of Summarization Evaluation across Languages ...
	The Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing 2021; Baldwin, Timothy; Koto, Fajri. - : Underline Science Inc., 2021
	BASE

6	Learning Contextualised Cross-lingual Word Embeddings and Alignments for Extremely Low-Resource Languages Using Parallel Corpora ...
	The 2021 Conference on Empirical Methods in Natural Language Processing 2021; Matsumoto, Yuji; Baldwin, Timothy. - : Underline Science Inc., 2021
	BASE

7	Liputan6: A Large-scale Indonesian Dataset for Text Summarization ...
	Koto, Fajri; Lau, Jey Han; Baldwin, Timothy. - : arXiv, 2020
	BASE

8	How Furiously Can Colourless Green Ideas Sleep? Sentence Acceptability in Context ...
	Lau, Jey Han; Armendariz, Carlos S.; Lappin, Shalom. - : arXiv, 2020
	BASE

9	Learning Contextualised Cross-lingual Word Embeddings and Alignments for Extremely Low-Resource Languages Using Parallel Corpora ...
	Wada, Takashi; Iwata, Tomoharu; Matsumoto, Yuji. - : arXiv, 2020
	BASE

10	IndoLEM and IndoBERT: A Benchmark Dataset and Pre-trained Language Model for Indonesian NLP ...
	The 28th International Conference on Computational Linguistics 2020; Baldwin, Timothy; Koto, Fajri. - : Underline Science Inc., 2020
	BASE

11	IndoLEM and IndoBERT: A Benchmark Dataset and Pre-trained Language Model for Indonesian NLP ...
	Koto, Fajri; Rahimi, Afshin; Lau, Jey Han. - : arXiv, 2020
	BASE

12	How Furiously Can Colorless Green Ideas Sleep? Sentence Acceptability in Context
	Lau, Jey Han; Armendariz, Carlos; Lappin, Shalom...
	In: Transactions of the Association for Computational Linguistics, Vol 8, Pp 296-310 (2020)
	BASE

13	Topically Driven Neural Language Model ...
	Lau, Jey Han; Baldwin, Timothy; Cohn, Trevor. - : arXiv, 2017
	BASE

14	Improving the utility of topic models: an uncut gem does not sparkle
	LAU, JEY HAN. - 2013
	Abstract: © 2013 Dr. Jey Han Lau; This thesis concerns a type of statistical model known as a topic model. Topic modelling learns abstract “topics” in a collection of documents, and by “topic” we mean an idea, theme or subject. For example, we may have an article that discusses space exploration, or a book about crime. Space exploration and crime, these two subjects, are the “topics” that we are talking about. As one might imagine, topic modelling has a direct application in digital libraries, as it automates the learning and categorisation of topics in books and articles. The merit of topic modelling, however, is that its machinery is not limited to processing just words but applies to symbols in general. As such, topic modelling has seen applications in areas outside text processing, such as biomedical research for inferring protein families. Most applications, however, are small scale and experimental, and much of the impact is still contained in academic research. The overarching theme of the thesis is thus to improve the utility of topic modelling. We achieve this in two ways: (1) by improving a few aspects of topic modelling to make it more accessible and usable by users; and (2) by proposing novel applications of topic modelling to real-world problems. In the first step, we look into improving the document preprocessing methodology that creates the input for topic models. We also experiment extensively to improve the visualisation of topics, one of the main outputs of topic models, to increase their usability for human users. In the second step, we apply topic modelling in a lexicography-oriented work to learn and detect new meanings that have emerged in words, and in the social media space to identify popular social trends. Both were novel applications and delivered promising results, demonstrating the strength and wide applicability of topic models.
	Keyword: graphical models; multiword expressions; natural language processing; topic labelling; topic models; word sense induction
	URL: http://hdl.handle.net/11343/38159
	BASE
