1	Automatic methods to extract latent meanings in large text corpora ...
	Pölitz, Christian. - : Technische Universität Dortmund, 2016
	Abstract: This thesis concentrates on Data Mining in Corpus Linguistics. We show the use of modern Data Mining by developing efficient and effective methods for research and teaching in Corpus Linguistics in the fields of lexicography and semantics. Modern language resources, such as those provided by the Common Language Resources and Technology Infrastructure (http://clarin.eu), offer a large number of heterogeneous information resources of written language. Besides large text corpora, additional information about the sources or publication dates of the documents in the corpora is available. Further, information about words from dictionaries or WordNets offers prior information about word distributions. Starting with pre-studies in lexicography and semantics with large text corpora, we investigate the use of latent variable methods to extract hidden concepts in large text collections. We show that these hidden concepts correspond to meanings of words and subjects in text collections. This motivates an investigation of ...
	Keywords: 004; Automatic language analysis; Computer-assisted lexicography; Corpus linguistics; Data Mining; Natural language processing; RapidMiner; WordNet
	URL: https://dx.doi.org/10.17877/de290r-17781 https://eldorado.tu-dortmund.de/handle/2003/35753
	BASE

2	Automatic methods to extract latent meanings in large text corpora
	Pölitz, Christian. - 2016
	BASE
