
Hits 21 – 40 of 41

21	Pluricentric languages : automatic identification and linguistic variation ; Plurizentrische Sprachen : automatische Spracherkennung und linguistische Variation
	Zampieri, Marcos. - 2016
	Abstract: Language identification is a well-known research topic in NLP. State-of-the-art methods apply n-gram language models to distinguish languages automatically with well over 95% accuracy. This level of success is obtained when discriminating between languages that are typologically not closely related (e.g. Finnish and Spanish), or when languages have unique character sets, such as Greek or Hebrew. Recent studies show that one of the main difficulties of n-gram-based methods is the identification of closely related languages. The research presented in this thesis goes one step further and investigates computational methods to identify standard national varieties of pluricentric languages such as Portuguese, Spanish, French, and English. It explores different computational methods and different sets of features for this task that go beyond character and word language models. The main objective is to investigate the extent to which language varieties can be identified automatically in both monolingual and real-world (multilingual) settings, and to establish what the main challenges of this task are in comparison to general-purpose language identification models. This research shows, for example, that it is possible to discriminate between Brazilian and European Portuguese with 99.8% accuracy using journalistic texts. Another contribution of this thesis is the use of linguistically motivated features such as POS tags and morphological information to discriminate between language varieties, with results of up to 83.1% accuracy in discriminating between Mexican and Peninsular Spanish texts. An additional aspect of this thesis is the use of classification output in corpus-driven contrastive linguistics research, as explained in Chapter 6. Classification methods combined with linguistically meaningful features are able to provide empirical evidence on the convergences and divergences of language varieties in terms of lexicon, orthography, morphology, and syntax.
	Abstract (translated from the German version): Language identification is an important research topic in computational linguistics. Current methods use n-gram language models to distinguish languages automatically, achieving accuracies above 95%. Such performance is obtained in particular when the algorithms classify languages that are typologically not closely related (e.g. Finnish and Spanish) or languages with unique character sets such as Greek or Hebrew. Studies show, however, that one of the main difficulties of n-gram-based methods lies in identifying similar languages. This thesis therefore goes a step beyond existing methods and investigates identification methods for pluricentric languages such as Portuguese, Spanish, French, and English. To this end, it uses algorithms and features that encode richer linguistic information than character- or word-based language models. The main aim is to investigate the extent to which language varieties can be identified automatically in both monolingual and multilingual settings. On the basis of these experiments, it is also possible to assess the main difficulties of the described approach compared with generic models. A secondary aspect of the thesis is the use of classification output in corpus-based contrastive linguistics, since classification methods based on interpretable linguistic features can provide empirical insights into the convergences and divergences of these language varieties with respect to lexicon, orthography, morphology, and syntax.
	Keyword: computational linguistics; Computerlinguistik; ddc:400; Korpus; language identification; language varieties; Linguistik; natural language processing; Sprachvariante
	URL: https://doi.org/10.22028/D291-23660 http://nbn-resolving.org/urn:nbn:de:bsz:291-scidok-66749
	BASE

22	An Information theoretic approach to production and comprehension of discourse markers ...
	Torabi Asr, Fatemeh. - : Universität des Saarlandes, 2015
	BASE

23	Digital humanities: centres and peripheries
	Schreibman, Susan
	In: Historical Social Research ; 37 ; 3 ; 46-58 ; Kontroversen um die Digitalen Geisteswissenschaften / Controversies around the digital humanities (2015)
	BASE

24	Identifying events using computer-assisted text analysis
	Landmann, Juliane; Züll, Cornelia
	In: Social Science Computer Review ; 26 ; 4 ; 483-497 (2015)
	BASE

25	Controversies around the digital humanities: an agenda
	Thaller, Manfred
	In: Historical Social Research ; 37 ; 3 ; 7-23 ; Kontroversen um die Digitalen Geisteswissenschaften / Controversies around the digital humanities (2015)
	BASE

26	An Information theoretic approach to production and comprehension of discourse markers
	Torabi Asr, Fatemeh. - 2015
	BASE

27	Hypertextuality, complexity, creativity: using linguistic software tools to uncover new information about the food and drink of historic Mayans
	Lema, Rose
	In: Forum Qualitative Sozialforschung / Forum: Qualitative Social Research ; 13 ; 2 ; 33 ; Rechnergestützte Datenanalyse: verschiedene Kontexte, verschiedene Praktiken / Qualitative computing: diverse worlds and research practices (2013)
	BASE

28	The journal project: qualitative computing and the technology/ aesthetics divide in qualitative research
	Davidson, Judith
	In: Forum Qualitative Sozialforschung / Forum: Qualitative Social Research ; 13 ; 2 ; 30 ; Rechnergestützte Datenanalyse: verschiedene Kontexte, verschiedene Praktiken / Qualitative computing: diverse worlds and research practices (2013)
	BASE

29	Computer simulation experiments in phonetics and phonology : simulation technology in linguistic research on human speech ; Computersimulationsexperimente in Phonetik und Phonologie
	Duran, Daniel. - 2013
	BASE

30	Word meaning in context : a probabilistic model and its application to question answering ...
	Dinu, Georgiana. - : Universität des Saarlandes, 2011
	BASE

31	Hybrid approaches for sentiment analysis ... : Hybridansätze für die Sentimentanalyse ...
	Wiegand, Michael. - : Universität des Saarlandes, 2011
	BASE

32	Hybrid approaches for sentiment analysis ; Hybridansätze für die Sentimentanalyse
	Wiegand, Michael. - 2011
	BASE

33	Word meaning in context : a probabilistic model and its application to question answering
	Dinu, Georgiana. - 2011
	BASE

34	Graph-based methods for large-scale multilingual knowledge integration ... : Graphenbasierte Methoden zur multilingualen Wissensintegration ...
	Melo, Gerard De. - : Universität des Saarlandes, 2010
	BASE

35	Korpora ; Text corpora
	Zinsmeister, Heike. - 2010
	BASE

36	Graph-based methods for large-scale multilingual knowledge integration ; Graphenbasierte Methoden zur multilingualen Wissensintegration
	Melo, Gerard de. - 2010
	BASE

37	German clause-embedding predicates : an extraction and classification approach ; Deutsche Prädikate mit Nebensätzen : ihre Extraktion und Klassifikation
	Lapshinova-Koltunski, Ekaterina. - 2010
	BASE

38	Annotating Discourse Anaphora
	Dipper, Stefanie; Zinsmeister, Heike
	In: Proceedings of the Workshop "Third Linguistic Annotation Workshop", LAW III, ACL-IJCNLP 2009, Suntec, Singapore, 6-7 August 2009. - pp. 166-169 (2009)
	BASE

39	The Role of the German Vorfeld for Local Coherence : a pilot study
	Dipper, Stefanie; Zinsmeister, Heike
	In: Von der Form zur Bedeutung : Texte automatisch verarbeiten, Proceedings of the biennial GSCL conference 2009 / Chiarcos, Christian et al. (eds.). - Tübingen : Narr, 2009. - pp. 69-80. - ISBN 978-3-8233-6511-2 (2009)
	BASE

40	Parameterized type expansion in the feature structure formalism TDL ...
	Schäfer, Ulrich. - : Universität des Saarlandes, 1995
	BASE
