Hits 1 – 20 of 228

1	Measuring Terminology Consistency in Translated Corpora: Implementation of the Herfindahl-Hirshman Index
	Angelina Gašpar; Sanja Seljan; Vlasta Kučiš
	In: Information; Volume 13; Issue 2; Pages: 43 (2022)
	BASE

2	Constructional equivalence in the Indonesian translations of ROB and STEAL ...
	Rajeg, Gede Primahadi Wijaya. - : Open Science Framework, 2021
	BASE

3	paracorp ...
	Rajeg, Gede Primahadi Wijaya. - : Open Science Framework, 2021
	BASE

4	FORMAL-FUNCTIONAL MODELS OF THE UZBEK ELECTRON CORPUS ...
	Abdurakhmonova, Nilufar. - : Zenodo, 2021
	BASE

5	FORMAL-FUNCTIONAL MODELS OF THE UZBEK ELECTRON CORPUS ...
	Abdurakhmonova, Nilufar. - : Zenodo, 2021
	BASE

6	Le particelle razve e neuželi alla luce del Corpus parallelo russo-italiano
	Noseda, Valentina (orcid:0000-0002-5148-1241). - Roma : Aracne, 2021
	BASE

7	Translationese and register variation in English-to-Russian professional translation
	Kunilovskaya, Maria; Corpas Pastor, Gloria
	In: New Perspectives on Corpus Translation Studies (2021)
	BASE

8	EMPLOYING A PARALLEL CORPUS-BASED APPROACH IN TEACHING SEMANTIC PROSODY AND COLLOCATIONAL BEHAVIOR TO ARABIC EFL LEARNERS
	Alzahrani, Alhassan Abdullah J. - 2021
	BASE

9	Alector: A Parallel Corpus of Simplified French Texts with Alignments of Misreadings by Poor and Dyslexic Readers
	Gala, Núria; Tack, Anaïs; Javourey-Drevet, Ludivine...
	In: Language Resources and Evaluation for Language Technologies (LREC), May 2020, Marseille, France (2020) ; https://hal.archives-ouvertes.fr/hal-02503986
	BASE

10	MultiMWE: building a multi-lingual multi-word expression (MWE) parallel corpora
	Han, Lifeng (orcid:0000-0002-3221-2185); Jones, Gareth J.F. (orcid:0000-0003-2923-8365); Smeaton, Alan F. (orcid:0000-0003-1028-8389)
	In: 12th International Conference on Language Resources and Evaluation (LREC), 11-16 May 2020, Marseille, France (Virtual) (2020)
	BASE

11	Parallel data extraction using word embeddings
	Way, Andy (orcid:0000-0001-5736-5930); Lohar, Pintu
	In: NLPTA 2020: International Conference on NLP Techniques and Applications, 28-29 Nov 2020, London, UK (Online) (2020)
	BASE

12	The Quest for 'Falsehood', or a Survey of Tools for the Study of Greek-Syriac-Arabic Translations ...
	Kessel, Grigory; Arnzen, Rüdiger; Čéplö, Slavomír. - : Zenodo, 2020
	BASE

13	The Quest for 'Falsehood', or a Survey of Tools for the Study of Greek-Syriac-Arabic Translations ...
	Kessel, Grigory; Arnzen, Rüdiger; Čéplö, Slavomír. - : Zenodo, 2020
	BASE

14	Seeking the unseen humanities macrostructures: The use of corpus- and genre-assisted research methodologies to analyze written norms in English and Spanish literary criticism articles
	Lake, William
	In: Applied Linguistics and English as a Second Language Dissertations (2020)
	BASE

15	The use of parallel Corpora for a contrastive (Russian-Italian) description of discourse markers: new instruments compared to traditional lexicography
	Bonola, Anna Paola (orcid:0000-0003-3931-670X); Noseda, Valentina (orcid:0000-0002-5148-1241). - Milano : Associazione per l’Informatica Umanistica e la Cultura Digitale, 2020
	BASE

16	Building wordnets with multi-word expressions from parallel corpora ; Expansión de wordnets mediante unidades pluriverbales extraídas de corpus paralelos
	Simões, Alberto Manuel; Gómez Guinovart, Xavier. - : Sociedad Española para el Procesamiento del Lenguaje Natural, 2020
	BASE

17	Automated creation of domain-specific bilingual corpora for machine translation, focusing on dissimilar language pairs
	Wloka, Bartholomäus. - 2020
	Abstract: The significance of sentence-aligned bilingual corpora, so-called parallel corpora, as training sets for machine translation systems and for various other language technology applications has become more and more evident in recent years. Even more desirable are collections that address a certain domain and hence offer more precise data for training deep learning, statistical, or example-based approaches. The goal of this doctoral dissertation is to examine the feasibility of automated bilingual corpus creation from Wikipedia, specifically for languages that differ significantly in surface characteristics and other aspects. More precisely: how can Wikipedia be crawled efficiently to obtain domain-specific corpora, how can these corpora be sentence-aligned, and how can these alignments be evaluated to obtain the most probable translated or equivalent sentences? The research questions addressed in this work are: how much of the content on Wikipedia can be used to build a bilingual aligned corpus for a specific language pair, and how can these texts be selected and aligned efficiently, all with minimal human input in the process? The question is addressed by selecting two languages that are representative of a dissimilar pair, English and Japanese. The resulting procedure, algorithms, software modules, and created corpus are a proof of concept, which can be adjusted in order to be applied to other domains and language pairs. This dissertation proposes a method for crawling Wikipedia by topic, a method for aligning this data into a parallel corpus, and a novel metric that measures the relative quality of this alignment. The resulting tool chain is presented as a generic algorithm and is implemented in the Python programming language. A first iteration of the software produced an English-Japanese parallel corpus of 66,000 sentence pairs. Human expert evaluations are presented to show the yield, feasibility, and efficiency of this method.
	Keyword: 17.98 Text collections; 54.59 Programming: other; 54.72 Artificial intelligence; 54.89 Applied computer science: other; parallel corpora / machine translation / natural language processing / web crawling / software application
	URL: http://othes.univie.ac.at/65012/
	BASE

18	The use of English, Czech and French punctuation marks in reference, parallel and comparable web corpora: a question of methodology
	Olga Nádvorníková
	In: Linguistica Pragensia, Vol 30, Iss 1, Pp 30-50 (2020)
	BASE

19	Automatic identification methods on a corpus of twenty five fine-grained Arabic dialects
	Harrat, Salima; Meftouh, Karima; Abidi, Karima...
	In: Arabic Language Processing: From Theory to Practice, 7th International Conference, ICALP 2019, Nancy, France, October 16–17, 2019, Proceedings, Communications in Computer and Information Science book series (CCIS, volume 1108), 2019, ⟨10.1007/978-3-030-32959-4_6⟩ ; https://hal.archives-ouvertes.fr/hal-02314245 (2019)
	BASE

20	Machine Translation on a parallel Code-Switched Corpus
	Menacer, Mohamed; Langlois, David; Jouvet, Denis...
	In: Canadian AI 2019 - 32nd Conference on Canadian Artificial Intelligence, May 2019, Ontario, Canada (2019) ; https://hal.archives-ouvertes.fr/hal-02106010
	BASE
