Search in the Catalogues and Directories

Hits 1 – 20 of 30

1
Semi-supervised Contextual Historical Text Normalization
In: Makarov, Peter; Clematide, Simon (2020). Semi-supervised Contextual Historical Text Normalization. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 1 July 2020, Association for Computational Linguistics. (2020)
Abstract: Historical text normalization, the task of mapping historical word forms to their modern counterparts, has recently attracted a lot of interest (Bollmann, 2019; Tang et al., 2018; Lusetti et al., 2018; Bollmann et al., 2018; Robertson and Goldwater, 2018; Bollmann et al., 2017; Korchagina, 2017). Yet, virtually all approaches suffer from the two limitations: 1) They consider a fully supervised setup, often with impractically large manually normalized datasets; 2) Normalization happens on words in isolation. By utilizing a simple generative normalization model and obtaining powerful contextualization from the target-side language model, we train accurate models with unlabeled historical data. In realistic training scenarios, our approach often leads to reduction in manually normalized data at the same accuracy levels.
Keyword: 000 Computer science; 410 Linguistics; Institute of Computational Linguistics; knowledge & systems
URL: https://doi.org/10.18653/v1/2020.acl-main.650
https://www.zora.uzh.ch/id/eprint/198772/
https://doi.org/10.5167/uzh-198772
https://www.zora.uzh.ch/id/eprint/198772/1/2020.acl-main.650.pdf
BASE
2
Proceedings of the LREC 2020: 8th Workshop on Challenges in the Management of Large Corpora (CMLC-8)
In: Proceedings of the LREC 2020: 8th Workshop on Challenges in the Management of Large Corpora (CMLC-8). Edited by: Bański, Piotr; Barbaresi, Adrien; Clematide, Simon; Kupietz, Marc; Lüngen, Harald; Pisetta, Ines (2020). Marseille, France: European Language Resources Association. (2020)
BASE
3
Modelling Large Parallel Corpora: The Zurich Parallel Corpus Collection
In: Graën, Johannes; Kew, Tannon; Shaitarova, Anastassia; Volk, Martin (2019). Modelling Large Parallel Corpora: The Zurich Parallel Corpus Collection. In: Challenges in the Management of Large Corpora (CMLC-7), Cardiff, Wales, 22 July 2019 - 22 July 2019. (2019)
BASE
4
Improving OCR of Black Letter in Historical Newspapers: The Unreasonable Effectiveness of HTR Models on Low-Resolution Images
In: Ströbel, Phillip; Clematide, Simon (2019). Improving OCR of Black Letter in Historical Newspapers: The Unreasonable Effectiveness of HTR Models on Low-Resolution Images. Utrecht: Digital Humanities 2019. (2019)
BASE
5
Crowdsourcing the OCR Ground Truth of a German and French Cultural Heritage Corpus
In: Clematide, Simon; Furrer, Lenz; Volk, Martin (2018). Crowdsourcing the OCR Ground Truth of a German and French Cultural Heritage Corpus. Journal for Language Technology and Computational Linguistics (JLCL), 33(1):25-47. (2018)
BASE
6
Supervised OCR Error Detection and Correction Using Statistical and Neural Machine Translation Methods
In: Amrhein, Chantal; Clematide, Simon (2018). Supervised OCR Error Detection and Correction Using Statistical and Neural Machine Translation Methods. Journal for Language Technology and Computational Linguistics (JLCL), 33(1):49-76. (2018)
BASE
7
Improving OCR quality of Historical Newspapers with Handwritten Text Recognition Models
In: Clematide, Simon; Ströbel, Phillip (2018). Improving OCR quality of Historical Newspapers with Handwritten Text Recognition Models. In: Workshop DARIAH-CH, Neuchâtel, 29 November 2018 - 30 November 2018. (2018)
BASE
8
A Simple and Effective biLSTM Approach to Aspect-Based Sentiment Analysis in Social Media Customer Feedback
In: Clematide, Simon (2018). A Simple and Effective biLSTM Approach to Aspect-Based Sentiment Analysis in Social Media Customer Feedback. In: Barbaresi, Adrien; Biber, Hanno; Neubarth, Friedrich; Osswald, Rainer. 14th Conference on Natural Language Processing - KONVENS 2018. Vienna: Verlag der Österreichischen Akademie der Wissenschaften, 29-33. (2018)
BASE
9
Challenges in the Management of Large Corpora (CMLC-6)
In: Challenges in the Management of Large Corpora (CMLC-6). Edited by: Bański, Piotr; Kupietz, Marc; Barbaresi, Adrien; Biber, Hanno; Breiteneder, Evelyn; Clematide, Simon; Witt, Andreas (2018). Paris: European Language Resources Association (ELRA). (2018)
BASE
10
Lessons from a Massive Open Online Course (MOOC) on Natural Language Processing for Digital Humanities
In: Clematide, Simon; Meraner, Isabel; Bubenhofer, Noah; Volk, Martin (2017). Lessons from a Massive Open Online Course (MOOC) on Natural Language Processing for Digital Humanities. In: Teaching NLP for Digital Humanities, Berlin, 12 September 2017 - 12 September 2017, 17-22. (2017)
BASE
11
Efficient Exploration of Translation Variants in Large Multiparallel Corpora Using a Relational Database
In: Graën, Johannes; Clematide, Simon; Volk, Martin (2016). Efficient Exploration of Translation Variants in Large Multiparallel Corpora Using a Relational Database. In: 4th Workshop on the Challenges in the Management of Large Corpora, Portorož, 28 May 2016 - 28 May 2016, 20-23. (2016)
BASE
12
Multilingwis – A Multilingual Search Tool for Multi-Word Units in Multiparallel Corpora
In: Clematide, Simon; Graën, Johannes; Volk, Martin (2016). Multilingwis – A Multilingual Search Tool for Multi-Word Units in Multiparallel Corpora. In: Corpas Pastor, Gloria. Computerised and Corpus-based Approaches to Phraseology: Monolingual and Multilingual Perspectives/Fraseología computacional y basada en corpus: perspectivas monolingües y multilingües. Geneva: Tradulex, n/a. (2016)
BASE
13
Crowdsourcing Swiss Dialect Transcriptions for Assessing Factors in Writing Variations
In: Clematide, Simon; Frick, Karina; Aepli, Noëmi; Goldman, Jean-Philippe (2016). Crowdsourcing Swiss Dialect Transcriptions for Assessing Factors in Writing Variations. In: Proceedings of the 13th Conference on Natural Language Processing (KONVENS) Bochum, Germany September 19–21, 2016, Bochum, 19 September 2016 - 21 September 2016, 62-67. (2016)
BASE
14
Bi-particle adverbs, PoS-tagging and the recognition of German separable prefix verbs
In: Volk, Martin; Clematide, Simon; Graën, Johannes; Ströbel, Phillip (2016). Bi-particle adverbs, PoS-tagging and the recognition of German separable prefix verbs. In: KONVENS 2016, Bochum, 19 September 2016 - 21 September 2016. (2016)
BASE
15
Reflections and a Proposal for a Query and Reporting Language for Richly Annotated Multiparallel Corpora
In: Clematide, Simon (2015). Reflections and a Proposal for a Query and Reporting Language for Richly Annotated Multiparallel Corpora. In: Gintare, Grigonyte; Clematide, Simon; Utka, Andrius; Volk, Martin. Proceedings of the Workshop on Innovative Corpus Query and Visualization Tools at NODALIDA 2015, May 11-13, 2015, Vilnius, Lithuania. Linköping, Sweden: Linköping University Electronic Press, Linköpings universitet, 6-16. (2015)
BASE
16
Ontogene Term and Relation Recognition for CDR
In: Ellendorff, Tilia Renate; Clematide, Simon; van der Lek, Adrian; Furrer, Lenz; Rinaldi, Fabio (2015). Ontogene Term and Relation Recognition for CDR. In: BioCreative V, Sevilla, 9 September 2015 - 11 September 2015, 305-310. (2015)
BASE
17
Challenges in the alignment, management and exploitation of large and richly annotated multi-parallel corpora
In: Graën, Johannes; Clematide, Simon (2015). Challenges in the alignment, management and exploitation of large and richly annotated multi-parallel corpora. In: 3rd Workshop on the Challenges in the Management of Large Corpora, Lancaster, 20 July 2015 - 20 July 2015, 15-20. (2015)
BASE
18
Track 4 Overview: Extraction of Causal Network Information in Biological Expression Language (BEL)
In: Fluck, Juliane; Madan, Sumit; Ellendorff, Tilia Renate; Mevissen, Theo; Clematide, Simon; van der Lek, Adrian; Rinaldi, Fabio (2015). Track 4 Overview: Extraction of Causal Network Information in Biological Expression Language (BEL). In: BioCreative V, Sevilla, 9 September 2015 - 11 September 2015, 333-346. (2015)
BASE
19
Tagging Complex Non-Verbal German Chunks with Conditional Random Fields
In: Roth, Luzia; Clematide, Simon (2014). Tagging Complex Non-Verbal German Chunks with Conditional Random Fields. In: Proceedings of the 12th Edition of the KONVENS Conference, Hildesheim, Germany, October 8-10, 2014, Hildesheim, Germany, 8 October 2014 - 10 October 2014, 48-57. (2014)
BASE
20
How preferred are preferred terms?
In: Grigonyte, Gintare; Clematide, Simon; Rinaldi, Fabio (2013). How preferred are preferred terms? In: Kosem, I; Kallas, J; Gantar, P; Krek, S; Langemets, M; Tuulik, M. Electronic lexicography in the 21st century: thinking outside the paper. Proceedings of the eLex 2013 conference, 17-19 October 2013, Tallinn, Estonia. Ljubljana/Tallinn: eLex, 452-459. (2013)
BASE
