1. From bag-of-words towards natural language: adapting topic models to avoid stop word removal ...
BASE
2. Neuronale maschinelle Übersetzung für ressourcenarme Szenarien ... : Neural machine translation for low-resource scenarios ...
BASE
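3. Linked Open Tafsir - Rekonstruktion der Entstehungsdynamik(en) des Korans mithilfe der Netzwerkmodellierung früher islamischer Überlieferungen ... [Linked Open Tafsir: reconstructing the dynamic(s) of the Qur'an's emergence through network modelling of early Islamic transmissions ...]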
BASE
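5. Evaluation computergestützter Verfahren der Emotionsklassifikation für deutschsprachige Dramen um 1800 ... [Evaluation of computer-assisted methods of emotion classification for German-language dramas around 1800 ...]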
BASE
7. Preparing Legal Documents for NLP Analysis: Improving the Classification of Text Elements by Using Page Features
Abstract:
Legal documents often have a complex layout with many different headings, headers and footers, side notes, etc. For further processing, it is important to extract these individual components correctly from a legally binding document, for example a signed PDF. A common approach is to classify each (text) region of a page using its geometric and textual features. This approach works well when the training and test data have a similar structure and when the documents of a collection to be analyzed have a rather uniform layout. We show that the use of global page properties can improve the accuracy of text element classification: we first classify each page into one of three layout types and then train a separate classifier for each of the three page types. This improves accuracy on a manually annotated collection of 70 legal documents comprising 20,938 text elements. When we split by page type, we achieve an improvement from 0.95 to 0.98 for single-column pages with left marginalia and from 0.95 to 0.96 for double-column pages. We developed our own feature-based method for page layout detection, which we benchmark against a standard implementation of a CNN image classifier. The approach presented here is based on a corpus of freely available German contracts and general terms and conditions. Both the corpus and all manual annotations are made freely available. The method is language-agnostic.
Keywords:
Automatic classification; Image recognition; ddc:020; Document analysis; Machine learning; Law; Expository text; Text mining
URL: https://serwiss.bib.hs-hannover.de/files/2161/csit120102.pdf https://serwiss.bib.hs-hannover.de/frontdoor/index/index/docId/2161 http://nbn-resolving.org/urn:nbn:de:bsz:960-opus4-21618 https://doi.org/10.25968/opus-2161 https://nbn-resolving.org/urn:nbn:de:bsz:960-opus4-21618
BASE
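The two-stage idea in the abstract of entry 7 (first assign each page a layout type, then classify its text elements with a model trained only on pages of that type) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Region` fields, the threshold rules in `page_type`, the name `other` for the unnamed third layout type, and the nearest-centroid classifier are all hypothetical stand-ins. The abstract does not describe the authors' own feature-based layout detector.

```python
from collections import defaultdict
from dataclasses import dataclass
from math import dist


@dataclass
class Region:
    """A text region on a page; coordinates are fractions of page width/height (0..1)."""
    x0: float         # left edge
    x1: float         # right edge
    y0: float         # top edge
    font_size: float  # in points


def page_type(regions: list[Region]) -> str:
    """Stage 1: toy rule-based layout detector (hypothetical thresholds)."""
    starts_mid = sum(1 for r in regions if 0.45 <= r.x0 <= 0.6)  # right-column starts
    narrow_left = sum(1 for r in regions if r.x1 < 0.3)           # left-margin notes
    if starts_mid >= 2:
        return "double_column"
    if narrow_left >= 1:
        return "single_column_marginalia"
    return "other"  # hypothetical name for the third layout type


def features(r: Region) -> tuple[float, ...]:
    # Geometric features only; font size roughly scaled into 0..1.
    return (r.x0, r.x1, r.y0, r.font_size / 20.0)


class NearestCentroid:
    """Minimal per-label mean classifier, standing in for any trainable model."""

    def fit(self, X, y):
        sums, counts = {}, defaultdict(int)
        for xi, yi in zip(X, y):
            acc = sums.setdefault(yi, [0.0] * len(xi))
            for k, v in enumerate(xi):
                acc[k] += v
            counts[yi] += 1
        self.centroids = {lab: tuple(v / counts[lab] for v in acc) for lab, acc in sums.items()}
        return self

    def predict(self, x):
        # Label whose centroid is closest in Euclidean distance.
        return min(self.centroids, key=lambda lab: dist(x, self.centroids[lab]))


class TwoStageClassifier:
    """Stage 2: one element classifier per page type, plus a global fallback."""

    def fit(self, pages):
        # pages: list of (regions, labels) pairs, one pair per annotated page
        grouped = defaultdict(lambda: ([], []))
        all_X, all_y = [], []
        for regions, labels in pages:
            X, y = grouped[page_type(regions)]
            feats = [features(r) for r in regions]
            X.extend(feats)
            y.extend(labels)
            all_X.extend(feats)
            all_y.extend(labels)
        self.models = {pt: NearestCentroid().fit(X, y) for pt, (X, y) in grouped.items()}
        self.fallback = NearestCentroid().fit(all_X, all_y)  # for page types unseen in training
        return self

    def predict(self, regions):
        model = self.models.get(page_type(regions), self.fallback)
        return [model.predict(features(r)) for r in regions]


if __name__ == "__main__":
    marginalia_page = [Region(0.05, 0.25, 0.2, 9), Region(0.35, 0.95, 0.2, 11),
                       Region(0.35, 0.8, 0.05, 16)]
    two_column_page = [Region(0.05, 0.48, 0.2, 11), Region(0.52, 0.95, 0.2, 11),
                       Region(0.52, 0.95, 0.6, 11), Region(0.05, 0.95, 0.02, 8)]
    clf = TwoStageClassifier().fit([
        (marginalia_page, ["side_note", "body", "heading"]),
        (two_column_page, ["body", "body", "body", "header"]),
    ])
    new_page = [Region(0.06, 0.26, 0.5, 9), Region(0.35, 0.95, 0.5, 11)]
    print(page_type(new_page), clf.predict(new_page))
    # -> single_column_marginalia ['side_note', 'body']
```

Routing by page type means each per-type model only has to separate labels within one layout, where, for example, a narrow region at the left edge is very likely a side note. The global fallback model covers page types that never occurred in the training data.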
8. DaF an öffentlichen Schulen am Beispiel eines Projekts in Rio de Janeiro [German as a Foreign Language (DaF) at public schools: the example of a project in Rio de Janeiro]
In: Pandaemonium Germanicum: Revista de Estudos Germanísticos, Vol 25, Iss 45 (2022)
BASE
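10. Lockdown, Homeschooling und Social Distancing: der Zweitspracherwerb unter akut veränderten Bedingungen der COVID-19-Pandemie ... [Lockdown, homeschooling and social distancing: second language acquisition under the acutely changed conditions of the COVID-19 pandemic ...]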
BASE
11. Dramapädagogik-Tage 2019. Conference proceedings of the 5th annual conference on performative language teaching and learning ... : Drama in education days 2019 ...
BASE
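12. MEDIZINISCHES ENGLISCH LERNEN DURCH AUTHENTISCHE FILME: EXTRA-SPRACHLICHE FAKTOREN, DIE ZUM STUDIUM DER MEDIZINISCHEN TERMINOLOGIE BEITRAGEN ... [Learning medical English through authentic films: extralinguistic factors contributing to the study of medical terminology ...]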
BASE
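15. Legitime Sprachen, legitime Identitäten. Interaktionsanalysen im spätmodernen »Deutsch als Fremdsprache«-Klassenzimmer [Legitimate languages, legitimate identities: interaction analyses in the late-modern "German as a Foreign Language" classroom]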
Rellstab, Daniel H. - transcript, 2021; Bielefeld, 2021; pedocs-Dokumentenserver/DIPF, 2021
In: Bielefeld: transcript 2021, 375 pp. (Interkulturalität. Studien zu Sprache, Literatur und Gesellschaft; 21) (2021)
BASE
16. Dramapädagogik-Tage 2019. Conference proceedings of the 5th annual conference on performative language teaching and learning ; Drama in education days 2019
In: 2021, 188 pp. (2021)
BASE
17. Lire la littérature médiévale en classe de français langue étrangère: une utopie? ; Reading medieval literature in French lessons: a utopia?
In: Schweizerische Zeitschrift für Bildungswissenschaften 43 (2021) 1, pp. 129-138 (2021)
BASE
18. Creating a multilingual MOOC content for information literacy: a workflow
In: Botte, Alexander [ed.]; Libbrecht, Paul [ed.]; Rittberger, Marc [ed.]: Learning Information Literacy across the Globe. Frankfurt am Main, May 10th 2019. Frankfurt am Main: DIPF 2021, pp. 114-128 (2021)
BASE
19. Learning Information Literacy across the Globe. Frankfurt am Main, May 10th 2019
In: Frankfurt am Main: DIPF 2021, 133 pp. (2021)
BASE
20. Der Beitrag der Interkulturalität zur Vermittlung einer Fremdsprache [The contribution of interculturality to teaching a foreign language]
In: ALTRALANG Journal; Vol 3 No 01 (2021): ALTRALANG Journal Volume: 03 Issue: 01 / July 2021; 222-235 ; 2710-8619 ; 2710-7922 ; 10.52919/altralang.v3i01 (2021)
BASE