
Search in the Catalogues and Directories

Hits 1 – 20 of 130

1
Preparing Legal Documents for NLP Analysis: Improving the Classification of Text Elements by Using Page Features
Josi, Frieda; Wartena, Christian (Prof. Dr.); Heid, Ulrich. - AIRCC Publishing Corporation, 2022; Hannover: Hochschule Hannover, 2022
Abstract: Legal documents often have a complex layout with many different headings, headers and footers, side notes, etc. For further processing, it is important to extract these individual components correctly from a legally binding document, for example a signed PDF. A common approach is to classify each (text) region of a page using its geometric and textual features. This approach works well when the training and test data have a similar structure and when the documents of a collection to be analyzed have a rather uniform layout. We show that the use of global page properties can improve the accuracy of text element classification: we first classify each page into one of three layout types. After that, we can train a classifier for each of the three page types and thereby improve the accuracy on a manually annotated collection of 70 legal documents consisting of 20,938 text elements. When we split by page type, we achieve an improvement from 0.95 to 0.98 for single-column pages with left marginalia and from 0.95 to 0.96 for double-column pages. We developed our own feature-based method for page layout detection, which we benchmark against a standard implementation of a CNN image classifier. The approach presented here is based on a corpus of freely available German contracts and general terms and conditions. Both the corpus and all manual annotations are made freely available. The method is language agnostic.
Keywords: Automatic classification; Image recognition; ddc:020; Document analysis; Machine learning; Law; Expository text; Text mining
URL: https://serwiss.bib.hs-hannover.de/files/2161/csit120102.pdf
https://serwiss.bib.hs-hannover.de/frontdoor/index/index/docId/2161
http://nbn-resolving.org/urn:nbn:de:bsz:960-opus4-21618
https://doi.org/10.25968/opus-2161
https://nbn-resolving.org/urn:nbn:de:bsz:960-opus4-21618
BASE
2
Representing Standard Text Formulations as Directed Graphs
Josi, Frieda; Wartena, Christian (Prof. Dr.); Heid, Ulrich (Prof. Dr.). - Cham: Springer, 2021; Hannover: Hochschule Hannover, 2021
BASE
3
Detecting Paraphrases of Standard Clause Titles in Insurance Contracts
Heid, Ulrich; Wartena, Christian (Prof. Dr.); Josi, Frieda. - Hannover: Hochschule Hannover, 2019
BASE
4
A taxonomy of user guidance devices for e-lexicography
In: Lexicographica. Internationales Jahrbuch für Lexikographie. International annual for lexicography. Revue internationale de lexicographie 33 (2018), 391-422
IDS OBELEX meta
5
Semi-automating the Reading Programme for a Historical Dictionary Project
In: Lexikos; Vol. 28 (2018) ; 2224-0039 (2018)
BASE
6
Direct User Guidance in e-Dictionaries for Text Production and Text Reception - The Verbal Relative in Sepedi as a Case Study
In: Lexikos. Journal of the African Association for Lexicography 27 (2017), 403-426
IDS OBELEX meta
7
Direct User Guidance in e-Dictionaries for Text Production and Text Reception — The Verbal Relative in Sepedi as a Case Study
In: Lexikos; Vol. 27 (2017) ; 2224-0039 (2017)
BASE
8
Enabling Selective Queries and Adapting Data Display in the Electronic Version of a Historical Dictionary
In: Proceedings of the 17th EURALEX International Congress: Lexicography and Linguistic Diversity. Tbilisi, Georgia 6 - 10 September 2016 (2016), 635-646
IDS OBELEX meta
9
French Specialised Medical Constructions: Lexicographic Treatment and Corpus Coverage in General and Specialised Dictionaries
In: Proceedings of the 17th EURALEX International Congress: Lexicography and Linguistic Diversity. Tbilisi, Georgia 6 - 10 September 2016 (2016), 521-528
IDS OBELEX meta
10
Recent Initiatives towards New Standards for Language Resources
In: GSCL 2015: Proceedings of the Int. Conference of the German Society for Computational Linguistics and Language Technology, University of Duisburg-Essen, Germany, Sep 30-Oct 2, 2015 (2015), 154-156
IDS Bibliografie zur Gesprächsforschung
11
Corpora
In: Word-Formation. An International Handbook of the Languages of Europe. Volume 3 (2015), 2354-2371
IDS Bibliografie zur deutschen Grammatik
12
Recent Initiatives towards New Standards for Language Resources
In: International Conference of the German Society for Computational Linguistics and Language Technology ; https://hal.inria.fr/hal-01464476 ; International Conference of the German Society for Computational Linguistics and Language Technology, Sep 2015, Essen, Germany (2015)
BASE
13
Natural Language Processing Techniques for Improved User-friendliness of Electronic Dictionaries
In: Proceedings of the 16th EURALEX International Congress: The User in Focus, Bolzano/Bozen, Italy 15 - 19 July 2014 (2014), 47-61
IDS OBELEX meta
14
User Support in e-Dictionaries for Complex Grammatical Structures in the Bantu Languages
In: Proceedings of the 16th EURALEX International Congress: The User in Focus, Bolzano/Bozen, Italy 15 - 19 July 2014 (2014), 819-827
IDS OBELEX meta
15
From <tiger2/> to ISOTiger – Community Driven Developments for Syntax Annotation in SynAF
In: Treebanks and Linguistic Theories (TLT) ; https://hal.inria.fr/hal-01085219 ; Treebanks and Linguistic Theories (TLT), Dec 2014, Tübingen, Germany ; http://tlt13.sfs.uni-tuebingen.de (2014)
BASE
16
Resource interoperability revisited
BASE
17
Ongoing work on e-lexicography in the SeLA project
In: Lexicographica. Internationales Jahrbuch für Lexikographie. International annual for lexicography. Revue internationale de lexicographie 29 (2013), 329-331
IDS OBELEX meta
18
Workbenches for corpus-based lexicography
In: Wörterbücher. Dictionaries. Dictionnaires: Ein internationales Handbuch zur Lexikographie. An International Encyclopedia of Lexicography. Encyclopédie internationale de lexicographie (HSK 5.4) (2013), 1455-1460
IDS OBELEX meta
19
The impact of computational lexicography
In: Wörterbücher. Dictionaries. Dictionnaires: Ein internationales Handbuch zur Lexikographie. An International Encyclopedia of Lexicography. Encyclopédie internationale de lexicographie (HSK 5.4) (2013), 24-30
IDS OBELEX meta
20
Design criteria and 'added value' of electronic dictionaries for human users
In: Wörterbücher. Dictionaries. Dictionnaires: Ein internationales Handbuch zur Lexikographie. An International Encyclopedia of Lexicography. Encyclopédie internationale de lexicographie (HSK 5.4) (2013), 1001-1013
IDS OBELEX meta
