1. Towards a Cleaner Document-Oriented Multilingual Crawled Corpus
In: https://hal.inria.fr/hal-03536361 ; 2022 (2022)
BASE

2. Lexicographic Data Seal of Compliance
In: https://hal.archives-ouvertes.fr/hal-03344267 ; [Research Report] ELEXIS; DARIAH. 2021 (2021)
BASE

3. Building, Encoding, and Annotating a Corpus of Parliamentary Debates in XML-TEI: A Cross-Linguistic Account
In: https://halshs.archives-ouvertes.fr/halshs-03097333 ; 2020 (2020)
BASE

4. CamemBERT: a Tasty French Language Model
In: https://hal.inria.fr/hal-02445946 ; 2019 (2019)
BASE

5. From disparate disciplines to unity in diversity. How the PARTHENOS project brings Humanities Research Infrastructures together ...
BASE

6. From disparate disciplines to unity in diversity. How the PARTHENOS project brings Humanities Research Infrastructures together ...
BASE

8. Automatic TEI encoding of manuscripts catalogues with GROBID-Dictionaries ...
BASE

9. Automatic TEI encoding of manuscripts catalogues with GROBID-Dictionaries ...
BASE

10. Open Access in Japan – a multi-institutional perspective
In: https://hal.archives-ouvertes.fr/hal-01290936 ; [Research Report] Ambassade de France au Japon. 2016 (2016)
BASE

12. IPERION CH Data Management Plan
In: https://hal.archives-ouvertes.fr/hal-02139658 ; [Research Report] D 2.1, Inria. 2015 (2015)
BASE

14. Pepper: Handling A Multiverse Of Formats ...
Abstract:
With the rising importance of empirical data in many fields of linguistic research, we see an increase not only in the number of electronically available corpora, but also in the number of tools used to make this data accessible, processable and searchable. Most of these tools have been developed in the course of specific linguistic projects and can therefore handle only certain kinds of linguistic information, such as syntactic structures (e.g. TIGERSearch, Lezius 2002) or dialogue structures (e.g. EXMARaLDA, Schmidt 2004). At the same time, each tool uses its own proprietary format for representing the text and its annotations. Such formats are optimized for a specific kind of analysis and for the performance of a specific processing tool. Consequently, they cannot easily be mapped onto each other. This impedes those linguistic research questions which presuppose a global view of the data, i.e., which require the option to correlate, query and analyze several kinds of linguistic annotations at once. We ...
References:
Dipper S. (2005). XML-based Stand-off Representation and Exploitation of Multi-Level Linguistic Annotation. In: Eckstein R., Tolksdorf R. (eds.) Berliner XML Tage.
Ide N. & Suderman K. (2007). GrAF: A Graph-based Format for Linguistic Annotations. In: Proceedings of the Linguistic Annotation Workshop, Prague, Czech Republic.
Lezius W. (2002). Ein Suchwerkzeug für syntaktisch annotierte Textkorpora. Ph.D. thesis, Stuttgart University.
Müller C. & Strube M. (2006). Multi-Level Annotation of Linguistic Data with MMAX2. In: Braun S. & Kohn K. & Mukherjee J. (eds.), Corpus Technology and Language Pedagogy. Frankfurt: Peter Lang, 197–214.
Pajas P. & Štěpánek J. (2008). Recent Advances in a Feature-Rich Framework for Treebank Annotation. In: Proceedings of the 22nd International Conference on Computational Linguistics. Manchester, 673-680.
Schmidt T. (2004). Transcribing and Annotating Spoken Language with Exmaralda. ...
Keyword:
conversion; conversion framework; converter; corpus; EXMARaLDA; format; linguistic data; Pepper; plugin; Salt; TIGER XML
URL: https://dx.doi.org/10.5281/zenodo.15638 https://zenodo.org/record/15638
BASE

15. [Tiger2/] Documentation
In: https://hal.inria.fr/inria-00593903 ; [Technical Report] 2010 (2010)
BASE

16. HANDLING MULTILINGUAL CONTENT IN DIGITAL MEDIA: A CRITICAL ANALYSIS
In: https://hal.inria.fr/inria-00001120 ; [Research Report] 2006, pp.60 (2006)
BASE

17. Unification of multi-lingual scientific terminological resources using the ISO 16642 standard. The TermSciences initiative ...
BASE

18. Towards Multimodal Content Representation
In: https://hal.archives-ouvertes.fr/hal-00323338 ; 2002 (2002)
BASE

19. The ELAN Architecture ; The ELAN Architecture: ELAN Deliverables WP3
In: https://hal.inria.fr/hal-01875371 ; [Contract] Deliverables D3.1-1 and D3.2-1, Inria. 1999 (1999)
BASE

20. A cognitive model for the representation of time in a man-machine dialogue.
In: https://hal.inria.fr/hal-00721871 ; 1989 (1989)
BASE