Search in the Catalogues and Directories

Hits 1–20 of 3,178

1	Between words and characters: A Brief History of Open-Vocabulary Modeling and Tokenization in NLP
	Mielke, Sabrina J.; Alyafeai, Zaid; Salesky, Elizabeth...
	In: https://hal.inria.fr/hal-03540069 ; 2022 (2022)
	BASE

2	A fine-grained recognition of Named Entities in ELTeC collection using cascades
	Krstev, Cvetana; Maurel, Denis; Stanković, Ranka
	In: Final Action Event of COST Action Distant Reading for European Literary History ; https://hal.archives-ouvertes.fr/hal-03615219 ; Final Action Event of COST Action Distant Reading for European Literary History, Christof Schöch, Apr 2022, Krakow, Poland ; https://www.distant-reading.net/events/conference-programme/ (2022)
	Abstract: In the scope of the COST Action “Distant Reading for European Literary History” (Schöch et al. 2021; Patras et al. 2021), working group 2 (WG2), which is responsible for methods and tools, suggested a set of seven named entity (NE) categories to be used for annotating novels (the so-called “level-2” text version). The tags used for this set are PERS, LOC, ORG, WORK, EVENT, ROLE and DEMO (Frontini et al. 2020; Šandrih Todorović et al. 2021). The level-2 version of the Serbian novels was produced using this set of categories and tags (Krstev et al. 2019). For Serbian and French, fine-grained named entity recognition systems were developed based on exhaustive lexicons of the corresponding languages and on rules implemented as cascades of finite-state automata (Maurel and Friburger 2014; Krstev et al. 2014). These systems were developed using the open-source corpus processing suite Unitex/GramLab and its module CasSys. Both systems recognize and tag a rich set of NE categories and subcategories and allow entity embedding; moreover, the French system recognizes NEs that correspond to the TEI guidelines, chapter 13 (TEI P5). An example that illustrates this in French is (Marquis de la Lande factories): usines de la Lande. Similarly, in Serbian (Queen Elizabeth of Hungary): kraljice Ugarske Elizabete. Moreover, besides the broad categories suggested by WG2, both systems recognize other categories such as temporal or measurement expressions. In both the Serbian and French systems, the recognition module is separated from the annotation module, which allows output to be produced in whatever form is needed. In this paper we will illustrate this on a few Serbian and French novels from the ELTeC corpus, chosen to match with respect to the corpus balance criteria, namely author’s gender, novel size and year of first publication.
The novels will be annotated both with the simplified tags needed for the level-2 text format and with more elaborate TEI-compliant tags that reflect all nuances of the recognized NEs. The two output formats for the Serbian and French novels will be uploaded into the TXM corpus processing system, which will enable both quantitative and qualitative analysis (Krstev et al., 2019). Besides a statistical analysis of the annotated NEs, we will perform a contrastive analysis of Serbian and French NEs and, for both languages, a comparison between the fine-grained and simplified versions of the annotation. The qualitative analysis will reveal interesting examples of annotation, open issues and hard cases. Textometric analysis in TXM will be illustrated for both the fine-grained and simplified versions of the annotated samples. Finally, we will go back to the research questions posed by the Action’s working group 3 (literary theory and history) when the Action started. Namely, WG3’s first idea and wish was to produce fine-grained annotations that would allow, for instance, a distinction between cities and villages, different roles of persons (professions, family relations, etc.), a person’s gender, types of locations (continent, country, region, city, village, mountain, waterbody, astronym), etc. After an analysis of the availability of NER tools, the fine-grained approach was replaced by a much simpler schema. With this research we would like to reopen these questions and establish whether it is possible to meet the need for more detailed literary analysis based on named entities.
	Keyword: [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]; Digital humanities; Distant Reading for European Literary History; Named entities recognition; Unitex
	URL: https://hal.archives-ouvertes.fr/hal-03615219
	BASE

3	Retrieving Speaker Information from Personalized Acoustic Models for Speech Recognition
	Mdhaffar, Salima; Bonastre, Jean-François; Tommasi, Marc...
	In: IEEE ICASSP 2022 ; https://hal.archives-ouvertes.fr/hal-03539741 ; IEEE ICASSP 2022, 2022, Singapore, Singapore (2022)
	BASE

4	Documenting Geographically and Contextually Diverse Data Sources: The BigScience Catalogue of Language Data and Resources
	McMillan-Major, Angelina; Alyafeai, Zaid; Biderman, Stella...
	In: https://hal.inria.fr/hal-03550289 ; 2022 (2022)
	BASE

5	Source or target first? Comparison of two post-editing strategies with translation students
	Volkart, Lise; Girletti, Sabrina; Gerlach, Johanna...
	In: https://hal.archives-ouvertes.fr/hal-03546151 ; 2022 (2022)
	BASE

6	Automatic Normalisation of Early Modern French
	Bawden, Rachel; Poinhos, Jonathan; Kogkitsidou, Eleni...
	In: https://hal.inria.fr/hal-03540226 ; 2022 (2022)
	BASE

7	Offline Corpus Augmentation for English-Amharic Machine Translation
	Biadgligne, Yohannes; Smaïli, Kamel
	In: 2022 The 5th International Conference on Information and Computer Technologies ; https://hal.archives-ouvertes.fr/hal-03547539 ; 2022 The 5th International Conference on Information and Computer Technologies, Mar 2022, New York, United States (2022)
	BASE

8	New Version of a Translater for a Natural Language Study
	Jakubiec-Jamet, Line
	In: https://hal.archives-ouvertes.fr/hal-03551680 ; 2022 (2022)
	BASE

9	A Translater from Latex Trees to Coq Trees for a Natural Language Study
	Jakubiec-Jamet, Line
	In: https://hal.archives-ouvertes.fr/hal-03536652 ; 2022 (2022)
	BASE

10	From FreEM to D'AlemBERT: a Large Corpus and a Language Model for Early Modern French
	Gabay, Simon; Ortiz Suarez, Pedro; Bartz, Alexandre...
	In: Proceedings of the 13th Language Resources and Evaluation Conference ; https://hal.inria.fr/hal-03596653 ; Proceedings of the 13th Language Resources and Evaluation Conference, European Language Resources Association, Jun 2022, Marseille, France (2022)
	BASE

11	Integrating a Phrase Structure Corpus Grammar and a Lexical-Semantic Network: the HOLINET Knowledge Graph
	Prost, Jean-Philippe
	In: Proceedings of LREC 2022 ; https://hal-amu.archives-ouvertes.fr/hal-03655636 ; Proceedings of LREC 2022, Jun 2022, Marseille, France (2022)
	BASE

12	Linguistic resources for paraphrase generation in Portuguese: a Lexicon-Grammar approach
	Barreiro, Anabela; Mota, Cristina; Baptista, Jorge...
	In: ISSN: 1574-020X ; EISSN: 1574-0218 ; Language Resources and Evaluation ; https://hal.archives-ouvertes.fr/hal-03548861 ; Language Resources and Evaluation, Springer Verlag, 2022, ⟨10.1007/s10579-021-09561-5⟩ ; https://link.springer.com/article/10.1007/s10579-021-09561-5 (2022)
	BASE

13	Caveats of Measuring Semantic Change of Cognates and Borrowings using Multilingual Word Embeddings
	Fourrier, Clémentine; Montariol, Syrielle
	In: LChange'22 - 3rd International Workshop on Computational Approaches to Historical Language Change 2022 ; https://hal.inria.fr/hal-03635005 ; LChange'22 - 3rd International Workshop on Computational Approaches to Historical Language Change 2022, May 2022, Dublin, Ireland (2022)
	BASE

14	Towards a Cleaner Document-Oriented Multilingual Crawled Corpus
	Abadji, Julien; Ortiz Suarez, Pedro; Romary, Laurent...
	In: https://hal.inria.fr/hal-03536361 ; 2022 (2022)
	BASE

15	Preprint Citation Praxis in PLOS
	Bertin, Marc; Atanassova, Iana
	In: ISSN: 0138-9130 ; EISSN: 1588-2861 ; Scientometrics ; https://hal.archives-ouvertes.fr/hal-03506094 ; In press (2022)
	BASE

16	Morphology in the Corsican Language Database (BDLC): assessment and perspectives
	Retali Medori, Stella; Kevers, Laurent
	In: ISSN: 1638-9808 ; EISSN: 1765-3126 ; Corpus ; https://hal.archives-ouvertes.fr/hal-03591866 ; Corpus, Bases, Corpus, Langage - UMR 7320, 2022, Corpus et données en morphologie, ⟨10.4000/corpus.7115⟩ ; https://journals.openedition.org/corpus/7115 (2022)
	BASE

17	Starting a new treebank? Go SUD! Theoretical and practical benefits of the Surface-Syntactic distributional approach
	Gerdes, Kim; Guillaume, Bruno; Kahane, Sylvain...
	In: Sixth International Conference on Dependency Linguistics (Depling, SyntaxFest 2021) ; https://hal.inria.fr/hal-03509136 ; Sixth International Conference on Dependency Linguistics (Depling, SyntaxFest 2021), Mar 2022, Sofia, Bulgaria (2022)
	BASE

18	Assessing the impact of OCR noise on multilingual event detection over digitised documents
	Boros, Emanuela; Nguyen, Nhu Khoa; Lejeune, Gaël...
	In: ISSN: 1432-5012 ; EISSN: 1432-1300 ; International Journal on Digital Libraries ; https://hal.archives-ouvertes.fr/hal-03635985 ; International Journal on Digital Libraries, Springer Verlag, 2022, ⟨10.1007/s00799-022-00325-2⟩ (2022)
	BASE

19	Entities, Dates, and Languages: Zero-Shot on Historical Texts with T0
	De Toni, Francesco; Akiki, Christopher; de la Rosa, Javier...
	In: Proceedings of the International Workshop on Challenges & Perspectives in Creating Large Language Models 2022 (BigScience 2022) ; https://hal.inria.fr/hal-03639144 ; Proceedings of the International Workshop on Challenges & Perspectives in Creating Large Language Models 2022 (BigScience 2022), May 2022, Dublin, Ireland (2022)
	BASE

20	Évaluation des propriétés multilingues d'un embedding contextualisé [Evaluation of the multilingual properties of a contextualized embedding]
	Gaschi, Félix; Joutard, Alexandre; Rastin, Parisa...
	In: EGC 2022 - Conférence francophone sur l'Extraction et la Gestion des Connaissances ; https://hal.archives-ouvertes.fr/hal-03578480 ; EGC 2022 - Conférence francophone sur l'Extraction et la Gestion des Connaissances, Jan 2022, Blois, France (2022)
	BASE

© 2013 - 2024 Lin|gu|is|tik | Imprint | Privacy Policy | Change privacy settings