3 |
WEIR-P: An Information Extraction Pipeline for the Wastewater Domain
|
|
Chahinian, Nanée; Bonnabaud La Bruyère, Thierry; Frontini, Francesca; Delenne, Carole; Julien, Marin; Panckhurst, Rachel; Roche, Mathieu; Deruelle, Laurent; Sautot, Lucile; Teissiere, Maguelonne
|
|
In: RCIS 2021 - 5th International Conference on Research Challenges in Information Science ; https://hal.archives-ouvertes.fr/hal-03211461 ; RCIS 2021 - 5th International Conference on Research Challenges in Information Science, May 2021, Virtual, Cyprus (2021)
|
|
Abstract:
We present the MeDO project, aimed at developing resources for text mining and information extraction in the wastewater domain. We developed a specific Natural Language Processing (NLP) pipeline named WEIR-P (WastewatEr InfoRmation extraction Platform) which identifies the entities and relations to be extracted from texts, pertaining to network information, wastewater treatment, accidents and works, organizations, spatio-temporal information, measures and water quality. We present and evaluate the first version of the NLP system which was developed to automate the extraction of the aforementioned annotation from texts and its integration with existing domain knowledge. The preliminary results obtained on the Montpellier corpus are encouraging and show how a mix of supervised and rule-based techniques can be used to extract useful information and reconstruct the various phases of the extension of a given wastewater network. While the NLP and Information Extraction (IE) methods used are state of the art, the novelty of our work lies in their adaptation to the domain, and in particular in the wastewater management conceptual model, which defines the relations between entities. French resources are less developed in the NLP community than English ones. The datasets obtained in this project are another original aspect of this work.
|
|
Keyword:
[INFO]Computer Science [cs]; [SCCO.LING]Cognitive science/Linguistics; [SDU.STU.HY]Sciences of the Universe [physics]/Earth Sciences/Hydrology; Domain adapted systems; Extraction d'information IE; Fouille de données textuelles; Information extraction; NER; NLP; Reconnaissance d'Entités Nommées (REN); Réseaux d'assainissement; TALN Traitement Automatique des Langues Naturelles; Text mining; Wastewater
|
|
URL: https://hal.archives-ouvertes.fr/hal-03211461/document https://hal.archives-ouvertes.fr/hal-03211461 https://hal.archives-ouvertes.fr/hal-03211461/file/RCIS_MeDO.pdf
|
|
BASE
|
|
4 |
WEIR-P: An Information Extraction Pipeline for the Wastewater Domain
|
|
|
|
In: EGU General Assembly 2021 ; https://hal.archives-ouvertes.fr/hal-03161715 ; EGU General Assembly 2021, Apr 2021, Virtual, France. ⟨10.5194/egusphere-egu21-2708⟩ ; https://meetingorganizer.copernicus.org/EGU21/EGU21-2708.html (2021)
|
|
BASE
|
|
5 |
Multilingual comparable corpora of parliamentary debates ParlaMint 2.1
|
|
|
|
BASE
|
|
6 |
Linguistically annotated multilingual comparable corpora of parliamentary debates ParlaMint.ana 2.1
|
|
|
|
BASE
|
|
7 |
Linguistically annotated multilingual comparable corpora of parliamentary debates ParlaMint.ana 2.0
|
|
|
|
BASE
|
|
8 |
Multilingual comparable corpora of parliamentary debates ParlaMint 2.0
|
|
|
|
BASE
|
|
10 |
D3.9 Report on Ontology and Vocabulary Collection and Publication ...
|
|
|
|
BASE
|
|
11 |
D3.9 Report on Ontology and Vocabulary Collection and Publication ...
|
|
|
|
BASE
|
|
12 |
Evolving interactional practices of emoji in text messages
|
|
|
|
In: Visualizing Digital Discourse ; https://hal.archives-ouvertes.fr/hal-03565868 ; Visualizing Digital Discourse, De Gruyter, pp.81-104, 2020, ⟨10.1515/9781501510113-005⟩ (2020)
|
|
BASE
|
|
13 |
Named Entity Recognition for Distant Reading in ELTeC
|
|
|
|
In: Proceedings CLARIN Annual Conference 2020 ; CLARIN Annual Conference 2020 ; https://hal.archives-ouvertes.fr/hal-03160438 ; CLARIN Annual Conference 2020, Oct 2020, Virtual Event, France (2020)
|
|
BASE
|
|
14 |
The Guidelines for Annotating Named Entities: similarities, differences and difficulties ; Les Guides d’annotation des entités nommées : similitudes, différences, difficultés
|
|
|
|
In: Humanistica 2020 ; https://hal.archives-ouvertes.fr/hal-02880101 ; Humanistica 2020, May 2020, Bordeaux, France ; http://www.humanisti.ca/colloque2020/ (2020)
|
|
BASE
|
|
15 |
Nénufar: Modelling a Diachronic Collection of Dictionary Editions as a Computational Lexical Resource
|
|
|
|
In: ELEX 2019: smart lexicography ; https://hal.inria.fr/hal-02272978 ; ELEX 2019: smart lexicography, Oct 2019, Sintra, Portugal (2019)
|
|
BASE
|
|
16 |
Le projet Nénufar, nouvelle édition numérique du Petit Larousse illustré (1906-1948)
|
|
|
|
In: XXIXe congrès international de linguistique et de philologie romanes ; https://hal.archives-ouvertes.fr/hal-03262248 ; XXIXe congrès international de linguistique et de philologie romanes, Société de Linguistique Romane, Jul 2019, Copenhagen, Denmark (2019)
|
|
BASE
|
|
17 |
Adapting a system for Named Entity Recognition and Linking for 19th century French Novels
|
|
|
|
In: Digital Humanities 2019 ; https://hal.archives-ouvertes.fr/hal-02187283 ; Digital Humanities 2019, Jul 2019, Utrecht, Netherlands. 2019 ; https://dev.clariah.nl/files/dh2019/boa/0904.html (2019)
|
|
BASE
|
|
18 |
Vers une ontologie de la nomination et de la référence dédiée à l'annotation des textes
|
|
|
|
In: 13th Terminology & Ontology: Theories and applications (TOTh) International Conference ; https://hal.archives-ouvertes.fr/hal-02269154 ; 13th Terminology & Ontology: Theories and applications (TOTh) International Conference, Jun 2019, Chambéry, France (2019)
|
|
BASE
|
|
20 |
Approaching French theatrical characters by syntactical analysis: a study with motifs and correspondence analysis
|
|
|
|
In: The Grammar of Genres and Styles. From Discrete to Non-Discrete Units ; https://hal.archives-ouvertes.fr/hal-03482615 ; Dominique Legallois; Thierry Charnois; Meri Larjavaara. The Grammar of Genres and Styles. From Discrete to Non-Discrete Units, 320, De Gruyter Mouton, pp.118-139, 2018, Trends in Linguistics. Studies and Monographs [TiLSM], 978-3-11-058968-9. ⟨10.1515/9783110595864-006⟩ (2018)
|
|
BASE
|
|