1 |
Model-driven Web Page Segmentation for Non Visual Access
In: 16th International Conference of the Pacific Association for Computational Linguistics (PACLING 2019), Oct 2019, Hanoi, Vietnam ; https://hal.archives-ouvertes.fr/hal-02309612
Abstract:
Web page segmentation aims to break a large page into smaller blocks in which contents with coherent semantics are kept together. Within this context, many approaches have been proposed without any specific end task in mind. In this paper, we study different segmentation strategies for the task of non-visual skimming. For that purpose, we propose to segment web pages into visually coherent zones so that each zone can be represented by a set of relevant keywords that can be further synthesized into concurrent speech. As a consequence, we consider web page segmentation as a clustering problem of visual elements, where (1) a fixed number of clusters must be discovered, (2) the elements of a cluster should be visually connected, and (3) all visual elements must be clustered. Therefore, we study variations of three existing algorithms that comply with these constraints: K-means, F-K-means, and Guided Expansion. In particular, we evaluate different reading strategies for the positioning of the initial K seeds, as well as a pre-clustering methodology for the Guided Expansion algorithm, whose goal is to (1) speed up the clustering process and (2) reduce imbalance between clusters. The evaluation shows that the Guided Expansion algorithm achieves statistically better results than the two other algorithms across the variations of the reading strategies. Nevertheless, further improvements are still needed to increase separateness.
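The clustering formulation in the abstract can be illustrated with a minimal sketch of plain K-means over element centres, with the K initial seeds placed along a reading path. This is not the authors' implementation: the function names `diagonal_seeds` and `kmeans_zones` are hypothetical, the diagonal path is only one assumed example of a reading strategy, and the sketch does not enforce the visual-connectivity constraint (2). It only guarantees a fixed number of zones and that every element is assigned to one.

```python
from math import hypot


def diagonal_seeds(width, height, k):
    """Place k seeds evenly along the top-left to bottom-right diagonal.

    Assumption: a diagonal reading path; other strategies would change
    only this function.
    """
    return [((i + 0.5) * width / k, (i + 0.5) * height / k) for i in range(k)]


def kmeans_zones(elements, seeds, max_iter=50):
    """Assign every (x, y) element centre to one of len(seeds) zones."""
    centroids = list(seeds)
    labels = [None] * len(elements)
    for _ in range(max_iter):
        # Assignment step: each element joins its nearest centroid.
        new_labels = [
            min(range(len(centroids)),
                key=lambda c: hypot(x - centroids[c][0], y - centroids[c][1]))
            for x, y in elements
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [elements[i] for i, l in enumerate(labels) if l == c]
            if members:  # an empty zone keeps its previous centroid
                centroids[c] = (sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members))
    return labels, centroids
```

Calling `kmeans_zones(elements, diagonal_seeds(page_width, page_height, k))` returns one zone label per element plus the final zone centres. Because the number of seeds is fixed up front, the number of zones never exceeds K, though a zone can end up empty if no element is nearest to its seed.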
Keywords:
Computer Science [cs] / Artificial Intelligence [cs.AI]; Computer Science [cs] / Human-Computer Interaction [cs.HC]; Computer Science [cs] / Document and Text Processing; Computer Science [cs] / Web
URL: https://hal.archives-ouvertes.fr/hal-02309612/document https://hal.archives-ouvertes.fr/hal-02309612/file/PACLING_2019_Model-Driven-Web-Page-Segmentation.pdf https://hal.archives-ouvertes.fr/hal-02309612
2 |
Identifying Temporal Orientation of Word Senses Based on Minimum Cuts
In: The 20th SIGNLL Conference on Computational Natural Language Learning (CoNLL 2016), Aug 2016, Berlin, Germany. pp. 22-30 ; https://hal.archives-ouvertes.fr/hal-01702812
3 |
Verses and measures: detection of vowel nuclei ; Des vers et des mesures : détection des noyaux vocaliques
In: ISSN: 0458-726X ; EISSN: 1958-9549 ; Langages, Armand Colin (Larousse until 2003), 2015, Traitement automatique des textes versifiés : problématiques et pratiques, pp. 107-124. ⟨10.3917/lang.199.0107⟩ ; https://hal.archives-ouvertes.fr/hal-01380142
4 |
Identification of Shell Nouns, Signals of Discourse Organisation ; Identification des noms sous-spécifiés, signaux de l’organisation discursive
In: Proceedings of TALN 2014 (Volume 1: Long Papers) ; 21ème conférence sur le Traitement Automatique des Langues Naturelles, Jul 2014, Marseille, France. pp. 377-388 ; https://hal.archives-ouvertes.fr/hal-01076760 ; https://www.aclweb.org/anthology/F14-1033
5 |
Propagation Strategies for Building Temporal Ontologies
In: 14th Conference of the European Chapter of the Association for Computational Linguistics, Apr 2014, Gothenburg, Sweden. pp. 6-11 ; https://hal.archives-ouvertes.fr/hal-01074969
6 |
Intensité et polarité : un modèle opératoire articulant plusieurs travaux linguistiques [Intensity and polarity: an operational model bringing together several lines of linguistic work]
In: ISSN: 0023-8368 ; EISSN: 1957-7982 ; Langue française, Armand Colin, 2014, Études sur l'évaluation axiologique, 4/2014 (184), p. 33-52. ⟨10.3917/lf.184.0035⟩ ; https://hal.archives-ouvertes.fr/hal-01123696 ; http://www.armand-colin.com/
7 |
Rhetorical Browzing in Journalistic Texts: Preliminary Investigations
In: Proceedings of the 2013 Federated Conference on Computer Science and Information Systems, IEEE, 2013, pp. 251-256 ; https://hal.archives-ouvertes.fr/hal-01074490
8 |
Personalized Semantic Resources: The SemComp Project Presentation and Preliminary Works
In: International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (KEOD 2013), Sep 2013, Vilamoura, Portugal. ⟨10.5220/0004539501640169⟩ ; https://hal.archives-ouvertes.fr/hal-01073599
9 |
Opinion analysis: the effect of negation on polarity and intensity
In: Proceedings of KONVENS workshop PATHOS - 1st Workshop on Practice and Theory of Opinion Mining and Sentiment Analysis, Sep 2012, Vienna, Austria. pp. 282-290 ; https://hal.archives-ouvertes.fr/hal-01071601
10 |
Opinion Mining in an Informative Corpus: Building Lexicons
In: Proceedings of KONVENS workshop PATHOS - 1st Workshop on Practice and Theory of Opinion Mining and Sentiment Analysis, 2012, Vienna, Austria. pp. 314-318 ; https://hal.archives-ouvertes.fr/hal-01071162
11 |
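Pour une recherche d'information et une veille juridique interactives et socio centrées. ENT énactif & veille en droit du transport. [Towards interactive and socially centred information retrieval and legal monitoring: an enactive ENT and monitoring of transport law]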
In: ISSN: 1633-1311 ; EISSN: 1633-1311 ; Revue des Sciences et Technologies de l'Information - Série ISI : Ingénierie des Systèmes d'Information, Lavoisier, 2012, 2, p. 17-40. ⟨10.3166/ISI.17.2.17-40⟩ ; https://hal.archives-ouvertes.fr/hal-01074200
12 |
Vers une analyse automatique des discours évaluatifs. Le cas des constituants détachés "en N" [Towards an automatic analysis of evaluative discourse: the case of detached "en N" constituents]
In: Linguistic and Psycholinguistic Approaches to Text Structuring, 2009, Paris, France ; https://hal.archives-ouvertes.fr/hal-01016533
13 |
Jugements d'évaluation et constituants périphériques [Evaluative judgments and peripheral constituents]
In: Actes de la 16ème conférence sur le traitement automatique des langues naturelles (TALN'09), Jun 2009, Senlis, France. 10 p., electronic proceedings ; https://hal.archives-ouvertes.fr/hal-01011828
14 |
User-Centered Analysis of Corpora using Semantic Features Redundancy
In: https://hal.archives-ouvertes.fr/hal-00203565 ; 2008
15 |
Tracking evaluation in discourse
In: Proceedings of Workshop at EUROLAN 2007 on Applications of Semantics, Opinions and Sentiments (ASOS'07), Jul 2007, Iasi, Romania ; https://hal.archives-ouvertes.fr/hal-00410753
16 |
Suivi d'opinion dans le discours [Opinion tracking in discourse]
In: Revue électronique Textes et corpus ; 5eme Journées de la Linguistique de corpus, Sep 2007, Lorient, France. pp. 103-114 ; https://hal.archives-ouvertes.fr/hal-00410751