1. Blog annotation: from corpus analysis to automatic tag suggestion
In: 17th International Conference on Intelligent Text Processing and Computational Linguistics (CICLING 2016), Pascale Fung; Tomas Mikolov; Simone Teufel; Piek Vossen, Apr 2016, Konya, Turkey (2016)
Abstract:
Nowadays, blogs reach a large audience and have risen from the underground to become part of the mainstream media. Blogs contain information on diverse topics, personal opinions, and discussions between bloggers and readers. Tags and categories are structural elements of a blog post that increase the blog's visibility and enhance navigation and search within the blog's history. We hypothesize that these annotations are made on subjective grounds rather than systematically. Although tools exist to help bloggers tag and categorize their posts, it is still unclear to what extent these tools take into account the information contained in previous posts. This paper presents an 11-million-word corpus of French blog posts designed to study these questions, together with an experiment in tag and category prediction. Preliminary results show that around 27% of all tags can be predicted from a lexical frequency analysis of blog posts. However, a first comparison experiment with an existing tag suggestion tool shows that a large proportion of the tags used to describe blogs do not appear in the blog post itself. This indicates that tag suggestion tools should exploit a diachronic analysis of blogs.
Keywords:
[INFO.INFO-AI] Computer Science [cs]/Artificial Intelligence [cs.AI]; [INFO.INFO-TT] Computer Science [cs]/Document and Text Processing; Annotation; Tag suggestion
URL: https://hal-auf.archives-ouvertes.fr/hal-01358328
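The abstract above reports that about 27% of tags can be predicted from a lexical frequency analysis of the posts, and that many tags never occur in the post text at all. The following is a minimal sketch of how those two proportions could be measured; it is not the paper's method, whose details are not given here. It assumes lowercase letter-only tokenization, a tiny hypothetical French stopword list, single-word tags, and a hypothetical cutoff `top_k`; the names `suggest_tags` and `tag_coverage` are illustrative only.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny French stopword list (assumption for illustration).
STOPWORDS = {"le", "la", "les", "de", "des", "du", "un", "une", "et", "en",
             "à", "est", "que", "qui", "pour", "dans", "sur"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of letters (accented letters included)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def suggest_tags(text: str, top_k: int = 5) -> list[str]:
    """Assumed heuristic: suggest the top_k most frequent non-stopword tokens."""
    counts = Counter(t for t in tokenize(text) if t not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_k)]


def tag_coverage(posts: list[tuple[str, list[str]]], top_k: int = 5) -> tuple[float, float]:
    """Return (share of gold tags found among the suggestions,
    share of gold tags that never occur in the post text)."""
    predicted = absent = total = 0
    for text, tags in posts:
        suggestions = set(suggest_tags(text, top_k))
        vocabulary = set(tokenize(text))
        for tag in tags:
            tag = tag.lower()
            total += 1
            if tag in suggestions:
                predicted += 1
            if tag not in vocabulary:
                absent += 1
    if total == 0:
        return 0.0, 0.0
    return predicted / total, absent / total


posts = [
    ("Le vélo en ville : le vélo est rapide et le vélo est propre.", ["vélo", "écologie"]),
    ("Recette de la tarte aux pommes : les pommes et la pâte.", ["cuisine"]),
]
print(tag_coverage(posts))
```

In this toy data only "vélo" is recoverable from word frequencies, while "écologie" and "cuisine" never appear in their posts; the second ratio is the kind of gap the abstract argues a purely in-post, frequency-based suggester cannot close.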
2. Words in context: a reference perspective on the lexicon ...
3. One Lexicon, Two Structures: So What Gives?
In: Proceedings of the Seventh Global Wordnet Conference (GWC2014), Jan 2014, Tartu, Estonia, pp. 163-171; https://hal.archives-ouvertes.fr/hal-00937187 (2014)
4. GAF: A Grounded Annotation Framework for Events
In: http://newsreader-project.eu/files/2012/12/camera_ready.pdf (2013)
5. Building the global WordNet grid
In: http://www.adampease.org/professional/Grid2008.pdf (2012)
6. Building the global WordNet grid
In: http://www.vossen.info/docs/2008/BuildingGlobalWordnetGrid_CIL_Workshop_Korea_2008.pdf (2012)
7. Annotation Scheme and Gold Standard for Dutch Subjective Adjectives
In: http://www.lrec-conf.org/proceedings/lrec2010/pdf/165_Paper.pdf (2010)
8. Computational Linguistics Proceedings of the 6th Workshop on Ontologies and Lexical Resources
In: http://aclweb.org/anthology-new/W/W10/W10-33.pdf (2010)
9. Bootstrapping language-neutral term extraction
In: http://lexitron.nectec.or.th/public/LREC-2010_Malta/pdf/902_Paper.pdf (2010)
10. SemEval-2010 Task 17: All-words Word Sense . . .
In: http://aclweb.org/anthology-new/W/W09/W09-2420.pdf (2009)
11. KAF: a generic semantic annotation format
In: http://adimen.si.ehu.es/~rigau/publications/gl09-kaf.pdf (2009)
12. Integrating lexical units, synsets and ontology in the Cornetto database
In: http://www.lrec-conf.org/proceedings/lrec2008/pdf/255_paper.pdf (2008)
13. Adjectives in the Dutch semantic lexical database Cornetto
In: http://www.lrec-conf.org/proceedings/lrec2008/pdf/184_paper.pdf (2008)
14. Adjectives in the Dutch semantic lexical database Cornetto
In: http://taalunieversum.org/archief/taal/technologie/stevin/documenten/cornetto_lrec2008_adjectives.pdf (2008)
15. KYOTO: A system for mining, structuring and distributing knowledge across languages and cultures
In: http://adimen.si.ehu.es/~rigau/publications/lrec08-vac.pdf (2008)
16. KYOTO: A system for mining, structuring and distributing knowledge across languages and cultures
In: http://cwn.ling.sinica.edu.tw/churen/2008kyoto.gwa.pdf (2008)
17. KYOTO: a system for mining, structuring and distributing knowledge across languages and cultures
In: http://www.vossen.info/docs/2008/KYOTO_LREC2008[6].pdf (2008)
18. Integrating lexical units, synsets and ontology in the Cornetto database
In: http://www.vossen.info/docs/2008/Cornetto_LREC2008[4].pdf (2008)
19. A Distributed Database System for Developing Ontological and Lexical Resources in Harmony
In: http://www.vossen.info/docs/2008/cicling2008_hales_vossen_xrambous.pdf (2008)
20. Connecting the Universal to the Specific: Towards the Global Grid
In: http://www.vossen.info/docs/2007/IWIC 2007.pdf (2007)