
Search in the Catalogues and Directories

Hits 1 – 5 of 5

1
Homepage2Vec: Language-Agnostic Website Embedding and Classification ...
Abstract: Currently, publicly available models for website classification do not offer an embedding method and have limited support for languages beyond English. We release a dataset of more than two million category-labeled websites in 92 languages collected from Curlie, the largest multilingual human-edited Web directory. The dataset contains 14 website categories aligned across languages. Alongside it, we introduce Homepage2Vec, a machine-learned pre-trained model for classifying and embedding websites based on their homepage in a language-agnostic way. Homepage2Vec, thanks to its feature set (textual content, metadata tags, and visual attributes) and recent progress in natural language representation, is language-independent by design and generates embedding-based representations. We show that Homepage2Vec correctly classifies websites with a macro-averaged F1-score of 0.90, with stable performance across low- as well as high-resource languages. Feature analysis shows that a small subset of efficiently computable ...
Note: Published in Proc. of ICWSM 2022 ...
Keywords: Artificial Intelligence cs.AI; Computation and Language cs.CL; FOS Computer and information sciences
URL: https://dx.doi.org/10.48550/arxiv.2201.03677
https://arxiv.org/abs/2201.03677
BASE
2
Curlie Dataset - Language-agnostic Website Embedding and Classification ...
Lugeon, Sylvain; Piccardi, Tiziano. - : figshare, 2022
BASE
3
Curlie Dataset - Language-agnostic Website Embedding and Classification ...
Lugeon, Sylvain; Piccardi, Tiziano. - : figshare, 2022
BASE
4
Curlie Dataset - Language-agnostic Website Embedding and Classification ...
Lugeon, Sylvain; Piccardi, Tiziano. - : figshare, 2022
BASE
5
Curlie Dataset - Language-agnostic Website Embedding and Classification ...
Lugeon, Sylvain; Piccardi, Tiziano. - : figshare, 2022
BASE

Catalogues: 0
Bibliographies: 0
Linked Open Data catalogues: 0
Online resources: 0
Open access documents: 5
© 2013 - 2024 Lin|gu|is|tik | Imprint | Privacy Policy | Change privacy settings