Hits 1 – 13 of 13
1
Curlie Dataset - Language-agnostic Website Embedding and Classification ...
Lugeon, Sylvain; Piccardi, Tiziano. - : figshare, 2022
BASE
2
Curlie Dataset - Language-agnostic Website Embedding and Classification ...
Lugeon, Sylvain; Piccardi, Tiziano. - : figshare, 2022
BASE
3
Curlie Dataset - Language-agnostic Website Embedding and Classification ...
Lugeon, Sylvain; Piccardi, Tiziano. - : figshare, 2022
BASE
4
Curlie Dataset - Language-agnostic Website Embedding and Classification ...
Lugeon, Sylvain; Piccardi, Tiziano. - : figshare, 2022
Abstract:
**************** Full Curlie dataset ****************
This dataset contains the URLs scraped from curlie.org along with their multilingual labels. Each label corresponds to the sub-category under which the URL was referenced in Curlie. We also provide a mapping between English labels and labels from other languages for alignment. The URLs have been filtered to contain only homepages. Each distinct URL is indexed with a unique identifier (uid).
curlie.csv.gz > [url, uid, label, lang] x 2,275,150 samples
mapping.json.gz > [english_label, matchings] x 35,946 labels
**************** Processed Curlie dataset ****************
This is the data used to train Homepage2vec. The URLs have been filtered further: websites listed under the Regional top-category were dropped, as were non-accessible websites. This filtering yields 1,018,207 valid URLs. The labels are aligned across languages and reduced to the 14 top-categories (classes). Because a URL can belong to several classes, a binary vector is used. The ...
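The binary class vector mentioned in the abstract can be illustrated with a short sketch. This is not the dataset authors' code: the function name `multi_hot`, the class names, and the sample rows are hypothetical, since the abstract only states that there are 14 top-categories and does not list them.

```python
from collections import defaultdict

# Hypothetical class names for illustration only; the abstract does not list them.
CLASSES = ["Arts", "Business", "Science"]


def multi_hot(rows, classes):
    """Map each uid to a binary vector with a 1 for every class it is listed under.

    rows: iterable of (uid, label) pairs; labels not in `classes` are skipped.
    """
    index = {name: i for i, name in enumerate(classes)}
    vectors = defaultdict(lambda: [0] * len(classes))
    for uid, label in rows:
        if label in index:
            vectors[uid][index[label]] = 1
    return dict(vectors)


# Hypothetical rows: uid 0 is listed under two classes, uid 1 under one.
rows = [(0, "Arts"), (0, "Science"), (1, "Business")]
vectors = multi_hot(rows, CLASSES)
```

A vector of ones and zeros is used rather than a single class index because, as the abstract notes, one URL can belong to several classes at once.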
Keyword:
170203 Knowledge Representation and Machine Learning; 80505 Web Technologies excl. Web Search; 80704 Information Retrieval and Web Search; Applied Computer Science; FOS Computer and information sciences; FOS Media and communications; FOS Psychology
URL:
https://figshare.com/articles/dataset/Curlie_Dataset_-_Language-agnostic_Website_Embedding_and_Classification/19406693/3
https://dx.doi.org/10.6084/m9.figshare.19406693.v3
BASE
5
Keywords Queries Palabras Clave búsquedas en Google sobre libro y lectura en España 2004-2016 - tesis doctoral jorge serrano-cobos ...
Serrano-Cobos, Jorge. - : figshare, 2021
BASE
6
Keywords Queries Palabras Clave búsquedas en Google sobre libro y lectura en España 2004-2016 - tesis doctoral jorge serrano-cobos ...
Serrano-Cobos, Jorge. - : figshare, 2021
BASE
7
Keywords Queries Palabras Clave búsquedas en Google sobre libro y lectura en España 2004-2016 - tesis doctoral jorge serrano-cobos ...
Serrano-Cobos, Jorge. - : figshare, 2021
BASE
8
CoTO: A Novel Approach for Fuzzy Aggregation of Semantic Similarity Measures ...
Martinez, Jorge. - : figshare, 2018
BASE
9
CoTO: A Novel Approach for Fuzzy Aggregation of Semantic Similarity Measures ...
Martinez, Jorge. - : figshare, 2018
BASE
10
Semantic Similarity Measurement Using Historical Google Search Patterns ...
Martinez, Jorge. - : figshare, 2017
BASE
11
Semantic Similarity Measurement Using Historical Google Search Patterns ...
Martinez, Jorge. - : figshare, 2017
BASE
12
How Relevant is the Long Tail? - A Relevance Assessment Study on Million Short (Best Poster Award CLEF 2016) ...
Schaer, Philipp; Mayr, Philipp; Sünkler, Sebastian. - : figshare, 2016
BASE
13
How Relevant is the Long Tail? - A Relevance Assessment Study on Million Short (Best Poster Award CLEF 2016) ...
Schaer, Philipp; Mayr, Philipp; Sünkler, Sebastian. - : figshare, 2016
BASE
© 2013 - 2024 Lin|gu|is|tik