41 |
Communication and motor abilities in CAS (Iuzzini-Seigel et al., 2022) ...
44 |
The Evolutionary Pattern of Language in English Fiction Over the Last Two Centuries: Insights From Linguistic Concreteness and Imageability ...
45 |
Linguistic justice as a framework for designing, developing, and managing natural language processing tools ...
46 |
sj-pdf-1-bds-10.1177_20539517221090930 - Supplemental material for Linguistic justice as a framework for designing, developing, and managing natural language processing tools ...
47 |
The Evolutionary Pattern of Language in English Fiction Over the Last Two Centuries: Insights From Linguistic Concreteness and Imageability ...
48 |
sj-pdf-1-bds-10.1177_20539517221090930 - Supplemental material for Linguistic justice as a framework for designing, developing, and managing natural language processing tools ...
49 |
Linguistic justice as a framework for designing, developing, and managing natural language processing tools ...
50 |
Positions on science and religious beliefs across societies: Development of a research instrument and testing of its validity among high school students ...
51 |
Positions on science and religious beliefs across societies: Development of a research instrument and testing of its validity among high school students ...
53 |
Curlie Dataset - Language-agnostic Website Embedding and Classification ...
Abstract:
Full Curlie dataset: This dataset contains the URLs scraped from curlie.org along with their multilingual labels. Each label corresponds to the sub-category under which the URL was referenced in Curlie. We also provide a mapping between English labels and labels from other languages for alignment. The URLs have been filtered to contain only homepages. Each distinct URL is indexed with a unique identifier (uid). Files: curlie.csv.gz > [url, uid, label, lang] x 2,275,150 samples; mapping.json.gz > [english_label, matchings] x 35,946 labels.
Processed Curlie dataset: This is the data used to train Homepage2vec. URLs have been further filtered: websites listed under the Regional top-category were dropped, as were non-accessible websites. This filtering yields 1,018,207 valid URLs. The labels are aligned across languages and reduced to the 14 top-categories (classes). Because a URL can belong to several classes, a binary vector is used. The ...
Keyword:
170203 Knowledge Representation and Machine Learning; 80505 Web Technologies excl. Web Search; 80704 Information Retrieval and Web Search; Applied Computer Science; FOS Computer and information sciences; FOS Media and communications; FOS Psychology
URL: https://dx.doi.org/10.6084/m9.figshare.19406693.v1 https://figshare.com/articles/dataset/Curlie_Dataset_-_Language-agnostic_Website_Embedding_and_Classification/19406693/1
54 |
Common methodological framework and best practices in validation across Europe ...
56 |
Curlie Dataset - Language-agnostic Website Embedding and Classification ...
57 |
Curlie Dataset - Language-agnostic Website Embedding and Classification ...
58 |
Curlie Dataset - Language-agnostic Website Embedding and Classification ...
59 |
Common methodological framework and best practices in validation across Europe ...
60 |
A discourse study of selected newspaper headlines on insurgency in Nigeria ...