
1	Temporally-Informed Analysis of Named Entity Recognition ...
	Rijhwani, Shruti; Preoțiuc-Pietro, Daniel. - : Zenodo, 2020
	BASE

2	Temporally-Informed Analysis of Named Entity Recognition ...
	Rijhwani, Shruti; Preoțiuc-Pietro, Daniel. - : Zenodo, 2020
	Abstract: This repository contains the data set developed for the paper: “Shruti Rijhwani and Daniel Preoțiuc-Pietro. Temporally-Informed Analysis of Named Entity Recognition. In Proceedings of the Association for Computational Linguistics (ACL). 2020.” It includes 12,000 tweets annotated for the named entity recognition task. The tweets are uniformly distributed over the years 2014–2019, with 2,000 tweets from each year. The goal is to have a temporally diverse corpus that accounts for data drift over time when building NER models. The annotated entity types are locations (LOC), persons (PER) and organizations (ORG). The tweets are preprocessed to replace usernames and URLs with a unique token. Hashtags are left intact and can be annotated as named entities. Format: The repository contains the annotations in JSON format. Each year-wise file contains the tweet IDs along with token-level annotations. The Public Twitter Search API (https://developer.twitter.com/en/docs/tweets/search) can be used to extract the text for the tweet ...
	Keyword: information extraction; named entity recognition; ner; temporal analysis; tweets; twitter; twitter ner
	URL: https://zenodo.org/record/3899040 https://dx.doi.org/10.5281/zenodo.3899040
	BASE
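
A minimal Python sketch of how token-level NER annotations like these might be read and grouped into entity spans. The abstract does not spell out the JSON schema, so the one-object-per-line layout, the field names `id` and `tags`, and the BIO tagging scheme are all assumptions made for illustration; the helper functions `bio_to_spans` and `count_entities` are hypothetical, not part of the data set. The tweet text itself is not in the files and would have to be retrieved separately by tweet ID.

```python
import json
from collections import Counter
from typing import Iterable, List, Optional, Tuple

# ASSUMPTION (hypothetical schema, not taken from the data set's documentation):
# each non-empty line of a year-wise file is one JSON object holding a tweet ID
# and one BIO tag per token, for example
#   {"id": "123", "tags": ["B-PER", "I-PER", "O", "B-LOC"]}
# The tweet text is not included; it must be retrieved separately by tweet ID.

Span = Tuple[str, int, int]  # (entity type, start token, end token exclusive)


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Group per-token BIO tags into entity spans."""
    spans: List[Span] = []
    current: Optional[str] = None  # type of the currently open entity, if any
    start = 0
    for i, tag in enumerate(tags):
        if tag == "O":
            # Outside any entity: close the open span, if there is one.
            if current is not None:
                spans.append((current, start, i))
            current = None
            continue
        prefix, _, etype = tag.partition("-")
        if prefix not in ("B", "I") or not etype:
            raise ValueError(f"unexpected tag: {tag!r}")
        if prefix == "B" or etype != current:
            # A new entity begins here (an I- tag of a different type is
            # treated leniently as a beginning); close the previous one.
            if current is not None:
                spans.append((current, start, i))
            current, start = etype, i
        # Otherwise: an I- tag continuing the open entity of the same type.
    if current is not None:
        spans.append((current, start, len(tags)))
    return spans


def count_entities(lines: Iterable[str]) -> Counter:
    """Count entity spans per type (e.g. LOC, PER, ORG) in JSON Lines input."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        for etype, _, _ in bio_to_spans(record["tags"]):
            counts[etype] += 1
    return counts


if __name__ == "__main__":
    sample = [
        '{"id": "1", "tags": ["B-PER", "I-PER", "O", "B-LOC"]}',
        '{"id": "2", "tags": ["O", "B-ORG", "O"]}',
    ]
    print(count_entities(sample))
```

Because iterating over an open text file yields its lines, `count_entities` could also be given a file object directly (for instance from a hypothetical `open("2014.json", encoding="utf-8")`), which would allow entity-type counts to be compared year by year if the real files match the assumed layout.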
