Indian Language Part-of-Speech Tagset: Hindi ...
Abstract:
Indian Language Part-of-Speech Tagset: Hindi, Linguistic Data Consortium (LDC) catalog number LDC2010T24 and ISBN 1-58563-571-5, is a corpus developed by Microsoft Research (MSR) India to support Part-of-Speech (POS) tagging and other data-driven linguistic research on Indian languages in general. It was created as part of the Indian Language Part-of-Speech Tagset (IL-POST) project, a collaborative effort among linguists and computer scientists from MSR India, AU-KBC (Anna University, Chennai), Delhi University, IIT Bombay, Jawaharlal Nehru University (Delhi) and Tamil University (Tamilnadu). The goal of the IL-POST project is to provide a common tagset framework for Indian languages that offers flexibility, cross-linguistic compatibility and reusability across those languages. The framework supports a three-level hierarchy of Categories, Types and Attributes. The corpus ...
URL: https://catalog.ldc.upenn.edu/LDC2010T24 https://dx.doi.org/10.35111/bpb8-ew63