1 |
Exploring Linguistic Constraints in NLP Applications
|
|
|
|
In: Publicly Accessible Penn Dissertations (2014)
|
|
Abstract:
The key argument of this dissertation is that the success of a Natural Language Processing (NLP) application depends on a proper representation of the corresponding linguistic problem. This theme is raised in the context that the recent progress made in our field is widely credited to the effective use of strong engineering techniques. However, the intriguing power of highly lexicalized models shown in many NLP applications is not only an achievement of developments in machine learning; it would also have been impossible without the extensive hand-annotated data resources made available, which were originally built with very deep linguistic considerations. More specifically, we explore three linguistic aspects in this dissertation: the distinction between closed-class and open-class words, long-tail distributions in vocabulary study, and determinism in language models. The first two aspects are studied in unsupervised tasks (unsupervised part-of-speech (POS) tagging and morphology learning), and the last one is studied in supervised tasks (English POS tagging and Chinese word segmentation). Each linguistic aspect under study manifests itself in a different way to help improve performance or efficiency in some NLP application.
|
|
Keywords:
Chinese word segmentation; closed-class words; Computer Sciences; long-tail distribution; Morphology learning; natural language processing; Unsupervised POS tagging
|
|
URL: https://repository.upenn.edu/cgi/viewcontent.cgi?article=3335&context=edissertations https://repository.upenn.edu/edissertations/1523
|
|
BASE
|
|
2 |
A Cognitive Model of Chinese Word Segmentation for Machine Translation
|
|
Wu, Zhijie. - : Les Presses de l’Université de Montréal, 2011. : Érudit, 2011
|
|
BASE
|
|
3 |
New Light Shed on Chinese Word Segmentation in MT by a Language Investigation
|
|
Wu, Zhijie. - : Les Presses de l'Université de Montréal, 2008. : Érudit, 2008
|
|
BASE
|
|
4 |
Improving Chinese word segmentation with description length gain
|
|
|
|
In: http://personal.cityu.edu.hk/~ctckit/papers/ICA7356.pdf (2007)
|
|
BASE
|
|
5 |
Two-pass named entity classification for cross language question answering
|
|
|
|
In: http://research.nii.ac.jp/ntcir/workshop/OnlineProceedings6/NTCIR/18.pdf (2007)
|
|
BASE
|
|
6 |
Submitted: Accepted: Published:
|
|
|
|
In: http://s2is.org/Issues/v7/n1/papers/paper14.pdf
|
|
BASE
|
|
7 |
Chinese Word Segmentation as Character Tagging
|
|
|
|
In: http://www.aclclp.org.tw/clclp/v8n1/v8n1a2.pdf
|
|
BASE
|
|
8 |
© The Association for Computational Linguistics and Chinese Language Processing
|
|
|
|
In: http://verbs.colorado.edu/~xuen/publications/clclp03.xue.pdf
|
|
BASE
|
|
9 |
Storage and Retrieval]: Content Analysis and Indexing – Linguistic processing.
|
|
|
|
In: http://www2007.org/posters/poster923.pdf
|
|
BASE
|
|
10 |
Integrating Generative and Discriminative Character-Based Models for Chinese Word Segmentation
|
|
In: http://www.nlpr.ia.ac.cn/cip/ZongPublications/2012/2012.06+ACM+TALIP+K.Wang.pdf
|
|
BASE
|
|
|
|