3.
A Hybrid Model for Enhancing Lexical Statistical Machine Translation (SMT) ...
BASE
4.
Typesetting for Improved Readability using Lexical and Syntactic Information ...
Abstract:
We present results from our study, which uses syntactically and semantically motivated information to group segments of sentences into unbreakable units so that those sentences can be typeset in a region of fixed width with minimal raggedness, using an otherwise standard dynamic programming line-breaking algorithm. In addition to a rule-based baseline segmenter, we use a very modestly sized text, manually annotated with break positions, to train a maximum entropy classifier that relies on an extensive set of lexical and syntactic features and predicts whether or not to break after a given word position in a sentence. We also use a simple genetic algorithm to search for a subset of the features that optimizes F1, arriving at a feature set that delivers 89.2% precision and 90.2% recall (89.7% F1) on a test set, improving on the rule-based baseline by about 11 points and on the classifier trained on all features by about 1 point in F1. ...
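The line-breaking step described in the abstract can be illustrated with a minimal sketch. It assumes the sentence has already been segmented into unbreakable units (given by hand here, not produced by the paper's classifier), and it uses squared trailing space on every line but the last as the raggedness cost, which is a common choice and not necessarily the one used in the paper. The function name break_lines and the sample units are hypothetical.

```python
def break_lines(units, width):
    """Lay out unbreakable units on lines of at most `width` characters,
    minimizing the sum of squared trailing space on all lines but the last.

    Assumption: raggedness = squared slack per line; the paper's exact
    cost function is not given in the abstract.
    """
    if any(len(u) > width for u in units):
        raise ValueError("a unit is wider than the region")

    n = len(units)
    best = [0.0] * (n + 1)   # best[i]: minimal cost of typesetting units[i:]
    nxt = [n] * (n + 1)      # nxt[i]: index of the first unit on the next line

    for i in range(n - 1, -1, -1):
        best[i] = float("inf")
        length = -1          # cancels the space counted before the first unit
        for j in range(i, n):
            length += 1 + len(units[j])
            if length > width:
                break
            # The last line carries no raggedness penalty.
            cost = 0.0 if j == n - 1 else (width - length) ** 2 + best[j + 1]
            if cost < best[i]:
                best[i], nxt[i] = cost, j + 1

    # Follow the stored break points to rebuild the lines.
    lines, i = [], 0
    while i < n:
        lines.append(" ".join(units[i:nxt[i]]))
        i = nxt[i]
    return lines


# Hypothetical segmentation: each string is one unbreakable unit.
units = ["We present", "results", "from our study", "of line breaking"]
for line in break_lines(units, 20):
    print(line)
```

Because breaks are only considered between units, a phrase such as "from our study" is never split across lines, whatever the width.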
Keywords:
200303 English as a Second Language; 200402 Computational Linguistics; 80107 Natural Language Processing; FOS Computer and information sciences; FOS Languages and literature
URL: https://dx.doi.org/10.1184/r1/6368081 https://kilthub.cmu.edu/articles/Typesetting_for_Improved_Readability_using_Lexical_and_Syntactic_Information/6368081
BASE
5.
Typesetting for Improved Readability using Lexical and Syntactic Information ...
BASE