Page: 1... 3 4 5 6 7 8 9 10 11... 1.645
121 |
Bilinguals have a single computational system but two compartmentalized phonological grammars: Evidence from code-switching
122 |
When Language Contact Says Nothing: A Contrastive Analysis of Queísta Structures in Two Varieties of Peninsular Spanish
123 |
CorpusExplorer ; Eine Software zur korpuspragmatischen Analyse
124 |
Universal Dependencies and Semantics for English and Hebrew Child-directed Speech
In: Proceedings of the Society for Computation in Linguistics (2022)
125 |
The negotiation of authorial persona in dissertations literature review and discussion sections
In: Russian Journal of Linguistics, Vol 26, Iss 1, Pp 51-73 (2022)
126 |
A comparative corpus stylistic analysis of thematization and characterization in Gordimer’s My Son’s Story and Coetzee’s Disgrace
In: Open Linguistics, Vol 8, Iss 1, Pp 46-64 (2022)
127 |
Prices are rising, wages are falling: Argument structure of verbs denoting ‘increase’ and ‘decrease’ in the Russian language
In: Russian Journal of Linguistics, Vol 26, Iss 1, Pp 194-223 (2022)
128 |
A very unpredictable ‘person’: A corpus-based approach to suppletion in West Polesian
In: Russian Journal of Linguistics, Vol 26, Iss 1, Pp 116-138 (2022)
129 |
For a Better Dictionary: Revisiting Ecolexicography as a New Paradigm
In: Lexikos, Vol 31, Pp 281-321 (2022)
130 |
Multi-word units (and tokenization more generally): a multi-dimensional and largely information-theoretic approach
In: Lexis: Journal in English Lexicology, Vol 19 (2022)
Abstract:
It has been argued that most of corpus linguistics involves one of four fundamental methods: frequency lists, dispersion, collocation, and concordancing. All of these presuppose, if only implicitly, the definition of a unit: the element whose occurrences in a corpus, in corpus parts, or around a search word are counted (or quantified in other ways). With most corpus-processing tools, a unit is usually an orthographic word. However, this is a simplifying assumption born of convenience: it seems more intuitive to treat "because of" or "in spite of" as one unit each rather than as two or three. Some work in computational linguistics has developed multi-word unit (MWU) identification algorithms, which typically involve co-occurrence token frequencies and association measures (AMs), but these have not become widespread in corpus-linguistic practice, even though recognizing MWUs like the above will have a profound impact on just about all corpus statistics that involve (simplistic notions of) words/units. In this programmatic proof-of-concept paper, I introduce and exemplify an algorithm to identify MWUs that goes beyond frequency and bidirectional association by also involving several well-known but underutilized dimensions of corpus-linguistic information. The dimensions it draws on are frequency (how often does a potential unit such as in_spite_of occur?), dispersion (how widespread is the use of a potential unit?), association (how strongly attracted are the parts of a potential unit?), and entropy (how variable is each slot in a potential unit?). The proposed algorithm can use all these dimensions and weight them differently. I will (i) present the algorithm in detail, (ii) exemplify its application to the Brown corpus, (iii) discuss its results on the basis of several kinds of MWUs it returns, and (iv) discuss next analytical steps.
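The four dimensions named in the abstract can be illustrated with a minimal Python sketch for two-word candidates. This is not the paper's algorithm: the abstract does not say which measures it uses, so the choices here are assumptions made for illustration only. Dispersion is taken as the share of corpus parts containing the candidate, association as pointwise mutual information, and entropy as the Shannon entropy of the second slot given the first word. The function names `bigram_dimensions` and `combine` are hypothetical.

```python
import math
from collections import Counter


def bigram_dimensions(parts):
    """Score every bigram in `parts` (a list of token lists, one per corpus part).

    Returns {(w1, w2): {"frequency", "dispersion", "association", "entropy"}}.
    All four measures are illustrative assumptions, not the paper's definitions.
    """
    unigrams = Counter()
    bigrams = Counter()
    parts_with = Counter()  # number of parts in which each bigram occurs
    followers = {}          # first word -> Counter of words seen in slot 2
    for tokens in parts:
        unigrams.update(tokens)
        pairs = list(zip(tokens, tokens[1:]))
        bigrams.update(pairs)
        parts_with.update(set(pairs))
        for w1, w2 in pairs:
            followers.setdefault(w1, Counter())[w2] += 1

    n_tokens = sum(unigrams.values())
    n_bigrams = sum(bigrams.values())
    scores = {}
    for (w1, w2), freq in bigrams.items():
        # Association: pointwise mutual information of the two words.
        p_pair = freq / n_bigrams
        p1 = unigrams[w1] / n_tokens
        p2 = unigrams[w2] / n_tokens
        pmi = math.log2(p_pair / (p1 * p2))
        # Entropy: how variable is the second slot after w1?
        slot = followers[w1]
        total = sum(slot.values())
        entropy = -sum((c / total) * math.log2(c / total) for c in slot.values())
        scores[(w1, w2)] = {
            "frequency": freq,
            "dispersion": parts_with[(w1, w2)] / len(parts),
            "association": pmi,
            "entropy": entropy,
        }
    return scores


def combine(dims, weights):
    """Weighted sum over the chosen dimensions (hypothetical weighting scheme)."""
    return sum(weight * dims[name] for name, weight in weights.items())
```

A candidate such as ("spite", "of") would be expected to show low slot entropy and high association. Because low entropy indicates a fixed slot, a weighting along these lines would plausibly give entropy a negative weight, and the raw values would need rescaling before the dimensions are comparable.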
Keyword:
association; corpus linguistics; dispersion; frequency; Lexicography; multi-word units; n-grams; P327-327.5
URL: https://doaj.org/article/a4ad39e66ddd472b9538708836e706f1 https://doi.org/10.4000/lexis.6231
131 |
O sufixo -AZO em unidades léxico-fraseológicas: uma análise contrastiva espanhol/português em corpus jornalístico
In: Revista Nebrija de Linguistica Aplicada a la Enseñanza de Lenguas, Vol 16, Iss 32 (2022)
133 |
A GENRE AND COLLOCATIONAL ANALYSIS OF THE NEAR-SYNONYMS TEACH, EDUCATE AND INSTRUCT: A CORPUS-BASED APPROACH
In: TEFLIN Journal, Vol 33, Iss 1, Pp 75-97 (2022)
134 |
Corpus Linguistics approaches to trainee translators’ framing practice in news translation
In: Translation and Interpreting: the International Journal of Translation and Interpreting Research, Vol 12, Iss 1 (2022)
135 |
TRANSLATION OF KOREAN-INDONESIAN SHORT STORIES: AN ANALYSIS OF CLASS AND SEMANTIC SHIFTS OF ADVERBS OF MODALITY
In: LiNGUA: Jurnal Ilmu Bahasa dan Sastra, Vol 16, No 2 (2021), Pp 271-282; 2442-3823; 1693-4725 (2022)
136 |
Phonologically motivated orthographic variation in Modern Uyghur: the voicing of h
In: Proceedings of the Workshop on Turkic and Languages in Contact with Turkic, Vol 6 (2021); 5049; 2641-3485 (2022)
137 |
Efficient marking of argument focus: A trade-off between focus particles and word order in Sinhala
In: Proceedings of the Linguistic Society of America, Vol 7, No 1 (2022); 5223; 2473-8689