1 |
The persistence and evolutionary consequences of vestigial behaviours
BASE
2 |
Next-gen sequencing identifies non-coding variation disrupting miRNA-binding sites in neurological disorders
3 |
Hyperkinetic stereotyped movements in a boy with biallelic CNTNAP2 variants
4 |
Genomic imprinting as a window into human language evolution
5 |
The effects of genetic ancestry on elite sprint athlete status in the West African diaspora
7 |
Copy number variation screen identifies a rare de novo deletion at chromosome 15q13.1-13.3 in a child with language impairment
8 |
Genome-wide screening for DNA variants associated with reading and language traits
9 |
Genetic analysis of dyslexia candidate genes in the European cross-linguistic NeuroDys cohort
10 |
Genome-wide association analyses of child genotype effects and parent-of-origin effects in specific language impairment
11 |
Mosaic maternal ancestry in the Great Lakes region of East Africa
13 |
Differential allelic expression of SOS1 and hyperexpression of the activating SOS1 c.755C variant in a Noonan syndrome family
14 |
Reading and language disorders: the importance of both quantity and quality
15 |
Counselling uncertainty: genetics professionals' accounts of (non)directiveness and trust/distrust
16 |
Counselling uncertainty: genetics professionals' accounts of (non)directiveness and trust/distrust
17 |
Uniparental Genetic Heritage of Belarusians: Encounter of Rare Middle Eastern Matrilineages with a Central European Mitochondrial DNA Pool
18 |
Statistical issues in modelling the ancestry from Y-chromosome and surname data
19 |
A genetic-based HAC technique for parallel clustering of bilingual Malay-English corpora
Abstract:
Multilingual corpora, containing the same documents in a variety of languages, are becoming an essential resource for natural language processing. Clustering multilingual corpora provides insight into the differences between languages when term frequency-based Information Retrieval (IR) tools are used. It also allows Natural Language Processing (NLP) and IR tools available in one language to be used to implement IR for another language. For instance, the most relevant articles to be translated from Malay to English can be selected after studying the clusters of abstracts in English. In this paper, we report on our work applying Hierarchical Agglomerative Clustering (HAC) to a large corpus of documents, each of which appears in both Malay and English. We cluster these documents for each language and compare the results with respect to the content of the clusters produced. On the data available, the results of clustering in one language resemble those in the other, provided the number of clusters required is relatively small. Further, we study the effects of changing the method used to compute the inter-cluster distance: single link, complete link and average link. Finally, we describe an experiment employing a genetic algorithm to fine-tune the individual term weights in order to reproduce more closely a predefined set of clusters. In this way, clustering becomes a supervised learning technique that is trained to better reproduce known clusters in Malay when applied to the corresponding documents in English. Other possible applications include training the algorithm on a hand-clustered set of documents and subsequently applying it to a superset that includes unseen documents, thereby incorporating expert knowledge about the domain into the clustering algorithm.
Keyword:
QA76.75-76.765 Computer software; QH426-470 Genetics
URL: http://eprints.ums.edu.my/id/eprint/29029/3/A%20Genetic-Based%20HAC%20Technique%20for%20Parallel%20Clustering%20of%20Bilingual%20Malay-English%20Corpora%20FULL%20TEXT.pdf https://www.seekdl.org/assets/pdf/20121212_101902.pdf http://eprints.ums.edu.my/id/eprint/29029/ http://eprints.ums.edu.my/id/eprint/29029/2/A%20genetic-based%20HAC%20technique%20for%20parallel%20clustering%20of%20bilingual%20Malay-English%20corpora_ABSTRACT.pdf
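The clustering setup described in the abstract of this record can be illustrated with a minimal sketch. It is not the authors' implementation: the toy documents, the cosine distance on raw term frequencies, and the names `tf_vector`, `cosine_distance` and `hac` are all assumptions made for illustration. The optional `weights` argument marks where per-term weights (which the paper tunes with a genetic algorithm) would enter; the genetic algorithm itself is not shown.

```python
import math
from collections import Counter

def tf_vector(text, weights=None):
    """Term-frequency vector; `weights` (hypothetical) scales individual terms."""
    weights = weights or {}
    counts = Counter(text.lower().split())
    return {term: n * weights.get(term, 1.0) for term, n in counts.items()}

def cosine_distance(a, b):
    """1 - cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)

# The three inter-cluster distance rules named in the abstract.
LINKAGES = {
    "single": min,                               # closest pair of members
    "complete": max,                             # farthest pair of members
    "average": lambda ds: sum(ds) / len(ds),     # mean over all pairs
}

def hac(vectors, k, linkage="average"):
    """Merge the two closest clusters repeatedly until k clusters remain."""
    combine = LINKAGES[linkage]
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairwise = [cosine_distance(vectors[a], vectors[b])
                            for a in clusters[i] for b in clusters[j]]
                d = combine(pairwise)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return sorted(sorted(c) for c in clusters)

# Invented toy parallel corpus: document i in English matches document i in Malay.
english = ["gene dna genome", "dna genome sequence",
           "market price trade", "trade price economy"]
malay = ["gen dna genom", "dna genom jujukan",
         "pasaran harga dagangan", "dagangan harga ekonomi"]

en_clusters = hac([tf_vector(d) for d in english], k=2, linkage="average")
ms_clusters = hac([tf_vector(d) for d in malay], k=2, linkage="average")
print(en_clusters, ms_clusters)
```

Comparing `en_clusters` with `ms_clusters` mirrors the cross-language comparison the abstract describes; swapping `linkage` between `"single"`, `"complete"` and `"average"` changes only how the distance between two clusters is computed.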
20 |
The spatial and temporal dimensions of reflective questions in genetic counselling