Catalogue search
Refine your search:
Keyword:
Audio and Speech Processing eess.AS (2)
FOS Electrical engineering, electronic engineering, information engineering (2)
Computation and Language cs.CL (1)
FOS Computer and information sciences (1)
Sound cs.SD (1)
Creator / Publisher:
Khong, Andy W. H. (2)
Khudanpur, Sanjeev (2)
Liu, Hexin (2)
Perera, Leibny Paola Garcia (2)
Styles, Suzy J. (2)
Dauwels, Justin (1)
BLLDB-Access:
free (2)
subject to license (0)
Hits 1 – 2 of 2
1
PHO-LID: A Unified Model Incorporating Acoustic-Phonetic and Phonotactic Information for Language Identification ...
Liu, Hexin; Perera, Leibny Paola Garcia; Khong, Andy W. H.; Styles, Suzy J.; Khudanpur, Sanjeev
arXiv, 2022
Abstract:
We propose a novel model to hierarchically incorporate phoneme and phonotactic information for language identification (LID) without requiring phoneme annotations for training. In this model, named PHO-LID, a self-supervised phoneme segmentation task and a LID task share a convolutional neural network (CNN) module, which encodes both language identity and sequential phonemic information in the input speech to generate an intermediate sequence of phonotactic embeddings. These embeddings are then fed into transformer encoder layers for utterance-level LID. We call this architecture CNN-Trans. We evaluate it on AP17-OLR data and the MLS14 set of NIST LRE 2017, and show that the PHO-LID model with multi-task optimization exhibits the highest LID performance among all models, achieving over 40% relative improvement in terms of average cost on AP17-OLR data compared to a CNN-Trans model optimized only for LID. The visualized confusion matrices imply that our proposed method achieves higher performance on languages ... : Submitted to Interspeech 2022, updated to the submitted version ...
Keyword: Audio and Speech Processing eess.AS; FOS Electrical engineering, electronic engineering, information engineering
URL:
https://arxiv.org/abs/2203.12366
https://dx.doi.org/10.48550/arxiv.2203.12366
BASE
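The PHO-LID abstract reports "over 40% relative improvement in terms of average cost" over a CNN-Trans model optimized only for LID. Because average cost is a lower-is-better metric, a relative improvement is usually read as the fractional reduction in cost relative to the baseline. The minimal sketch below shows that arithmetic under this assumption. The function name `relative_improvement` and both cost values are hypothetical placeholders for illustration, not figures taken from the paper.

```python
def relative_improvement(baseline_cost: float, new_cost: float) -> float:
    """Fractional reduction of a lower-is-better metric (e.g. average cost)
    relative to a baseline: (baseline - new) / baseline.

    Assumption: this is the usual definition of "relative improvement" for a
    cost metric; the paper's exact computation is not given in the abstract.
    """
    if baseline_cost <= 0:
        raise ValueError("baseline_cost must be positive")
    return (baseline_cost - new_cost) / baseline_cost


# Hypothetical costs for illustration only; these are NOT results from the paper.
baseline = 0.10  # e.g. average cost of a CNN-Trans model optimized only for LID
proposed = 0.06  # e.g. average cost of a multi-task model
print(f"{relative_improvement(baseline, proposed):.0%}")  # prints 40%
```

Under this reading, "over 40% relative improvement" means the proposed model's average cost is below 60% of the baseline's.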
2
Enhance Language Identification using Dual-mode Model with Knowledge Distillation ...
Liu, Hexin; Perera, Leibny Paola Garcia; Khong, Andy W. H.
arXiv, 2022
BASE
© 2013 - 2024 Lin|gu|is|tik