Catalogue search
Hits 1 – 4 of 4
1. DARTS-ASR: Differentiable Architecture Search for Multilingual Speech Recognition and Adaptation ...
Chen, Yi-Chen; Hsu, Jui-Yang; Lee, Cheng-Kuang
arXiv, 2020
BASE
2. From Semi-supervised to Almost-unsupervised Speech Recognition with Very-low Resource by Jointly Learning Phonetic Structures from Audio and Text Embeddings ...
Chen, Yi-Chen; Huang, Sung-Feng; Lee, Hung-yi; Lee, Lin-shan
arXiv, 2019
Abstract:
Producing large amounts of annotated speech data for training ASR systems remains difficult for the more than 95% of the world's languages that are low-resourced. However, we note that human babies start to learn a language from the sounds (or phonetic structures) of a small number of exemplar words, and "generalize" such knowledge to other words without hearing large amounts of data. We initiate some preliminary work in this direction. Audio Word2Vec is used to learn phonetic structures from spoken words (signal segments), while another autoencoder is used to learn phonetic structures from text words. The relationship between the two can be learned jointly, or separately after both are well trained, and can then be used for speech recognition with very low resource. In initial experiments on the TIMIT dataset, only 2.1 hours of speech data (in which 2500 spoken words were annotated and the rest left unlabeled) gave a word error rate of 44.6%, and this number can be reduced to ...
Keyword: Audio and Speech Processing eess.AS; Computation and Language cs.CL; FOS Computer and information sciences; FOS Electrical engineering, electronic engineering, information engineering; Sound cs.SD
URL:
https://arxiv.org/abs/1904.05078
https://dx.doi.org/10.48550/arxiv.1904.05078
BASE
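The abstract above describes two embedding models trained per modality (Audio Word2Vec on spoken-word segments, an autoencoder on text words), with the relationship between their latent spaces learned jointly or separately afterwards. A minimal runnable sketch of the "learned separately" variant, using random toy features and plain linear autoencoders as stand-ins (all data, dimensions, and functions here are illustrative assumptions, not the authors' implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def train_linear_autoencoder(X, k, lr=0.01, steps=500):
    """Train a tiny linear autoencoder X -> Z (k dims) -> X_hat by gradient descent;
    return the encoder matrix W."""
    d = X.shape[1]
    W = rng.normal(scale=0.1, size=(d, k))   # encoder
    V = rng.normal(scale=0.1, size=(k, d))   # decoder
    for _ in range(steps):
        Z = X @ W
        err = Z @ V - X                      # reconstruction error
        W -= lr * X.T @ (err @ V.T) / len(X)
        V -= lr * Z.T @ err / len(X)
    return W

# Toy "spoken word" and "text word" features for one 40-word vocabulary
# (paired here only to score the result; the paper works from unpaired data).
X_audio = rng.normal(size=(40, 12))
X_text = rng.normal(size=(40, 8))

# Step 1: train each modality's embedding model independently.
Wa = train_linear_autoencoder(X_audio, k=4)
Wt = train_linear_autoencoder(X_text, k=4)
Za, Zt = X_audio @ Wa, X_text @ Wt

# Step 2: learn the cross-modal relationship separately, after both are trained,
# as a least-squares linear map from the audio latent space to the text one.
M, *_ = np.linalg.lstsq(Za, Zt, rcond=None)

# "Recognition": each mapped audio embedding votes for its nearest text embedding.
pred = np.argmin(np.linalg.norm((Za @ M)[:, None, :] - Zt[None, :, :], axis=-1), axis=1)
accuracy = (pred == np.arange(len(pred))).mean()
```

The least-squares map is only a placeholder for whatever relationship model the paper actually trains; the point of the sketch is the pipeline shape: embed each modality, then link the latent spaces.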
3. Almost-unsupervised Speech Recognition with Close-to-zero Resource Based on Phonetic Structures Learned from Very Small Unpaired Speech and Text Data ...
Chen, Yi-Chen; Shen, Chia-Hao; Huang, Sung-Feng
arXiv, 2018
BASE
4. Phonetic-and-Semantic Embedding of Spoken Words with Applications in Spoken Content Retrieval ...
Chen, Yi-Chen; Huang, Sung-Feng; Shen, Chia-Hao
arXiv, 2018
BASE
© 2013 - 2024 Lin|gu|is|tik