Massive Arabic Speech Corpus (MASC) ...
Al-Fetyani, Mohammad; Al-Barham, Muhammad; Abandah, Gheith; Alsharkawi, Adham; Dawas, Maha. IEEE DataPort, 2021
Abstract:
This paper releases and describes the creation of the Massive Arabic Speech Corpus (MASC). The corpus contains 1,000 hours of speech sampled at 16 kHz and crawled from over 700 YouTube channels. MASC is a multi-regional, multi-genre, and multi-dialect dataset intended to advance the research and development of Arabic speech technology, with special emphasis on Arabic speech recognition. In addition to MASC, a pre-trained 3-gram language model and a pre-trained automatic speech recognition model are also developed and made available to interested researchers. A better language model requires a new and unified Arabic speech corpus; therefore, a dataset of 12 M unique Arabic words is created and released. To make practical and convenient use of MASC, the whole dataset is stratified by dialect into clean and noisy portions. Each of the two portions is then stratified and divided into three subsets: development, test, and training sets. The best word error rate achieved by ...
Keyword: Arabic speech recognition; Artificial Intelligence; corpus; deepspeech; multi-dialect; multi-genre; multi-regional; transcription.
URL:
https://dx.doi.org/10.21227/e1qb-jv46
https://ieee-dataport.org/open-access/massive-arabic-speech-corpus-masc
Source: BASE
© 2013 - 2024 Lin|gu|is|tik