1 | Measuring the Reading-Attention Relationship: Functional Differences in Working Memory Activity During Single Word Decoding in Children With and Without Reading Disorder
BASE
2 | Phonological skills and literacy in Dutch beginning readers ...
3 | Multiple Interpretations of Complex Words in Equivalent Word Combinations ...
Ryazanova, V.A. - : Moscow City Pedagogical University (State Autonomous Educational Institution of Higher Education of the City of Moscow), 2022
4 | MTS-Stega: Linguistic Steganography Based on Multi-Time-Step
In: Entropy; Volume 24; Issue 5; Pages: 585 (2022)
5 | The contributions of phonological awareness and decoding on spelling in isiXhosa Grade 3 readers ...
6 | Exploring the Representations of Individual Entities in the Brain Combining EEG and Distributional Semantics.
7 | Reading Strategy Intervention and Reading Comprehension Success in Bilingual Readers
In: Electronic Thesis and Dissertation Repository (2022)
8 | Cortical basis of vocalization in behaving freely moving minipigs
In: https://tel.archives-ouvertes.fr/tel-03353386 ; Neuroscience. Université Grenoble Alpes [2020-.], 2021. English. ⟨NNT : 2021GRALS013⟩ (2021)
9 | Analysis of cortical activity for the development of brain-computer interfaces for speech
In: https://tel.archives-ouvertes.fr/tel-03578854 ; Bioinformatics [q-bio.QM]. Université Grenoble Alpes [2020-.], 2021. English. ⟨NNT : 2021GRALS022⟩ (2021)
10 | One Source, Two Targets: Challenges and Rewards of Dual Decoding
In: Conference on Empirical Methods in Natural Language Processing, Nov 2021, Online and Punta Cana, Dominican Republic ; https://hal.archives-ouvertes.fr/hal-03345478 ; https://2021.emnlp.org/ (2021)
11 | Performance differences on reading skill measures are related to differences in cortical grey matter structure in young adults ...
13 | Closed-Loop Cognitive-Driven Gain Control of Competing Sounds Using Auditory Attention Decoding
In: Algorithms ; Volume 14 ; Issue 10 (2021)
14 | Decoding Methods in Neural Language Generation: A Survey
In: Information ; Volume 12 ; Issue 9 (2021)
15 | A Systematic Review and Meta-Analysis of Reading and Writing Interventions for Students with Disorders of Intellectual Development
In: Education Sciences ; Volume 11 ; Issue 10 (2021)
16 | Development of Reading Comprehension in Bilingual and Monolingual Children—Effects of Language Exposure
In: Languages ; Volume 6 ; Issue 4 (2021)
17 | Deep Cross-Modal Alignment in Audio-Visual Speech Recognition
Sterpu, George. - : Trinity College Dublin. School of Engineering. Discipline of Electronic & Elect. Engineering, 2021

Abstract:
Modern studies in cognitive psychology have demonstrated that speech perception is a multimodal process, as opposed to a purely auditory one with visual carryover, as in the classic view. This has led researchers to investigate the nature of the audio-visual speech integration process in the brain. The ability to combine two sources of information delivering uncertain predictions improves the recognition of speech. In this thesis we aim to develop efficient machine learning algorithms and computational models of audio-visual speech recognition (AVSR) that learn to capitalise on the visual modality from examples. My original contribution to knowledge is an efficient strategy for the multimodal alignment and fusion of audio-visual speech on the task of large vocabulary continuous speech recognition. This strategy, termed AV Align, makes limited use of domain knowledge, but exploits the hypothesis that there is an underlying alignment between the higher-order representations of the audio and visual modalities of speech. To achieve a controllable decoding latency, we develop a speech segmentation strategy termed Taris, which aims to segment a spoken utterance by learning to count the number of words from speech data. Our multimodal systems are presented with audio and video recordings of speech from two large-vocabulary audio-visual speech datasets, TCD-TIMIT and LRS2. We corrupt the audio channel with noise taken from a cafeteria environment at three signal-to-noise ratios. For each noise condition, we evaluate the character error rate of the multimodal system and compare it to an equivalent audio-only system trained on the same data, to assess the added benefit of the visual modality to speech recognition. We show empirically that AV Align discovers a monotonic trend in the alignment between the audio and visual modalities.
This monotonicity is achieved while AV Align is allowed to search for a soft alignment across full speech utterances, without any supervision or constraints placed on the alignment pattern. On LRS2, the most challenging audio-visual speech dataset used in this work, AV Align obtains improvements over an audio-only system ranging from 6.4% under clean speech conditions up to around 31% at the highest level of audio noise. These improvements were made possible by an exploration of the learning difficulties specific to the audio-visual speech recognition task, which led us to propose a multitask learning approach based on estimating the intensities of two facial action units from video. We also show that the word-counting objective of Taris favours the segmentation of speech into units whose length distribution is similar to that of word units estimated with a forced aligner, although the correlation between our segments and word units remains speculative. Since we design the decoding process of Taris to be robust to segmentation imperfections, we achieve a level of accuracy comparable to equivalent systems that make full use of the utterance-level context and are indifferent to latency. Our findings reflect two well-informed modelling assumptions contributing to the domain knowledge of audio-visual speech: first, the underlying higher-order fusion of cross-modally aligned audio and visual speech representations; second, the possibility of learning the word count of a spoken utterance from either audio or audio-visual cues as a mechanism for segmenting transcribed speech that lacks intermediate alignments. Both AV Align and Taris have objectives expressed as fully differentiable functions of their parameters. We believe these will be key ingredients in the adoption of audio-visual speech recognition technology into real products in the years to come.

Keyword:
Audio-Visual Speech Recognition; AV Align; Deep learning; Multimodal fusion; Online decoding; Speech Recognition; Taris

URL: https://tcdlocalportal.tcd.ie/pls/EnterApex/f?p=800:71:0::::P71_USERNAME:STERPUG http://hdl.handle.net/2262/96649
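The core mechanism described in the abstract above — a soft alignment between higher-order audio and visual representations, fused for recognition — can be sketched as a simple cross-modal attention step. This is a minimal NumPy illustration of the general technique, not the thesis's actual AV Align implementation; all names, shapes, and the scaled dot-product scoring are assumptions made here for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_align(audio, video):
    """Soft-align each audio frame to the video frames, then fuse.

    audio: (Ta, d) audio encoder states; video: (Tv, d) video encoder states.
    Returns fused (Ta, 2d) representations and the (Ta, Tv) attention map.
    No monotonicity constraint is imposed: each audio frame may attend
    anywhere in the video sequence, as in unconstrained soft alignment.
    """
    scores = audio @ video.T / np.sqrt(audio.shape[-1])  # scaled dot-product scores
    attn = softmax(scores, axis=-1)      # one distribution over video frames per audio frame
    context = attn @ video               # expected video context for each audio frame
    fused = np.concatenate([audio, context], axis=-1)  # fuse by concatenation
    return fused, attn

# Toy usage: 30 audio frames and 10 video frames, both with 64-dim states.
rng = np.random.default_rng(0)
fused, attn = cross_modal_align(rng.normal(size=(30, 64)),
                                rng.normal(size=(10, 64)))
```

In a trained AVSR model the attention map would be learned end-to-end and, per the abstract, tends toward a monotonic audio-video alignment without that constraint being hard-coded.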
18 | A Morphological Awareness-based Intervention for Struggling Readers
19 | The roles of decoding and vocabulary in Chinese reading development: Evidence from a 3‐year longitudinal study
20 | Development of Reading Comprehension in Bilingual and Monolingual Children : Effects of Language Exposure
In: Languages ; 6 (2021), 4. - 166. - MDPI Publishing. - eISSN 2226-471X (2021)