Mapping the temporal dynamics during audio-visual speech processing using connectivity analysis

Author: Caitlin Wouters

Wouters, Caitlin, 2019 Mapping the temporal dynamics during audio-visual speech processing using connectivity analysis, Flinders University, College of Science and Engineering

Terms of Use: This electronic version is (or will be) made publicly available by Flinders University in accordance with its open access policy for student theses. Copyright in this thesis remains with the author. You may use this material for uses permitted under the Copyright Act 1968. If you are the owner of any included third party copyright material and/or you believe that any material has been made available without permission of the copyright owner please contact with the details.


This project maps the temporal dynamics of audio-visual speech processing using connectivity analysis. Speech processing relies on both auditory and visual sensory input, and the visual stimulation observed during speech can influence what is perceived. This effect is processed in well-defined regions of the brain.

Subjects were presented with a series of individual stimuli: ‘ABBA’, ‘AGGA’, ‘ATHA’ or ‘APPA’. ‘ATHA’ is an artificial percept created by combining the audio of ‘ABBA’ with the video of ‘AGGA’. There were also static ‘AGGA’ and ‘ABBA’ stimuli, which had the auditory sound but no visual stimulation. These were presented at three different volume levels. The data was recorded from 21 subjects at a very high sampling rate of 9600 Hz.

Components were fitted to the data for the connectivity measures. The fitting was done across all subjects as a group, which yields components that are common to all subjects for comparison and provides more data per stimulus. The result was a final set of components, common to all subjects, located in the regions of interest.

Transfer entropy and conditional Granger causality were used to provide directed measures of connectivity. Transfer entropy proved the weaker measure, as none of its results were significant. Conditional Granger causality was a better measure of connectivity, providing some statistically significant results. To make the analysis temporal, connectivity was computed in short 200 ms windows across the post-stimulus period.
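The windowed analysis described above can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes non-overlapping 200 ms windows (1920 samples at 9600 Hz), a time-domain conditional Granger causality computed as the log ratio of residual variances from two least-squares autoregressive models, and an arbitrary model order of 2. The function names, the synthetic signals and the choice of a single conditioning signal `z` are all hypothetical and chosen only for illustration.

```python
import numpy as np


def lagged(x, order):
    """Stack the past `order` samples of x as columns, aligned with x[order:]."""
    n = len(x)
    return np.column_stack([x[order - k : n - k] for k in range(1, order + 1)])


def residual_variance(target, predictors):
    """Variance of the least-squares residuals of target regressed on predictors."""
    design = np.column_stack([np.ones(len(target)), predictors])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return np.var(target - design @ coef)


def conditional_gc(x, y, z, order=2):
    """Granger causality from x to y, conditional on z (time domain).

    Restricted model: y predicted from the past of y and z.
    Full model: y predicted from the past of y, z and x.
    The measure is ln(var_restricted / var_full).
    """
    target = y[order:]
    restricted = np.hstack([lagged(y, order), lagged(z, order)])
    full = np.hstack([restricted, lagged(x, order)])
    return np.log(residual_variance(target, restricted)
                  / residual_variance(target, full))


def windowed_gc(x, y, z, fs=9600, window_s=0.2, order=2):
    """Conditional GC in consecutive non-overlapping windows (an assumption)."""
    win = int(round(fs * window_s))  # 1920 samples for 200 ms at 9600 Hz
    starts = range(0, len(x) - win + 1, win)
    return [conditional_gc(x[s:s + win], y[s:s + win], z[s:s + win], order)
            for s in starts]


# Synthetic demonstration: x drives y with a one-sample delay, z is unrelated.
rng = np.random.default_rng(0)
n = 9600  # one second of data, giving five 200 ms windows
x = rng.standard_normal(n)
z = rng.standard_normal(n)
y = np.zeros(n)
y[1:] = 0.8 * x[:-1] + 0.1 * rng.standard_normal(n - 1)

forward = windowed_gc(x, y, z)  # x -> y given z, expected to be large
reverse = windowed_gc(y, x, z)  # y -> x given z, expected to be near zero
```

In this toy example the forward values are much larger than the reverse values in every window, which is the kind of directional asymmetry the measure is meant to expose. A real analysis would also need a justified model order and a significance test for each window, neither of which is shown here.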

Increasing the level of acoustic background noise increased connectivity to the superior temporal sulcus (STS) and recruited more of the visual cortex, suggesting that the STS is recruited to recognise speech in noisy conditions. STS recruitment was clear in correct responses, compared with little or no recruitment in incorrect responses. The ATHA stimulus caused ABBA to be recognised as ATHA in high-noise conditions, because subjects had previously been exposed to the same sound paired with a different visual. This was found to be frontal activity that then recruited the STS and visual streams to recognise the stimulus from memory. Comparing a McGurk stimulus with a non-McGurk stimulus showed that the STS is still recruited to understand noisy speech in high-noise conditions, even when this does not change what is perceived. When there is no visual stimulation, signals are still sent to the STS but there is no output from it. Interestingly, signals are still sent from the visual cortex, which may indicate that there is no visual input of significance. STS recruitment was almost always an early process, occurring in the first 200-300 ms, except when the percept was accessed from memory, as in the case of guessing ATHA for a high-noise ABBA.

Keywords: McGurk Effect, temporal dynamics, audio-visual processing, connectivity analysis, transfer entropy, conditional Granger causality, noise conditions, STS, superior temporal sulcus

Subject: Engineering thesis

Thesis type: Masters
Completed: 2019
School: College of Science and Engineering
Supervisor: Trent Lewis