Using fMRI and a data-mining algorithm, researchers in the Netherlands have developed a method for predicting what a listener is hearing based solely on the fMRI data. The algorithm can make two kinds of predictions: discriminating between three Dutch vowel sounds (/a/, /i/, and /u/), and identifying which of three different speakers is currently talking.
Before we get any further, here are the numbers. When asked to discriminate between two different vowels, the mean correctness levels of the algorithm were 0.65, 0.69, and 0.63. All of these accuracies were well above chance (0.5), with highly significant p-values. When asked to discriminate between two different speakers (there were 3 total), the mean correctness levels were 0.70, 0.67, and 0.62, once again with highly significant p-values.
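To get a feel for what "mean correctness above chance" means here, consider a toy sketch of pairwise decoding. The data below are purely synthetic stand-ins for fMRI voxel patterns (the study used real BOLD responses and a more sophisticated classifier); the nearest-centroid decoder is just one simple linear classifier chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for fMRI decoding: each trial is a simulated voxel
# activity pattern; the two vowel classes differ by a small mean shift.
n_voxels, n_trials = 50, 40
signal = rng.normal(0, 1, n_voxels)  # class-discriminative direction

def make_trials(label):
    # label is 0 or 1; class 1 patterns are shifted along `signal`
    return label * 0.5 * signal + rng.normal(0, 1, (n_trials, n_voxels))

X = np.vstack([make_trials(0), make_trials(1)])
y = np.repeat([0, 1], n_trials)

# Leave-one-out nearest-centroid classification: assign each held-out
# trial to the class whose mean training pattern is closer.
correct = 0
for i in range(len(y)):
    mask = np.arange(len(y)) != i
    c0 = X[mask][y[mask] == 0].mean(axis=0)
    c1 = X[mask][y[mask] == 1].mean(axis=0)
    pred = 0 if np.linalg.norm(X[i] - c0) < np.linalg.norm(X[i] - c1) else 1
    correct += pred == y[i]

accuracy = correct / len(y)
print(f"mean correctness: {accuracy:.2f} (chance = 0.50)")
```

A mean correctness of 0.65 on a two-way task, repeated over many trials, is the kind of result that produces the small p-values the authors report: each individual guess is only modestly better than a coin flip, but the advantage is consistent.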
While this is impressive, the next part of their experiment is even more so. The researchers trained the algorithm to differentiate between only two vowels or two speakers, then tested its effectiveness in discriminating novel stimuli: the third vowel or speaker paired with the first two. To remain efficacious, the algorithm would have to operate across many different acoustical dimensions. Remarkably, the algorithm was nearly as effective, with vowel mean correctness values of 0.66, 0.62, and 0.62, and speaker mean correctness values of 0.62, 0.65, and 0.63.
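The logic of such a generalization test can be sketched as follows. Again, the data are synthetic and the setup is a simplified, hypothetical version of the study's design: each pattern is modeled as a vowel component shared across speakers, plus a speaker-specific component, plus noise. A vowel decoder trained on two speakers is then tested on a held-out, novel speaker; it can only succeed if the vowel information it learned is not tied to those particular speakers.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic patterns: shared vowel component + speaker-specific
# component + noise. (Hypothetical model, not the study's actual data.)
n_voxels, n_trials = 60, 30
vowel_dirs = rng.normal(0, 1, (2, n_voxels))    # two vowel patterns
speaker_dirs = rng.normal(0, 1, (3, n_voxels))  # three speaker patterns

def trials(vowel, speaker):
    mean = 1.5 * vowel_dirs[vowel] + 1.0 * speaker_dirs[speaker]
    return mean + rng.normal(0, 1, (n_trials, n_voxels))

# Train the vowel decoder on speakers 0 and 1 only.
X_train = np.vstack([trials(v, s) for s in (0, 1) for v in (0, 1)])
y_train = np.tile(np.repeat([0, 1], n_trials), 2)

# Test vowel discrimination on the held-out, novel speaker 2.
X_test = np.vstack([trials(0, 2), trials(1, 2)])
y_test = np.repeat([0, 1], n_trials)

c0 = X_train[y_train == 0].mean(axis=0)
c1 = X_train[y_train == 1].mean(axis=0)
d0 = np.linalg.norm(X_test - c0, axis=1)
d1 = np.linalg.norm(X_test - c1, axis=1)
pred = (d1 < d0).astype(int)
gen_accuracy = (pred == y_test).mean()
print(f"novel-speaker vowel accuracy: {gen_accuracy:.2f}")
```

In this toy model, above-chance accuracy on the novel speaker falls out of the shared vowel component; in the real study, the analogous result suggests the brain activity patterns carry abstract vowel and speaker information rather than memorized responses to particular stimuli.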
Their findings also included the observation that a representation of a vowel or speaker occurs not only in the task-specific higher levels of cognitive processing but also in "lower" auditory regions of sound processing. This lends further evidence that sound processing (like visual processing) is shaped early on through feedback loops and reentrant processing.
The potential uses of this technology become very interesting as correctness levels approach 100% and as recognition extends from vowels to all speech sounds. That is an ambitious but worthy goal for research in this area, given its many obvious applications, perhaps even as a lie detector.
Reference
Formisano E, De Martino F, Bonte M, Goebel R. 2008. "Who" is saying "what"? Brain-based decoding of human voice and speech. Science 322:970-973. doi:10.1126/science.1164318.