Timothy C. Hain, MD. Most recent update: September 9, 2017
This document reviews the literature concerning feedback control of human vocal pitch, and collects several mathematical models of central auditory feedback control in one place.
Fairbanks (1954) reviewed historical ideas in a concept paper concerning auditory feedback, going as far back as 1931. His servomechanism incorporated a desired signal, a storage unit, a multi-input comparator, a summing junction where the desired signal and error are added, and a plant. Fairbanks recognized that there might be multisensory inputs, and included auditory, tactile and proprioceptive inputs. This is a modern idea, reminiscent of "internal estimator" theory, although Fairbanks did not discuss the problem of dealing with multiple inputs in a quantitative way. Although his "model" had a memory circuit, the circuit was used to extrapolate future performance, with error driven to zero, rather than to match the delays of input to output.
Ringel and Steer (1963) examined the effect of anesthetizing the entire oral region (using the services of an oral surgeon to administer a series of intra-oral hypodermic injections) on the vocalizations of 13 subjects majoring in speech pathology and audiology. Unsurprisingly, speech accuracy suffered. Fundamental frequency also rose, suggesting that tactile feedback assists in stabilizing F0.
Elliott and Niemoeller (1970) argued that control of F0 (voice fundamental frequency) is accomplished in part through use of auditory feedback. Their evidence came from monitoring voice F0 as subjects matched both current and remembered auditory targets, with noise masking of 95-105 dB SPL. They pointed out that the voice is most stable in a quiet environment when the target is present. It is less accurate when the target is remembered, and is least accurate when the target is remembered and noise is present.
Elman (1981) investigated the effect of frequency-shifted feedback using a Lexicon Varispeech II Speech Time Compressor/Expandor. Speech F0 was shifted by 10%. Subjects responded with a compensatory change in F0, which was shifted downward.
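Pitch shifts in this literature are quoted both as percentages (Elman's 10%) and in cents (the roughly 100 cent shifts used in later studies). The following is a minimal sketch of the standard conversion between the two, where 1200 cents equal one octave. The function names are illustrative and do not come from any of the cited papers.

```python
import math

def ratio_to_cents(ratio):
    """Convert a frequency ratio (shifted F0 / original F0) to cents."""
    return 1200.0 * math.log2(ratio)

def cents_to_ratio(cents):
    """Convert a shift in cents to a frequency ratio."""
    return 2.0 ** (cents / 1200.0)

# A 10% upward shift (ratio 1.10) is about 165 cents.
print(round(ratio_to_cents(1.10), 1))
# A 100 cent (one semitone) shift is a ratio of about 1.0595, or roughly 6%.
print(round(cents_to_ratio(100.0), 4))
```

By this conversion, a 10% shift is somewhat larger than one and a half semitones.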
Kawahara (1993-1994) reported adaptive responses to changes in F0 using a paradigm he called "transformed auditory feedback". He investigated F0 using a pseudorandom binary series of perturbations.
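One common way to produce a pseudorandom binary series is a linear-feedback shift register (LFSR). The sketch below is illustrative only: the register length, the tap positions, and the mapping to a shift of plus or minus 100 cents are all assumptions, and are not taken from Kawahara's reports.

```python
def prbs4(n_samples, state=0b0001):
    """Yield n_samples bits from a 4-bit maximal-length LFSR (period 15).

    Hypothetical generator for illustration; 'state' must be nonzero.
    """
    bits = []
    for _ in range(n_samples):
        bits.append(state & 1)                  # output the lowest bit
        new_bit = (state ^ (state >> 1)) & 1    # XOR of the two lowest bits
        state = (state >> 1) | (new_bit << 3)   # shift right, feed back at top
    return bits

def perturbation_cents(bits, depth_cents=100.0):
    """Map bits {0, 1} to pitch shifts of -depth or +depth (assumed values)."""
    return [depth_cents if b else -depth_cents for b in bits]

sequence = prbs4(30)
shifts = perturbation_cents(sequence)
```

A sequence of this kind repeats every 15 samples and is nearly balanced between upward and downward shifts, which is what makes it convenient for estimating a response by correlation.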
|Simple model of F0 control, Larson et al (2000).|
Larson and associates (Burnett et al, 1997; Hain et al, 1999; Larson et al, 2000) studied F0 in greater detail using pitch shifting with sophisticated electronics. A harmonizer device, capable of shifting F0 similarly to the device used by Elman (1981), was used to shift F0 by roughly 100 cents (one semitone) using steps, ramps, and delayed auditory feedback (DAF). A clear feedback response was demonstrated, which could be either negative, compensating for the perturbation, or positive, tracking an external perturbation. The response was shown to have two components, one with a latency of about 150 msec, and a second with a latency of about 450 msec. The later response is under more voluntary control. A simple negative feedback model incorporating comparison of auditory input with expected auditory input, low-pass filtering, and limiting was shown to reproduce most aspects of the response. This model depends upon comparison of intended or expected F0 with perceived F0, or in other words, a connection between motor output and auditory input.
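The ingredients named above (comparison of perceived with intended F0, a latency, limiting, and low-pass filtering) can be sketched as a short discrete-time simulation. This is a minimal illustration and not the published model: the gain, time constant, limit, and 150 msec delay used here are assumed values chosen for readability, and only the compensating (negative) response with a single latency is represented.

```python
from collections import deque

def simulate_f0_response(perturb_cents, dt=0.001, delay_s=0.15,
                         tau_s=0.1, gain=0.5, limit_cents=50.0):
    """Return voice F0 deviation (cents) for a pitch-shift time series.

    All parameter values are illustrative assumptions.
    """
    delay_steps = max(1, round(delay_s / dt))
    # Buffer of past errors; buf[0] is the error from delay_steps ago.
    buf = deque([0.0] * delay_steps, maxlen=delay_steps)
    correction = 0.0
    voice = []
    for shift in perturb_cents:
        produced = correction                # deviation from intended F0
        error = produced + shift             # perceived minus intended F0
        delayed = buf[0]                     # error seen after the latency
        buf.append(error)
        limited = max(-limit_cents, min(limit_cents, delayed))  # limiter
        target = -gain * limited             # negative feedback
        correction += (dt / tau_s) * (target - correction)      # low-pass
        voice.append(produced)
    return voice

# A 100 cent upward step lasting 2 seconds.
response = simulate_f0_response([100.0] * 2000)
print(round(response[-1], 1))
```

With these assumed values the simulated voice does not move for the first 150 msec and then drifts downward, opposing the shift. Because of the limiter and the gain below one, the compensation is only partial.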
Miller-Preuss (1980) pointed out the usefulness of such a comparison in attention to self-produced or externally produced acoustic inputs, and suggested that such a direct connection might exist between cingulate vocalization areas and auditory association cortex (superior temporal gyrus).
Houde and Jordan (1998) examined feedback control of F1 and F2 using a customized apparatus that allowed them to shift subjects' speech in real time. They showed that subjects could be trained over two hours to shift vowel formants significantly, with retention of the shift. They defined "adaptation" as the amount of shift that was retained at the end of the experiment when auditory feedback was "blocked" by noise.
Guenther and associates (2006) have written a body of literature concerning a neural network model of vocal motor control. It has the same general topology as the model of Hain et al, 2000. While of great general interest, because it is implemented as a neural network, it has limited usefulness in experimental contexts: neural networks are flexible enough to simulate nearly any output.
With a delay of about 200 msec, DAF causes subjects to speak more slowly, to prolong syllables or to stutter (Hanley and Tiffany, 1954; Davidson, 1959; Yates, 1963).
|Model of Delayed Auditory Feedback (from Hain et al, 2001)|
This effect was modeled successfully using a simple mismatch between perceived and intended F0, as shown in the model above.
Siegel and Pick (1974) demonstrated that speakers decrease their volume when they hear their voices amplified. Lane and Tranel (1971) demonstrated that speakers increase their volume when their voices are attenuated. Volume is also increased in the presence of noise (Lombard, 1911; Ringel and Steer, 1963). While the neurological localization of this reflex would seem to belong logically in the cortex, where assessments of intelligibility could be made, work in decerebrate cats suggests that neuronal mechanisms for evoking the reflex must be present in the brainstem (Nonaka et al, 1997). This work does not exclude the possibility of both lower and higher level substrates.
|Model of loudness feedback (from Bauer et al, 2006)|
Bauer et al (2006) obtained additional experimental data and modeled loudness feedback control using similar mechanisms (see above).
Vocalization can be elicited from numerous areas in the brain.
Jurgens and Ploog (1970) pointed out the "exceptional position" of the mesencephalic periaqueductal gray (PAG). In the squirrel monkey, stimulation of the PAG produces the shortest latencies and the highest number of vocalizations. There are also reports of mutism after PAG lesions.
Miller-Preuss, Newman and Jurgens (1980) state that the anterior limbic cortex is involved in the initiation of voluntarily controlled vocalizations in the squirrel monkey.
Certain neurological disorders impair pitch control of the voice (prosody). Examples include frontotemporal dementia (FTD; Nevler et al, 2017) and progressive supranuclear palsy (PSP). Impairments of prosody are also found in survivors of right middle cerebral artery stroke and in persons with autism, schizophrenia, and Parkinson disease (Leyton and Hillas, 2017).