M70: DISCRIMINATION THRESHOLDS IN SELF-PRODUCED SPEECH IN
COCHLEAR IMPLANT LISTENERS AND NORMAL LISTENERS
Sarah Bakst, Caroline A. Niziolek, Ruth Y. Litovsky
University of Wisconsin-Madison, Madison, WI, USA
Introduction. We listen to ourselves while talking to ensure that our speech matches what we intended to
say and how we intended to say it. Normal-hearing (NH) speakers maintain acoustic consistency by accessing
their auditory feedback, which the auditory system uses to detect errors and update motor plans while
talking. This experiment investigates error detection in cochlear implant (CI) users. In people with CIs,
fine-grained spectral distinctions in the speech input are degraded because the CI processor uses only a
limited number of spectral channels. This signal degradation may result in a sub-optimal comparison
between the speech output and speech target, possibly diminishing the ability to correct nascent errors.
Automatic error-detection is important for producing intelligible speech and successful communication.
This experiment investigates whether CI listeners can detect variability in their own productions, an
important first step in understanding the extent to which CI users rely on their auditory feedback while
speaking, and the extent to which the CI is important for learning and maintaining intelligible speech.
Hypothesis. This experiment measures the discrimination threshold for variation in the first and second
formants (F1/F2), the spectral peaks that distinguish vowel height (e.g. "he" vs. "hey") and backness
("he" vs. "who"). Adjacent vowels may be separated by less than 100 Hz in F1 and 1000 Hz in F2, but
large electrode frequency bandwidths may not preserve subphonemic differences in pronunciation. For
example, the default Cochlear™ frequency allocation table sets bandwidths at 125 Hz in F1 frequency
space and 125-500 Hz in F2 space. We hypothesize that CI users may not be able to
detect subphonemic differences in their speech, especially if those differences do not cross electrode
boundaries.
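To make the boundary issue concrete, the toy check below (Python) tests whether a formant shift lands in a different analysis band than the base formant; the band edges are hypothetical, chosen only to match the 125 Hz bandwidth mentioned above, and are not the actual clinical frequency map.

# Toy illustration with hypothetical 125 Hz-wide bands in the F1 range;
# these edges are assumptions, not the actual Cochlear frequency allocation table.
band_edges = list(range(250, 1125, 125))   # 250, 375, ..., 1000 Hz

def crosses_boundary(f_base, f_shifted, edges=band_edges):
    """Return True if the base and shifted formants fall in different bands."""
    band = lambda f: sum(f >= e for e in edges)
    return band(f_base) != band(f_shifted)

print(crosses_boundary(580, 530))   # 50 Hz shift inside one band -> False
print(crosses_boundary(510, 460))   # same-sized shift straddling the 500 Hz edge -> True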
Stimuli. Participants were recorded speaking the words "Ed" and "oh" ten times each. A single token of
each word was selected as a base for stimulus generation. F1 was decreased in "Ed" and F2 was
increased in "oh" in increments of 1 Hz using Audapter (Cai et al. 2008, Tourville et al. 2013). For NH
listeners, the resulting stimuli were passed through a 16-channel vocoder to approximate the spectral
degradation of a CI.
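The abstract does not specify the vocoder implementation; the sketch below (Python) shows one common way to build a 16-channel noise vocoder, with the filterbank spacing, envelope extraction, and frequency range chosen as illustrative assumptions rather than the settings actually used.

# Minimal 16-channel noise-vocoder sketch (an illustrative assumption, not the
# implementation used in the study). Requires fs > 2 * f_hi.
import numpy as np
from scipy.signal import butter, hilbert, sosfiltfilt

def vocode(x, fs, n_channels=16, f_lo=100.0, f_hi=8000.0):
    """Replace the fine structure in each band with band-limited noise,
    keeping only the band envelopes, as a crude simulation of CI processing."""
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)    # log-spaced band edges
    noise = np.random.default_rng(0).standard_normal(len(x))
    out = np.zeros(len(x))
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(sos, x)
        envelope = np.abs(hilbert(band))    # band envelope (a lowpass is often added here)
        carrier = sosfiltfilt(sos, noise)   # noise carrier restricted to the same band
        out += envelope * carrier
    return out / np.max(np.abs(out))        # normalize to avoid clipping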
Perception Task. Speakers heard their own altered speech in a four-interval, two-alternative forced-choice
design in which one stimulus differed from the others. Listeners determined whether the odd stimulus
occurred in the first or second pair of sounds (i.e. AABA or ABAA presentation), where A was always the
unaltered stimulus. The initial frequency difference between the base and altered stimulus was 500 Hz.
This difference increased by three steps after an incorrect response and decreased by a single step after
a correct one. The experiment ended after 20 reversals (changes from correct to incorrect or vice versa).
The frequency differences at the final six reversals were averaged to estimate the threshold.
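A minimal sketch of this adaptive procedure is given below (Python); the step size and the simulated listener are assumptions, since the abstract states only the starting difference, the step rule, and the stopping criterion.

# Sketch of the adaptive staircase described above. STEP_HZ and the simulated
# listener are illustrative assumptions, not values from the study.
import random

STEP_HZ = 10        # assumed step size (not stated in the abstract)
START_HZ = 500      # initial difference between base and altered stimulus
N_REVERSALS = 20    # staircase stops after 20 reversals
N_AVERAGE = 6       # threshold = mean of the final six reversal values

def run_staircase(get_response):
    """get_response(delta_hz) -> True if the listener picks the odd interval correctly."""
    delta = START_HZ
    reversals = []
    prev_correct = None
    while len(reversals) < N_REVERSALS:
        correct = get_response(delta)
        if prev_correct is not None and correct != prev_correct:
            reversals.append(delta)                 # reversal: correct <-> incorrect
        # Rule from the abstract: down one step after a hit, up three steps after a miss
        delta = max(STEP_HZ, delta - STEP_HZ) if correct else delta + 3 * STEP_HZ
        prev_correct = correct
    return sum(reversals[-N_AVERAGE:]) / N_AVERAGE

# Example with a simulated listener (chance level is 0.5 in this 4I-2AFC task):
simulated = lambda d: random.random() < min(0.99, 0.5 + 0.5 * d / 150.0)
print(round(run_staircase(simulated)), "Hz (simulated threshold estimate)")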
Results. Pilot data (CI n=1, NH n=3) show thresholds that vary both by individual and formant. For the CI
listener, the F1 threshold was 47 Hz and the F2 threshold was 30 Hz. Given how frequencies are distributed
across electrodes, such low thresholds were not expected. The NH listeners were variable: for F1,
thresholds were 165, 188, and 120 Hz; for F2, thresholds were 115 and 135 Hz (first two subjects
only); thus listeners varied in how well they accessed various acoustic cues to perceive fine distinctions
in speech. These preliminary results suggest the possibility that CIs may in fact preserve some
accessible information about subphonemic speech variation. The results of this study will be used to
guide further research to determine how this acoustic information may be used by CI users to maintain
speech intelligibility while speaking.
Work supported by NIH-NIDCD (R00144-AAC7623 to CN; 5T32DC005359-13 to UW-Madison; R01DC003083 to RYL)
and in part by NIH-NICHD (U54HD090256) to the Waisman Center.