Biomechanics of Human Voice Production and Control

Neuromuscular Control of the Larynx

Toward an Interdisciplinary Approach to Voice Studies

Toward Standardizing Perceptual Voice Quality Measures

Variance and Invariance in Voice Quality: Implications for Machine and Human Speaker Identification


Biomechanics of Human Voice Production and Control

Description

The long-term goal of this research is to establish a biomechanical theory of human voice production and control that would guide clinical management of voice disorders. This theory may also find applications in physically based natural speech synthesis. The research focuses on the following three areas: 1) characterization of vocal system biomechanical properties and their regulation by laryngeal muscle activation; 2) how changes in biomechanical properties affect vocal fold vibration and the produced voice; and 3) how voice quality perception is affected by changes in vocal fold vibration and the resulting voice acoustics. Several experimental models of phonation are being developed, including a synthetic self-oscillating silicone model of the vocal folds, for controlled parametric studies of laryngeal fluid-structure interaction, and an ex vivo perfused human larynx model, in which the larynx is perfused immediately after whole-organ harvest to maintain muscle viability, for examining laryngeal muscle activation and its effect on voice production. Observations from these experimental studies will be used to develop reduced-order computational models of phonation.

Funded by NIH/NIDCD grants R01DC001797, R01DC009229, and R01DC011299.

Researchers

Gerald Berke

Dinesh Chhetri

Bruce R. Gerratt

Jennifer Long

Jody Kreiman

Zhaoyan Zhang

Representative publications and presentations

Zhang, Z. (2016). Mechanics of human voice production and control. Journal of the Acoustical Society of America, 140(4), 2614-2635.

Zhang, Z. (2016). Respiratory laryngeal coordination in airflow conservation and reduction of respiratory effort of phonation. Journal of Voice, 30(6), 760.e7–760.e13.

Zhang, Z. (2016). Cause-effect relationship between vocal fold physiology and voice production in a three-dimensional phonation model. Journal of the Acoustical Society of America, 139, 1493-1507.

Yin, J., and Zhang, Z. (2016). Laryngeal muscular control of vocal fold posturing: Numerical modeling and experimental validation. Journal of the Acoustical Society of America, 140(3), EL280-EL284.

Kreiman, J., Gerratt, B.R., Garellek, M., Samlan, R., and Zhang, Z. (2014). Toward a unified theory of voice production and perception.  Invited paper, Loquens, Spanish Journal of Speech Science, 1, e009.

Berke, G., Mendelsohn, A.H., Howard, S., and Zhang, Z. (2013). Neuromuscular induced phonation in a human ex vivo perfused larynx preparation. Journal of the Acoustical Society of America, 133, EL114-EL117.

Zhang, Z., and Luu, T.H. (2012). Asymmetric vibration in a two-layer vocal fold model with left-right stiffness asymmetry: Experiment and simulation. Journal of the Acoustical Society of America, 132, 1626-1635.

Zhang, Z., Neubauer, J., and Berry, D.A. (2007). Physical mechanisms of phonation onset: a linear stability analysis of an aeroelastic continuum model of phonation. Journal of the Acoustical Society of America, 122, 2279-2295.

Zhang, Z., Neubauer, J., and Berry, D.A. (2006). The influence of subglottal acoustics on laboratory models of phonation. Journal of the Acoustical Society of America, 120, 1558-1569.

 



Neuromuscular Control of the Larynx

Description

This research focus is a comprehensive effort to understand normal and abnormal voice production mechanisms. Current work aims to understand the vibratory, aerodynamic, acoustic, and perceptual sequelae of laryngeal asymmetry, and to generate a scientific basis for optimal treatment of glottal insufficiency (seen in paresis, paralysis, presbylarynx, and neurodegenerative disorders). Asymmetric laryngeal activation in laryngeal neuromuscular diseases such as vocal fold paresis and paralysis alters the flow-structure interaction within the glottic channel and affects acoustic and aerodynamic parameters such as fundamental frequency (F0), the phonation onset pressure/airflow relationship, the stability of laryngeal vibration under increasing subglottal pressure, and voice quality. Understanding the fundamental mechanisms that control these parameters is necessary to treat vocal fold insufficiency on a more scientific basis. Unfortunately, these mechanisms are poorly understood, which limits our ability to optimize treatment for this disorder. The research will focus on the roles of two major variables that determine the acoustic output and the aerodynamic requirements for voice production: (1) material properties (tension/stiffness) of the vocal folds and (2) glottal channel geometry or shape. Activation of the intrinsic laryngeal muscles controls both of these major independent variables, and thus investigations of the neuromuscular control of the larynx in vivo stand to improve our understanding of normal voice production as well as optimize the treatment of voice disorders. Both in vivo and ex vivo phonation models are used in these endeavors.

Funded by NIH/NIDCD R01DC11300.

Researchers

Dinesh K. Chhetri

Jody Kreiman

Zhaoyan Zhang

 



Toward an Interdisciplinary Approach to Voice Studies

Description

At present, this “project” comprises a loose confederation of investigators in different departments who share an interest in voice.  The study of voice and voice quality has long been characterized by segregation across disciplinary lines, with little interchange of data or ideas between scholars who do not share the same research focus.  Studies by scientists of the manner in which humans produce and/or perceive voice quality are nearly unintelligible to humanists examining issues of social, cultural, aesthetic, and political messages conveyed by the same phenomena, and vice versa.  Our work together attempts to provide a foundation to support dialogs and promote mutual understanding among different groups of scholars.  By examining where and why different scholarly traditions overlap, where they abut, and where they differ, we hope to elucidate how these bodies of work might eventually combine as parts of a single academic discipline of “voice studies.”

Researchers

Jody Kreiman

Bruce R. Gerratt

Zhaoyan Zhang

Nina Eidsheim (Musicology)

Patricia Keating (Linguistics)

Abeer Alwan (Electrical Engineering)

Representative publications and presentations

Eidsheim, N., and Kreiman, J. (in preparation). Life versus art in voice change: The case of Maria Callas. To appear in The Oxford Handbook of Timbre.

Kreiman, J. (2017). Voice studies as a discipline. To appear in The Oxford Handbook of Voice Studies, edited by K. Meizel and N. Eidsheim.

Kreiman, J., and Gerratt, B.R. (2017). Reconsidering the nature of voice. To appear in The Oxford Handbook of Voice, edited by P. Belin and S. Frueholtz.

Kreiman, J., and Gerratt, B.R. (in preparation). Modeling a speaker’s voice quality. To appear in D.B. Pisoni, R.E. Remez, et al., The Handbook of Speech Perception, second edition (Oxford: Blackwell).

Kreiman, J., and Sidtis, D. (2011). Foundations of Voice Studies (Malden, MA:  Wiley/Blackwell).

Sidtis, D., and Kreiman, J. (2012). In the beginning was the familiar voice: Personally familiar voices in the evolutionary and contemporary biology of communication. Integrative Psychological & Behavioral Science, 46, 146-159. PMCID: PMC3224673.

Eidsheim, N., and Kreiman, J. Jimmy Scott and the problem of gender in singing. Invited paper, presented at the 171st Meeting of the Acoustical Society of America, May, 2016.

Eidsheim, N., and Kreiman, J. Life versus art in voice change: The case of Maria Callas. To be presented at the 174th Meeting of the Acoustical Society of America, November, 2017.



Toward Standardizing Perceptual Voice Quality Measures

Description

The overall aims of this proposal are 1) to complete development of a theoretically-motivated measurement protocol for voice quality that enables users to reliably and validly assess the overall, integral quality of a voice; 2) to link measures of voice quality made with this protocol to changes along a better/worse continuum, and to underlying glottal pulse shapes, leading to clinical application; and 3) to complete development of the tools needed to accomplish these goals.  Such a quality measurement system is required because voice exists as a series of events linking a speaker to a listener.  For this reason, clinical understanding of any one aspect of voice—physical, acoustic, and/or perceptual—requires knowledge of the other two.  For example, changes in vocal fold vibratory patterns or glottal pulse shapes are important to the extent that they produce perceptually meaningful changes in the sound of the voice—an inaudible physical change is not likely to be clinically relevant.  Similarly, a change in voice quality means that some change has occurred in the way the voice is produced, whose acoustic result was transmitted to the listener where it ultimately elicited the perceived quality.  Because of these linkages among production, acoustics, and perception, we can answer very few basic questions about vocal function without benefit of an adequate model of quality.  In other words, much of the meaning of vocal fold function and acoustic measures in the study of voice derives from their ultimate influence on perceived quality, and thus cannot be formally established unless quality is successfully quantified.

Funded by NIH/NIDCD grant DC01797.

Researchers

Jody Kreiman

Bruce R. Gerratt

Zhaoyan Zhang

Abeer Alwan

Norma Antoñanzas-Barroso

Neda Vesselinova

Representative publications and presentations

Garellek, M., Samlan, R., Gerratt, B.R., and Kreiman, J. (2016). Modeling the voice source in terms of spectral slopes. Journal of the Acoustical Society of America, 139, 1404-1410.

Gerratt, B.R., Kreiman, J., Antoñanzas-Barroso, N., and Berke, G.S. (1993). Comparing internal and external standards in voice quality judgments. Journal of Speech and Hearing Research, 36, 14-20.

Kreiman, J., Antoñanzas-Barroso, N., and Gerratt, B.R. (2010). Integrated software for analysis and synthesis of voice quality. Behavior Research Methods, 42, 1030-1041. PMCID: PMC3719850.

Kreiman, J., and Gerratt, B.R. (2000). Sources of listener disagreement in voice quality assessment. Journal of the Acoustical Society of America, 108, 1867-1876.

Kreiman, J., Gerratt, B.R., Garellek, M., Samlan, R., and Zhang, Z. (2014). Toward a unified theory of voice production and perception.  Invited paper, Loquens, Spanish Journal of Speech Science, 1, e009.

Kreiman, J., Gerratt, B.R., and Ito, M. (2007). When and why listeners disagree in voice quality assessment tasks. Journal of the Acoustical Society of America, 122, 2354-2364.

Kreiman, J., Gerratt, B.R., Kempster, G.B., Erman, A., and Berke, G.S. (1993). Perceptual evaluation of voice quality: Review, tutorial, and a framework for future research. Journal of Speech and Hearing Research, 36, 21-40. Reprinted in R.C. Branski & L. Sulica (eds.), Classics in Voice & Laryngology (San Diego, Plural), 2008, pp. 55-71.



Variance and Invariance in Voice Quality: Implications for Machine and Human Speaker Identification

Description

This project uses the UCLA Speaker Variability Database to examine the extent and nature of within-speaker variability in voice acoustics and quality, in an attempt to understand exactly what it means to sound like oneself and not like someone else. Our goal is to characterize voice variability within and across speakers under circumstances that introduce variability in everyday life. Acoustic analysis will result in a multi-dimensional acoustic model, or profile, of each speaker that specifies the range of parameter values that are typical in the corpus for that speaker, and the likelihood of deviations from that usual profile. These profiles will provide predictions about confusability among tokens and speakers. Perceptual studies will then determine the extent to which parameter profiles predict perceived similarity by human listeners, and how much (and what kind of) variability in each parameter can be tolerated before speakers cease to sound like themselves. Results will provide a baseline for automatic speaker identification methods, and will also be valuable in validating earwitness testimony.
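As a rough illustration of the idea of a per-speaker acoustic profile, the sketch below summarizes each acoustic parameter by its mean and standard deviation across a speaker's tokens, then scores how far a new token departs from that profile. This is only a minimal sketch under stated assumptions, not the project's actual model: the parameter names (f0_hz, h1_h2_db), the toy values, and the use of a mean absolute z-score as the deviation measure are all hypothetical choices made for illustration.

```python
from statistics import mean, stdev


def build_profile(tokens):
    """Summarize one speaker's tokens as {parameter: (mean, std)}.

    `tokens` is a list of dicts mapping parameter name -> measured value.
    At least two tokens are needed so a standard deviation can be computed.
    """
    params = tokens[0].keys()
    return {
        p: (mean(t[p] for t in tokens), stdev(t[p] for t in tokens))
        for p in params
    }


def deviation(profile, token):
    """Mean absolute z-score of `token` relative to `profile` (0 = typical)."""
    # max(sd, 1e-9) avoids division by zero for a parameter that never varied.
    return mean(
        abs(token[p] - mu) / max(sd, 1e-9) for p, (mu, sd) in profile.items()
    )


# Hypothetical toy measurements for one speaker (not real corpus data).
speaker_tokens = [
    {"f0_hz": 200.0, "h1_h2_db": 5.0},
    {"f0_hz": 210.0, "h1_h2_db": 6.0},
    {"f0_hz": 190.0, "h1_h2_db": 4.0},
]
profile = build_profile(speaker_tokens)

typical_token = {"f0_hz": 200.0, "h1_h2_db": 5.0}
atypical_token = {"f0_hz": 120.0, "h1_h2_db": 1.0}

typical_score = deviation(profile, typical_token)
atypical_score = deviation(profile, atypical_token)
```

In this toy setup, a token close to the speaker's usual values receives a score near zero, while a token far from them receives a larger score. A fuller model would also need to account for correlations among parameters, which this sketch ignores.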

Supported by NSF grant IIS-1704167.

Researchers

Jody Kreiman

Abeer Alwan (Electrical Engineering)

Patricia Keating (Linguistics)

Neda Vesselinova (Linguistics)

Soo Jin Park (Electrical Engineering)

Representative publications and presentations

Keating, P., and Kreiman, J. Acoustic similarities among female voices. Presented at the 172nd Meeting of the Acoustical Society of America, November, 2016.

Kreiman, J., Keating, P., and Vesselinova, N. Acoustic similarities among voices. Part 2: Male speakers. To be presented at the 174th Meeting of the Acoustical Society of America, November, 2017.

Kreiman, J., Park, S.J., Keating, P.A., and Alwan, A. (2015). The relationship between acoustic and perceived intraspeaker variability in voice quality. Interspeech 2015, pp. 2357-2360.

Park, S.J., Guo, J., Sigouin, C., Yeung, G., Kreiman, J., Kuo, F.Y., Alwan, A., and Keating, P. (2016). Speaker identity and voice quality: Modeling human responses and automatic speaker recognition. Interspeech 2016.

Park, S.J., Yeung, G., Kreiman, J., Keating, P.A., and Alwan, A. (2017). Using voice quality features to improve short-utterance, text-independent speaker verification systems. To be presented at Interspeech 2017.
