Balancing Vowels and Consonants

“Vowels are the emotion, and consonants are the intellect.” This is a commonly held belief: the emotional content of your speech is shared through the vowel sounds, which are “free and open,” while the intellectual component (the meaning!) is carried primarily through the consonants. I feel certain that there is no scientific proof behind this assertion. However, if there were a way to neutralize all the vowels, I imagine you might still be able to make out the general meaning of the text. With just the vowels, that would be impossible.

Take this Maya Angelou quote:

Words mean more than what is set down on paper.
It takes the human voice to infuse them with deeper meaning.
—Maya Angelou, I Know Why the Caged Bird Sings.

In IPA, this reads as

[wɝdz min mɔɚ ðən wʌt ɪz sɛt daʊn ɒn peɪpɚ ǁ
ɪt teɪks ðə hjumən vɔɪs tu ɪnˈfjuz ðəm wɪθ dipɚ minɪŋ ]
Listen to the VoiceGuy read this text. (mp3)

If we were to remove the vowels altogether, it would be unpronounceable. But if we replace every vowel with the same vowel, we might get a sense of the text as “merely consonants”:

[wudz mun muɚ ðun wut uz sut dun un ˈpupuɚ ǁ
ut tuks ðu ˈhjumun vus tu unˈfjuz ðum wuθ ˈdupuɚ ˈmunuŋ ]
Listen to the VoiceGuy speak this consonant phrase. (mp3)

Perhaps you might be able to make out that meaning…

On the other hand, if we were to remove the consonants and just link up all the vowel sounds, the meaning would be significantly more difficult to make out.

[ˈɝ i ˈɔɚ ə ʌ ɪ ɛ ˈaʊ ɒ ˈeɪ ɚ ǁ
ɪ eɪ ə ˈju ə ˈɔɪ u ɪ ˈju ə ɪ ˈi ɚ ˈi ɪ ]

Listen to the VoiceGuy speak this vowel phrase. (mp3)

It seems, then, that the consonants offer a better chance at intelligibility than the vowels do. The emotion, or at least your emotional connection to the language, your personalization, perhaps travels through the balance of long and short open sounds in the vowels. But we might also argue that continuant consonants, especially after a vowel, such as [m, n, ɫ, v, f, s, z, ʃ, ʒ, θ, ð], which have the potential for length, could also offer the actor opportunities to relish the sound of the text and convey their emotional connection through it. And couldn’t one also argue that, for some emotions, staccato sounds, such as those one might hear from energetically articulated stop-plosives [p, b, t, d, k, g] and affricates [tʃ, dʒ], would be a very effective means of revealing your feelings?

Vowels are rather vague by nature. While consonants are described by their voicing, manner, and place of articulation, vowels “hover” in a vowel space defined only by the highly relative open-close and front-back continuums. The reference points for consonants are like the frets on a guitar: very solid and fixed. Vowels are more like a violin, which you must play by ear to find the right tuning of each note. Because of this vagueness, vowels vary much more across different speakers, and even within a single speaker. Accents tend to differ from one another far more in their vowel qualities than in their consonants, so vowels can help a listener label a speaker by region, social “class,” or economic group. Consonants, on the other hand, seem to be more consistent and less affected by accent.

As speech becomes more casual, some consonant qualities soften; as speech becomes more formal, some consonant qualities become more active. As a result, we might choose to “elevate” a consonant toward the sound its spelling suggests, its “citation” form. For instance, in casual speech, many North American English speakers drop the /t/ sound in the word winter, [ˈwɪnɚ]. However, when being more emphatic or more formal, the same speakers are likely to include the /t/, [ˈwɪntɚ].

With vowels, in a casual style some speakers tend to relax their mouths so that the vowels become less distinct from one another, as if the vowel space were contracting inward towards its centre, the sound we tend to think of as “schwa.” As with consonants, when these speakers are being more formal or more emphatic, or are trying to be more intelligible, they push their vowels toward the periphery of the vowel space, making each vowel more distinct from its neighbours.

It has been argued that a larger vowel inventory can raise a speaker’s degree of intelligibility: distinguishing between vowels that are merged in one accent but split into several in another makes for fewer homophones (words that sound alike). For instance, speakers who pronounce Mary, merry and marry differently are (arguably) more intelligible than those who say them all alike, because the three words can’t be confused with one another. However, one is a name, one is an adjective and one is a verb, so it’s highly unlikely that we would confuse one for another. In most situations, context is sufficient to tell homophones apart, so this argument is pretty weak.

One can, however, choose an alternate pronunciation (usually one associated with a higher-status accent) as a means of emphasizing a word, though one risks being taken for someone who is mocking that pronunciation rather than merely embracing it. For example, if I said the Albert Einstein quote “Weakness of attitude becomes weakness of character,” and pronounced the emphasized word character as [ˈkæɹɪktɚ] rather than as [ˈkɛɚɹɪktɚ] (as I would normally say it), listeners might hear it as a choice to elevate the word through a pronunciation that happens to be a feature of many English accents. In the circles I tend to work in, it might come across as somewhat “highfalutin’,” which can itself read as a form of emphasis associated with high status and education.

In arguing for greater balance between vowels and consonants, I’m suggesting that careful choices about how one pronounces both can lead to greater specificity and greater variety, both of which help increase one’s intelligibility profile. And, as Martha Stewart might say, “that’s a good thing.”


Eric Armstrong is the VoiceGuy. Eric is a dialect, voice, speech and text coach based in Toronto, Canada, where he normally teaches full-time at York University’s Dept. of Theatre. Eric has been teaching voice for the actor full-time since 1994, and has taught in Canada and the US, at the University of Windsor, Brandeis University, Roosevelt University, Canada’s National Voice Intensive and York University. He has worked for nationally and internationally recognized companies such as Crow’s Theatre, Volcano, Soulpepper, & Canadian Stage in Toronto, and The Court Theatre and Steppenwolf in Chicago. Eric holds a BFA from Concordia University (Montreal) in Theatre Performance, and an MFA from York University (Toronto) in Acting. His mentors were David Smukler (York, Canada’s National Voice Intensive) and Andrew Wade (Royal Shakespeare Company). He has also studied at the Drama Studio, London, and Il Stage Internazionale di Commedia dell’Arte in Reggio Emilia, Italy. He’s a longtime member of the Voice and Speech Trainers Association, where he has served on the board, as a conference planner, as photo editor for the Voice and Speech Review, and as Founding Director of Technology and Internet Services. He has also written numerous peer-reviewed articles, essays and reviews for the VASTA Newsletter, the VASTA Voice, and The Voice and Speech Review.
