Posted: Apr 10, 2012 5:46 pm
by Zwaarddijk
Even in English, nonpulmonic airstreams are used in at least one recurrent sound. This sound is considered somewhat paralinguistic, as it's never used in forming words; it's only used by itself, as a kind of marker of disapproval.

"tsk"/"tut" is actually a dental click. Another click is used in English (and many European languages) to get a horse moving. So clicks are not entirely unique to some south and central African languages - it's their use as actual linguistic sounds in those languages that is unique.

Sounds that aren't part of any word in a language, yet appear in isolation for a variety of small tasks, are not unusual at all in languages either - northern varieties of Scandinavian (as if normal Scandinavian isn't northern enough) often have an ingressive fricative that is used to express agreement with what someone just said. Clicks can be found around the world as isolated sounds used to express an array of things wordlessly, ranging from disapproval or a signal to get going to a way of expressing that you find the food pleasing.

Clicks, unlike all consonants I've mentioned thus far, don't get their air stream from the lungs. They use a rather cool trick: there's closure at two places - mostly a velar, k-like closure in the back of the mouth and a closure somewhere ahead of it. The velar closure is retracted so that the volume of the enclosed space is increased - the pressure between the closures is reduced. The other closure is then opened and air streams in to fill the enclosed space, causing turbulence.
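
To put rough numbers on that pressure drop, here's a minimal Python sketch using Boyle's law (for a sealed pocket of air at constant temperature, pressure times volume stays the same). The function name and the volumes are made up for illustration; they are not measurements of any real click:

```python
def sealed_pressure(p_before_kpa, v_before_ml, v_after_ml):
    """Boyle's law for a sealed air pocket: p_before * v_before == p_after * v_after."""
    return p_before_kpa * v_before_ml / v_after_ml

# Assumed, purely illustrative values: the pocket between the two closures
# starts at atmospheric pressure and grows from 2 ml to 3 ml as the velar
# closure is pulled back.
ATMOSPHERE_KPA = 101.3
after_kpa = sealed_pressure(ATMOSPHERE_KPA, v_before_ml=2.0, v_after_ml=3.0)

print(round(after_kpa, 1))  # 67.5
```

With these invented numbers the pocket ends up at about two thirds of the outside pressure, which is why air rushes in as soon as the front closure is released.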

Clicks as linguistic sounds can only be found in Khoisan languages, in a number of Bantu languages that have probably borrowed their clicks from neighbouring Khoisan languages, and in one language/ritual register in Australia, Damin. Damin is weird for a number of reasons, leading many scholars to conclude it's been intentionally constructed by tribal elders*. How clicks come about is actually an open question among historical linguists.

I've included a short bit on Damin in this spoiler here, but it's not important per se:
Spoiler: Damin
* Damin was a (or two) language(s) for initiates - men who fulfilled certain requirements were permitted to learn it in their teenage years. I've forgotten a fair share of the cultural details about Damin; the linguistics of it is much more interesting. Damin basically has the same grammar as the language spoken by the rest of the tribes (Yangkaal and Lardil); it's mostly just a lexical substitution, where Damin lexemes may correspond to several Yangkaal or Lardil lexemes. Another quirk in it, though, that kind of indicates it might not be a naturally evolved language is its lack of a distinction between third and second person - essentially, "you" and "he/she/it" are conflated into one non-first person. This is not known from any other language. Now, presenting a hypothesis that a language isn't natural would seem rather daring and unacademic, but the relevant tribe also basically says that Damin was made up by the tribe's elders way back, so ...


Apparently, one click in !Xóõ (spoken in Botswana) has a pulmonic ingressive nasal airstream going at the same time as the click is made, so essentially there's two air streams going on at the same time, one originating in the mouth, the other through expansion of the lungs.

Turns out clicks are kind of efficient sounds: with very little effort, it's easy to make many very distinctive sounds. This is one reason why some click languages have crazily large inventories of distinct sounds. On the other hand, it seems as though it is difficult for a language to acquire them naturally, which is why almost all languages that have them today are related or have been strongly influenced by languages that already had them.

Another trick that pops up to get other air streams than pulmonic ones is to close the glottis and then either retract (lower) or "eject" (raise) it - retracting it increases the size of the cavity above it, reducing pressure and causing an ingressive air stream, while ejecting it increases the pressure and thus causes an egressive air stream. These kinds of sounds appear in various languages in both Americas, in the Caucasus and in parts of Africa. The ingressive kind sometimes combines to some extent with a pulmonic air stream, to give us voiced implosive consonants. http://en.wikipedia.org/wiki/Implosive_consonant contains some audio samples.

Consonants, in one sense, are easy to describe: you point at a part of the roof of the mouth, you point at a corresponding part in the lower part of the mouth, you specify some manner of articulation and bingo. Vowels live in a less discrete space.

Most vowels, cross-linguistically, are voiced: the vocal cords vibrate during production. No or extremely little turbulence is caused by the articulators. The different vowels' sounds are shaped by how the oral cavity is shaped during production - where the tightest spot is, the length of the tube, etc. There are obviously some absolute points to this: the most open configuration we can get at reasonable enough effort, and the greatest closure that doesn't beget any friction. In both cases, we can obtain different lengths of the tube as well: the maximum and minimum (reasonable) lengths behind the maximum closure.

We make a trapezoid with these four points as its corners. (Sometimes, linguists use a triangle instead, depending a bit on the language in question, but the more general case is undoubtedly the trapezoid.)

Something like
i            u
               
             
               
      a     ɒ

The symbols in that trapezoid are IPA symbols, not English letters. [i] corresponds pretty well to <ee> in creep, eek, etc. [u] corresponds fairly well to <oo> in woo, [a] is present in <stack> in many dialects of English, and [ɒ] in Boston, hot, or park in some dialects of English. Now, there's a reason why many languages do in fact have distinct vowels that correspond to at least three of the corners of this trapezoid - to maximize the audible difference between available vowels. Usually, though, reproducing an exact spot in the trapezoid isn't necessary; different languages split up the vowel space in different ways. So some continuous area of the trapezoid ~centered at each of the two upper corners, plus some area around the bottom, is a very common three-way vowel system, shared by Arabic, Quechua, Inuktitut and a bunch of other languages.
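
The "maximize the audible difference" idea can be sketched in a few lines of Python. Everything here is an assumption for illustration: the vowel space is flattened to a unit square (x = how far back, y = how close), the coordinates are eyeballed rather than measured, and plain geometric distance stands in for perceptual distance:

```python
from itertools import combinations
from math import dist

# Hypothetical positions: x runs from front (0) to back (1),
# y runs from open (0) to close (1). Not real formant data.
VOWEL_SPACE = {
    "i": (0.0, 1.0),
    "u": (1.0, 1.0),
    "a": (0.4, 0.0),
    "e": (0.1, 0.6),
    "ɪ": (0.15, 0.85),
}

def min_separation(system):
    """Smallest distance between any two vowels of the system."""
    return min(
        dist(VOWEL_SPACE[v], VOWEL_SPACE[w]) for v, w in combinations(system, 2)
    )

corner_system = ["i", "a", "u"]    # vowels near three corners
crowded_system = ["i", "ɪ", "e"]   # three vowels bunched up in one corner

corner_gap = min_separation(corner_system)
crowded_gap = min_separation(crowded_system)
```

Under these assumptions `corner_gap` comes out several times larger than `crowded_gap`: the worst-case pair in the corner system is still far apart, whereas the bunched-up system has a pair that sits almost on top of each other.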

Other languages again divide the available space in different ways, but generally the corners are assigned to some sounds. There's some universals as to how the space tends to be divided, and there's some additional things one can do with it - many languages in Europe distinguish vowels depending on whether the vowel is rounded or not. (Generally, back vowels always tend to be rounded, and a distinction based on rounding most often only happens for front vowels.) Some languages distinguish vowels depending on whether there's a nasal air stream as well. There's a bunch of other possible distinctions too.

For some reason, Germanic languages have had a tendency for rich vowel inventories - dividing the vowel space relatively finely, and distinguishing front rounded from front unrounded vowels. English has lost that particular distinction (as have Yiddish and some other western Germanic dialects), but in general they retain rich vowel systems anyway. You will find literature written by people who don't know what they're talking about that says English has five vowels. If the book is about typography, you can accept it as an informed statement (although the symbol y often enough does duty as a vowel to warrant being called one as well). If it's about anything even slightly more linguistic and not related to written language specifically, it's dead wrong.

Now, I've already kind of made a point of it being quite difficult to exactly reproduce a configuration of the mouth - so languages don't have words that correspond to series of exact configurations of mouths, they permit some variety. This might be a good place to introduce the idea of phonemes and allophones for that exact reason. A phoneme is a set of sounds that are perceived as functionally the same, i.e. two words cannot be distinguished by them. These sets are language specific - the fact that English doesn't distinguish [kh] from [k] doesn't prevent other languages from considering them distinct.

Consider the two words grave and crave. These are very similar; in fact there's only one sound distinguishing them from one another: /g/ vs. /k/. [khreIv] vs. [kreIv], though, are not considered distinct - we write both as /kreIv/, although the latter may sound slightly off to most speakers of English. The lack of aspiration (aspiration being a belated onset of voicing, and a slight puff of air on the k) cannot distinguish two words from each other - although in some dialects of English, it seems aspirationless [k] is more likely to be parsed as belonging to /g/.

For the vowels, a phoneme is basically a stretch of vowel space. Any sound produced anywhere in it will be perceived as the same vowel by a speaker of that language. Consider the vowels in English <beat> and <bit>: you cannot use the same distinction to get two different words in Finnish - they'd be parsed as one word. In contrast, in many dialects of English, the distinction between Finnish /i/ and /i:/ would be lost, as most English dialects no longer distinguish length. (Note: ":" is used to mark length.) Likewise, Swedish wouldn't contrast the vowel in beat from the one in bit, but English wouldn't permit minimal pairs for all of Swedish /ʉ/, /y/ and /u/, such as this elegant triplet: <mur>, <myr>, <mor>.
(Notice: due to historical sound changes, Swedish orthography mostly encodes sounds as follows: <u> = /ʉ/, <y> = /y/, <o> = /u/, although <o> sometimes also encodes what IPA transliterates as /o/).
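
One way to picture "a phoneme is a set of sounds" is as a language-specific lookup table from phones to phonemes. Here's a minimal Python sketch; the tables are tiny hypothetical fragments (nowhere near full inventories), "kʰ" is the aspirated k written [kh] above, and the Hindi-like table only encodes the one fact that aspirated and plain k are kept apart there:

```python
# Hypothetical fragments of two phone-to-phoneme tables.
ENGLISH_LIKE = {"kʰ": "k", "k": "k", "g": "g"}
HINDI_LIKE = {"kʰ": "kʰ", "k": "k", "g": "g"}

def phonemic(phones, table):
    """Map each phone to its phoneme; phones not in the table map to themselves."""
    return tuple(table.get(phone, phone) for phone in phones)

def can_contrast(phones_a, phones_b, table):
    """Two pronunciations can be different words only if their phonemes differ."""
    return phonemic(phones_a, table) != phonemic(phones_b, table)

crave_aspirated = ["kʰ", "r", "eɪ", "v"]
crave_plain = ["k", "r", "eɪ", "v"]
grave = ["g", "r", "eɪ", "v"]

same_word_in_english = not can_contrast(crave_aspirated, crave_plain, ENGLISH_LIKE)
grave_differs_in_english = can_contrast(crave_aspirated, grave, ENGLISH_LIKE)
k_types_differ_in_hindi_like = can_contrast(crave_aspirated, crave_plain, HINDI_LIKE)
```

All three flags come out `True`: the English-like table collapses the two pronunciations of crave but keeps grave apart, while the other table treats the very same two phone strings as potentially different words. The phone strings are just the English examples reused; they aren't meant to be real Hindi words.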

For convenience, phoneticians have divided up the vowel space along the edges into several roughly equally large steps, and placed "reference vowels" on those spots. There's nothing really special about these reference vowels, except they're a convenient reference point, and it's a fine enough system that it suffices to describe most actual vowel systems in use to sufficient accuracy. There's diacritics for these reference vowels that specify if it's to be opened or closed, retracted or fronted, etc, from its usual place, if we're doing a very strict analysis.

English divides the front edge of the trapezoid into three distinct vowel qualities, i, ɛ and æ: beet, bet, bat. These are called tense vowels for some reason.
Slightly closer to the middle of the trapezoid you get "lax" vowels, ɪ and ɜː. In the middle column you get a central ə pretty much in the middle of the trapezoid and an open ɐ, for historical reasons often transliterated ʌ (a symbol that in the IPA normally goes further back and is more closed). In the back you get, from top to bottom, u, ɔ, ɑ, and slightly closer to the center you get ʊ and ɒ. The reference vowels are phones - they're rather specific sounds. Now, languages can pretty much arbitrarily split up the vowel space, and there's no reason that some language's /o/ or /u/ shouldn't cover multiple reference vowels. In languages with few vowels, the vowels will have large areas of vowel space. This is one reason that Arabic names, despite Arabic only distinguishing i, a and u, often are transliterated with e and o, e.g. Usama / Osama bin Ladin/Laden. Surrounding sounds may affect where their realization goes in the available space, and different dialects may also pick slightly different realizations. To Arabic orthography, there's no difference between e and i; this may sound like Arabs are dumb or stupid and can't distinguish things that obviously are different - but English doesn't distinguish [kh] from [k] - and even has a rule that basically says where a /k/ is to be realized as [k] and where as a [kh], whereas distinguishing them is very natural to every Hindi speaker.
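
Here's a rough Python sketch of the "few vowels, large areas" point. It assumes a listener files each sound under whichever of the language's vowel phonemes is nearest; that nearest-neighbour rule, the unit-square layout and the coordinates are all simplifying assumptions of mine, not a claim about how perception actually works:

```python
from math import dist

# Hypothetical positions: x runs from front (0) to back (1),
# y runs from open (0) to close (1).
PHONE_POSITIONS = {
    "i": (0.0, 1.0),
    "e": (0.1, 0.6),
    "a": (0.4, 0.0),
    "o": (0.9, 0.6),
    "u": (1.0, 1.0),
}

def categorize(phone, phonemes):
    """Return the phoneme whose assumed position is closest to the phone."""
    here = PHONE_POSITIONS[phone]
    return min(phonemes, key=lambda p: dist(here, PHONE_POSITIONS[p]))

THREE_VOWELS = ["i", "a", "u"]           # an Arabic-style i/a/u system
FIVE_VOWELS = ["i", "e", "a", "o", "u"]  # a system that also has e and o

e_in_three = categorize("e", THREE_VOWELS)
o_in_three = categorize("o", THREE_VOWELS)
e_in_five = categorize("e", FIVE_VOWELS)
```

In this toy setup an [e]-like sound lands in /i/ and an [o]-like sound in /u/ for the three-vowel system, while the five-vowel system keeps [e] as its own /e/. That matches the Usama/Osama situation: the same sound, carved up differently by the listener's language.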

Diphthongs are vowels whose articulation involves movement. Usually, it's sufficient just to specify some kind of direction and rough place of movement, as the exact starting and stopping-points are not that important. English has many of these.

So there's no reason to think that the way one language divides vowel space or consonant space is more natural than the way another language does, with the caveat that the corners of vowel space are somewhat special - but the sounds associated with the corners also will have a bit of a range inwards of the trapezoid. For consonant space, a few similar restrictions do tend to be in place: consonants with similar features can be part of one phoneme. What is to be considered similar features varies a bit - it can be place or manner of articulation, or it can be acoustic properties (so in some languages, velar and labial fricatives, for instance, can be allophones of the same phoneme). One common source of allophones is slight secondary articulations, but what in one language is considered an allophonic variation can be a source of a phoneme in another.

There may be simple rules governing when to render a consonant so as to have different properties - front vowels pull /k/ forward in English, back vowels keep it where it usually is, and back rounded vowels tend to cause a slight lip-rounding. So cool is basically something like [khwu:ɫ], but phonemically /ku:l/, while kill is phonemically /kɪl/ but narrowly transcribed is [c̠ɪɫ] or [k̟ɪɫ] - in the former, c is a palatal stop and the diacritic marks that it is slightly retracted; in the latter, k is a velar stop and the diacritic marks slight advancing - both come pretty close to the same spot. l vs. ɫ in English is determined by whether the l appears before or after vowels, basically. This is also a distinction that in some languages is contrastive: you could have words like ɫip vs. lip or buɫ vs. bul. In English, this is not possible, although it can help in figuring out where a word boundary is. Where there's rules like this (or even just random variation), this is called allophonic variation. Phones that can appear as audible realizations of some phoneme are called allophones.
This is a natural thing, and pretty much every "functional sound" has such variations. In English, the most obvious ones are the aspiration of voiceless stops in word-initial position, the velarization of l in post-vocalic positions, in British English the realization of t as a glottal stop in some positions, and in American English the realization of t and d intervocalically as a tap instead of a stop. (This is also in part a merger - t and d are not really audibly distinguished in that position in these dialects, and the reason some people hear them as distinctive probably has to do with literacy - the brain expects a correlation to what they're used to from reading. If English was not a written language and someone set out to create an orthography, intervocalic t and d in American English would probably use the same symbol, because they're essentially the same sound. In many varieties, at least.)
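
Three of those English rules can be written down as a toy Python function that turns a list of phonemes into a list of phones. This is a heavily simplified sketch: the function name, the vowel list and the exact conditions are my own assumptions (real flapping, for instance, also depends on stress), and words are given as hand-made phoneme lists:

```python
# A small, incomplete vowel list; just enough for the examples below.
VOWELS = {"i", "ɪ", "ɛ", "æ", "u", "ʊ", "ə", "ɑ", "ɔ", "ʌ", "ɚ"}
VOICELESS_STOPS = {"p", "t", "k"}

def realize(phonemes, flapping=True):
    """Apply three simplified allophonic rules to one word."""
    phones = []
    for idx, ph in enumerate(phonemes):
        prev = phonemes[idx - 1] if idx > 0 else None
        nxt = phonemes[idx + 1] if idx + 1 < len(phonemes) else None
        if idx == 0 and ph in VOICELESS_STOPS:
            # Word-initial voiceless stops are aspirated.
            phones.append(ph + "ʰ")
        elif ph == "l" and prev in VOWELS and nxt not in VOWELS:
            # l after a vowel (and not before one) is velarized.
            phones.append("ɫ")
        elif flapping and ph in {"t", "d"} and prev in VOWELS and nxt in VOWELS:
            # American-style tap for t and d between vowels.
            phones.append("ɾ")
        else:
            phones.append(ph)
    return phones

kill = realize(["k", "ɪ", "l"])
lip = realize(["l", "ɪ", "p"])
latter = realize(["l", "æ", "t", "ɚ"])
ladder = realize(["l", "æ", "d", "ɚ"])
```

Here `kill` comes out with an aspirated k and a velarized l, `lip` is left untouched, and `latter` and `ladder` end up as the same list of phones, which is the merger described above. Calling `realize(..., flapping=False)` keeps the t and d apart, as in varieties without the tap.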

c sometimes being /s/ and sometimes being /k/ is not allophonic variation though, as English does not have a phoneme correlating to the letter c. It's an orthographic quirk, whereby a letter sometimes represents one phoneme, sometimes another. The involved phonemes follow their normal rules of allophonic realization.

I feel like I might not have been the most clear here on some things - these are concepts I've sort of known and handled for thirteen years, and explaining something like that so it makes sense and isn't misunderstood can be difficult.

The standard symbols used internationally for phonetic transcriptions can be found here - it organizes the sounds along place and manner of articulation, contains diacritics for various secondary articulations, etc. Other phonetic alphabets exist for specific uses, such as the Americanist Phonetic Alphabet, a Slavicist PA and the Uralicist PA. Now, in many languages the IPA symbol closest to some important allophone of some phoneme is often used to transcribe that particular phoneme in some texts, but what symbols are used with what phoneme is often influenced a lot by tradition and orthography. In a more cross-linguistic treatment, a more IPA-based transcription scheme is often selected.