Posted: Sep 14, 2012 10:43 pm
by Zwaarddijk

I previously explained the changes that can happen in languages, but I never explained anything about the methodologies for studying these phenomena after they've occurred.

Back in the day, scholars were noticing that there were similarities between Latin and Greek, and eventually, the increased contact with India got scholars of classical languages interested in Sanskrit and the other languages of India. It turned out there were some relatively regular correspondences between the three of them. I am not entirely sure in what order the similarities were discovered, but regular similarities were also found with Old Church Slavonic, Avestan (an old Iranian language, related to Persian), Gothic, ...

And due to the regular nature of most sound changes, a relatively clever idea appeared. What if, given a set of languages derived from an earlier, unattested language, it were possible to reconstruct the ancestor language, at least to some extent? For some reason, the method used is called 'the comparative method'.
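To make "regular correspondences" a bit more concrete, here is a minimal Python sketch. It is my own toy illustration, not a tool anyone in the field actually uses: it takes a handful of well-known Latin/English cognate pairs, tabulates how the initial sounds line up, and checks that each Latin initial always corresponds to the same English initial. The function names and the hand-segmented initials (spelling, not phonetic transcription) are assumptions made for the example.

```python
from collections import Counter

# Well-known Latin/English cognate pairs. The initial sound of each word is
# segmented by hand from the spelling (an assumption made for this sketch).
# Each entry is (latin_initial, latin_word, english_initial, english_word).
COGNATES = [
    ("p", "pater", "f", "father"),
    ("p", "pes", "f", "foot"),
    ("p", "piscis", "f", "fish"),
    ("t", "tres", "th", "three"),
    ("t", "tu", "th", "thou"),
    ("t", "tenuis", "th", "thin"),
    ("c", "cornu", "h", "horn"),
    ("c", "centum", "h", "hundred"),
    ("c", "cor", "h", "heart"),
]


def initial_correspondences(pairs):
    """Count how often each (Latin initial, English initial) pairing occurs."""
    return Counter((lat, eng) for lat, _, eng, _ in pairs)


def is_regular(table):
    """Return True if every Latin initial maps to exactly one English initial."""
    latin_to_english = {}
    for lat, eng in table:
        latin_to_english.setdefault(lat, set()).add(eng)
    return all(len(targets) == 1 for targets in latin_to_english.values())


if __name__ == "__main__":
    table = initial_correspondences(COGNATES)
    for (lat, eng), count in sorted(table.items()):
        print(f"Latin {lat}- : English {eng}-  ({count} words)")
    print("regular:", is_regular(table))
```

The point is that the words do not need to look alike: "cornu" and "horn" share no initial sound, but c- : h- recurs across the vocabulary, and it is that recurrence the method relies on.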

Many scholars worked on this over decades, and new languages were found to belong to the set: the Celtic languages, various archeologically discovered ones, ... The internal groups were also reconstructed: the evidence from the Germanic languages gives a pretty good reconstruction of proto-Germanic, the Baltic ones give an ok reconstruction of proto-Baltic, those and the Slavonic languages give a good reconstruction of a possible Balto-Slavic ancestor node, ...

Now, an interesting thing happened towards the end of the 19th century: some scholars found that a bunch of words had changed in relatively consistent ways, but in many languages it was as if a coin had been flipped for each word to decide which particular type of change it would go through - and the coin seemed to have landed the same way everywhere, despite there being no sound present that would have triggered the change in one word and not in others. De Saussure posited that there had been sounds that at some point had triggered these sound changes (so, e.g. sound 1 had triggered a certain set of changes, sound 2 had triggered another set), and that had then been lost in all descendant languages. That, of course, is kind of unfalsifiable, isn't it? The theory was that the sounds probably were articulated somewhere far back in the mouth, or alternatively in the throat, and that they likely were fricatives. These have later been labelled h1, h2, h3.

And yes, as far as theories go, it is unfalsifiable. But as luck would have it, a civilization was suddenly dug up in Turkey: the Hittites. Their texts were written using a known script, that of Akkadian. To scholars, it was immediately obvious that this language too was related to the other Indo-European languages, due to several cognates (words shared by the different languages, though with potentially different sound changes along the line; English 'have' and German 'haben' are an example of cognates). It turned out that some letters that in Akkadian were used for throaty sounds were used in Hittite with the exact same distribution as de Saussure had predicted.

Languages discovered later have also sometimes turned out to be ones for which regular changes were easy to posit from reconstructions of proto-Indo-European, although new discoveries have sometimes also shed light on things that were uncertain about it. The fact that Tocharian also clearly was of Indo-European derivation supports the comparative method.

Now, archeology, philology, genetics and other scholarly fields have traced the historical movements of the Indo-European tribes and their languages. Of course, genetics does not tell us what language someone speaks - the African-American community in the US should be sufficient evidence that a group of people can end up speaking a language other than that of their ancestors.
However, these tools do help us figure things out about prehistoric tribal movements.

Of course, not all languages in the world can be traced to Indo-European. About as early as the idea arose that there may be an actual explanation as to why Greek, Latin and Sanskrit are similar, similar theses were being proposed for Finnish, Sami and Hungarian, sometimes also including small languages spoken in different parts of Russia. Turkish has a number of central and north Asian relatives with well studied similarities. Manchu has a number of relatives called the Tungusic languages. Mongolian, likewise, has a number of relatives that linguists are very sure of - in all these language families, there are a significant number of shared words where the sound changes are very regular.

(Note: typology is the study of features of languages, as in, does a language tend to use prefixes or suffixes, does a language use subject-verb-object or subject-object-verb or some other or no dominant word order, does it have a gender system, does it have a case system, ... and various increasingly abstract or complex features.)

Not all suggested relations have held true either: Hungarian linguists have often tried linking Hungarian to Turkish, and strenuously denied any relation to Finnish - but the proposed Turkish cognates have generally only gone through the recent sound changes in Hungarian, and none of the earlier layers of change, and can therefore be excluded from being cognates, and instead, with high certainty, be assumed to be loans.
Some linguists have found the typological similarities between the Turkic languages, the Mongolian languages and the Tungusic languages (and sometimes Korean and Japanese, along with the other closely related Japonic languages) to be too great for them not to be related. The theory that these form a family is often called the Altaic theory. No cognates shared by all three have been found, though - but cognates shared by any two of them do exist, which might instead suggest that intense loaning has occurred between them from early on, and that the shared words are not indicative of actually sharing an original proto-language.

The Uralic languages have also been associated with Altaic at times on account of purely typological similarities, but few linguists accept such a grouping nowadays.
Attempts have also been made to link Uralic and Indo-European: the two families have had contact for a very long time, which we know in part due to purely archeological evidence. Another type of evidence for early contact is words of Indo-European origin that must have been borrowed early by the Uralic tribes, as they had gone through just a few of the historical sound changes that were to happen in the IE languages they were borrowed from, the words then going through several changes in the Uralic languages - such things can actually help us put an approximate date on when the word was borrowed as well as on when the sound changes happened (if supported by archeology or other knowledge). Examples of such borrowings are Finnish "orja" from a word that ended up as English "aryan" (funnily enough, it has changed meaning in Finnish to 'slave'), and "porsas" from the word that ended up as "pork" in English (from an original *porćas). These words can be shown not to go back to proto-Finno-Ugric or proto-Uralic, though, so they are loans rather than inherited words. Suggested actual cognates between the two families include water - vettä, name - nimi. ... e_cognates contains a larger list of proposed cognates.
Fortescue has argued that Uralic and Yukaghir (a small family of Siberian languages) are related, and his work arguing that Uralo-Yukaghir and Eskimo are related is apparently considered rather good, but is recent enough not to have been accepted everywhere.

The reason such a complex methodology is used has to do with the overwhelming likelihood of random matches in large enough corpora. Claims of languages being related are often made for nationalistic, religious or other nutty reasons. Such claims seldom come with any method, but are just large sets of words that look similar but have no systematic similarities, no regular sound changes involved. The 'research' seldom takes into account the fact that languages loan words as well. Paarsurrey's claims of all major languages being derived from Arabic are typical of that kind of thinking. is an article that builds a small model for estimating the chance that such false cognates can be found - turns out it's overwhelmingly likely if no methodology is applied.
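To give a feel for why chance look-alikes are so easy to find, here is a minimal Python sketch. It is a toy model of my own, not the model from the article mentioned above. It assumes that two unrelated words "look similar" whenever they start with the same consonant, that each language picks initial consonants uniformly at random, and that a sloppy comparer allows some leeway in meaning (trying several loosely related words before giving up). The function names, the uniform distribution and the leeway parameter are all assumptions made for illustration.

```python
import random


def expected_chance_matches(n_meanings, p_match, leeway=1):
    """Expected number of meanings with at least one chance look-alike.

    p_match: assumed probability that two unrelated words 'look similar'.
    leeway:  how many loosely related meanings the comparer may try.
    """
    p_any = 1 - (1 - p_match) ** leeway
    return n_meanings * p_any


def simulate_chance_matches(n_meanings, n_consonants, leeway=1, seed=0):
    """Count chance matches between two randomly generated, unrelated lexicons.

    Each word is reduced to a random initial consonant (an integer). A meaning
    counts as a 'match' if the word in language A shares its initial with any
    of the `leeway` candidate words in language B.
    """
    rng = random.Random(seed)
    lang_a = [rng.randrange(n_consonants) for _ in range(n_meanings)]
    lang_b = [rng.randrange(n_consonants) for _ in range(n_meanings)]
    hits = 0
    for i, sound in enumerate(lang_a):
        # Candidates: the same meaning plus the next (leeway - 1) meanings,
        # standing in for 'close enough' meanings in this toy setup.
        candidates = [lang_b[(i + k) % n_meanings] for k in range(leeway)]
        if sound in candidates:
            hits += 1
    return hits


if __name__ == "__main__":
    for leeway in (1, 5):
        expected = expected_chance_matches(1000, 0.1, leeway)
        observed = simulate_chance_matches(1000, 10, leeway)
        print(f"leeway={leeway}: expected {expected:.1f}, simulated {observed}")
```

With 1000 meanings and ten equally likely initial consonants, the formula gives about 100 expected "matches" between two completely unrelated languages when meanings must line up exactly, and about 410 when five loosely related meanings may be tried. Requiring regular correspondences across the whole vocabulary is what cuts this noise down.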

I will list some of the main language families of the world below. This dry boring list does tell us something: it tells us how languages have spread. Keep in mind this is not necessarily the same as how genes have spread, as a group can adopt a new language, and this is known to have occurred at times. (E.g. the Finns and the Sami have quite distinct genes, yet both groups speak related languages. Another, more modern example would be immigrants to the Americas in general.) However, adopting a language is no small change, and oftentimes, groups that speak related languages will be somewhat related.

Now, some other families that are of relevance in the world include, and this will be rather boring reading:
Afro-Asiatic (formerly known as Hamito-Semitic) includes Hebrew, Arabic, Maltese, Tigrinya, Aramaic, Coptic, and most languages of north Africa and Arabia. These languages, like Indo-European, have written sources that go rather far back, which has given considerable aid in reconstructing the proto-language and in figuring out what changes have happened to them historically. Some of the grammatical properties shared by pretty much all the Semitic languages (and to a lesser extent the Hamitic ones) influenced some attempts at solving problems of reconstructing Indo-European, as it was assumed they might be similar or even related further down the line. No solid evidence of Indo-European and Afro-Asiatic sharing any considerable set of cognates has been found, though - the shared words are mostly words for new technologies that have spread from Semitic areas to Indo-European areas or vice versa.

Sino-Tibetan includes Chinese, Burmese, Tibetan and a number of minor languages in that general area. These too have written sources that go far back, which have helped in reconstructing the ancestral language, and they are typologically quite distinct from Indo-European and northern Asian languages.

Dravidian languages are spoken mainly in the southern parts of India. They tend, typologically, to be agglutinative - just like Mongolic, Uralic, Turkic, Tungusic and any number of other families, where long strings of suffixes or prefixes are permitted. Some of them have clear traces of Indo-European influence, and some Indo-European languages of India have clear Dravidian influences.

(I have omitted some Asian families here, as these are families I don't know much about - such as those of the Thai and Vietnamese languages, I barely know anything about those. Neither will I go into detail on the families of the Australian aboriginal languages - languages that often do have fascinating grammars, but where certain taboos often have quickly obscured relations between the languages. A taboo common in that part of the world has been avoidance of uttering any word that sounds like the name of anyone recently dead, so in some of the tribes the elders apparently planned ahead what to call things whose names are too similar to John, Jill or Steve once John, Jill or Steve dies. Such a situation naturally erodes the similarity in the vocabulary between related languages quickly, and makes reliable reconstruction very hard.)

Niger-Congo languages are spoken in most of sub-Saharan Africa, and include Swahili. Alas, reconstruction has not been reliably carried out for the entire grouping, although some subgroups are fairly solid. The Khoisan languages (including the Khoe languages) form a notable group of non-Niger-Congo languages in southern Africa. On Madagascar, an Austronesian language - related to Maori, Hawaiian, Rapa Nui (Easter Island), and the languages of a large part of Indonesia, Polynesia, etc. - is spoken. The Austronesian languages are now believed to have spread out from Taiwan. Papua has not been much settled by Austronesians, though, and contains a multitude of small and relatively unresearched language families.

The Caucasus mountains contain several tiny families, including the Kartvelian languages, and two families just named Northeast and Northwest Caucasian. In addition, Indo-European and Turkic languages are spoken in the area. Attempts have been made to link the families with each other, or with Afro-Asiatic or Indo-European. No success there.

Language isolates are languages for which no related languages are known. These include Basque, Burushaski, Ainu, a number of Siberian languages, Korean, and Japanese (sort of: there are languages known to be closely related to Japanese, spoken by at most tens of thousands of speakers, which are often considered dialects in various sources; the Japonic family as a whole, though, is not known to be related to any other languages). There are also isolates in the Americas and Africa, but I don't recall any specific ones right now. We also have attested extinct languages for which we know of no related language: Sumerian, Hattic, Elamite, possibly Etruscan.

The native American languages form several separate families, and among them, there also are a large number of isolates. This may be a result of these languages not having been well enough studied, though.