Where did the Austronesian languages come from? People have been talking about that for a few hundred years - Austronesian linguistics is not an entirely new phenomenon - but the debate has now developed to a high degree of sophistication and nuance. Early European explorers noticed the similarities between Austronesian languages, especially those in Polynesia, at about the same time as British scholars in India were noticing the similarities between the Indo-European languages. While Indo-European is, of course, the more celebrated of the two and by far the best-studied language family on the planet, Austronesian linguistics has been part of the academy for some time, even if only as a minor specialism. Austronesian is now one of the most securely identified language families that we know of, and its Urheimat is one of the most securely located.
James Cook noticed that all of the Polynesian languages he encountered resembled one another in startling ways. He also saw that the languages of Indonesia resembled those of Polynesia. The relationship is actually very clear, even between languages separated by thousands of kilometres, so it wasn't outrageous speculation to propose this set of relationships. The word lima, for instance, is used in some form to mean 'five' in languages from Madagascar and Taiwan to Rapa Nui and Aotearoa. It is found in every sub-family of Austronesian - Malayo-Polynesian (including Central (CMP), Western (WMP), and Oceanic), Atayalic, and so on.
Words for the cardinal number 5 in some Austronesian languages:
Paiwan (southern Taiwan): lima
Seediq (central-eastern Taiwan): rima
Hawaiian (northern Pacific): 'elima
Fijian (south Pacific): lima
Samoan (south Pacific): lima
Uab Meto (West Timor, eastern Indonesia): nim
Words like this can be found throughout the lexicons of Austronesian languages - words that resemble one another so closely that the relationships between the languages are immediately obvious, even to non-linguists. So we know that the Austronesian languages are closely related, and that they therefore descend from a single common language. This single common language is called Proto-Austronesian (or Proto-AN). So the question is, where was Proto-AN spoken?
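To make the resemblance among the words for 'five' a little more concrete, here is a rough sketch in Python that counts how many single-letter edits separate each form listed above from lima. Treat it as an illustration only: plain edit distance on spellings is my own stand-in, not the way historical linguists establish relationships (that rests on regular sound correspondences across many words), and the function names are invented for this example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, or substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


# Forms for 'five' exactly as given in the list above.
FIVE = {
    "Paiwan": "lima",
    "Seediq": "rima",
    "Hawaiian": "'elima",
    "Fijian": "lima",
    "Samoan": "lima",
    "Uab Meto": "nim",
}


def distances_from(reference: str, forms: dict[str, str]) -> dict[str, int]:
    """Edit distance from a reference form to each language's form."""
    return {lang: edit_distance(reference, form) for lang, form in forms.items()}


if __name__ == "__main__":
    for lang, dist in distances_from("lima", FIVE).items():
        print(f"{lang}: {FIVE[lang]} ({dist} edit(s) from lima)")
```

None of the six forms is more than two edits away from lima, which is the kind of closeness a non-linguist can see at a glance.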
Let's look at language first. If we compare the Malay and Filipino languages - just these two - we can see that Malay grammar is considerably simpler than Filipino. Filipino has a full set of cases. It's not a nominative-accusative language like Latin, but an ergative-absolutive language (actually, it has a system all of its own, but it's closest to ergativity). It also shows a clearly agglutinative morphology, meaning that you can make new words by attaching affixes to existing words, with each affix modifying the meaning, much as when you put '-ness' or '-ful' on the end of an English word.
Malay, by contrast, has no tenses, let alone cases. It's much simpler. And, as all students of Malay will tell you, it has these annoying relics - verbal affixes that modify meaning ever-so-slightly, like meng- and -kan. You add these to verbs to add a little nuance, and they are clearly related to those used in Filipino; they're just much simpler - in fact, they're simplified forms with most of the meaning excised. They're vestigial affixes from the inflecting system of a language similar to Filipino.
kenal - to know (somebody)
memperkenalkan - to introduce (somebody)
Compare with Filipino affixes, like nag-, mag-, ipan-, mang-, etc.
Malay is also a mostly isolating language, unlike Filipino. You can't just stick Malay words together to make new ones. Malay is like English; you take units of meaning, or words, and put them together in sentences without changing them too much:
Saya tidak punya mobil yang merah.
I - not - have - car - which - red.
'I don't have a red car.'
Isolating languages tend to come from inflecting languages through loss of the inflections (and also the use of much stricter word order). For good examples, look at the etymology of most French or Italian words, some of which are, in essence, Latin words without the endings.
We may then infer that Malay isn't just a simpler relative of Filipino - it's clearly descended from a language much like Filipino. Filipino is spoken, of course, in the Philippines, and its closest relatives (the Philippine languages, oddly enough) are spoken in the Philippines as well. Importantly, the agglutinating morphology and ergative-absolutive verb forms are also found in Taiwan, even further to the north. The native languages of Taiwan also have this complex system, the one Malay descended from. So Malay and other languages like it almost certainly come from a single language that was spoken somewhere in the Philippines or possibly Taiwan.
There's more. As a general principle, it is a good idea to look for an origin place in a region which has the highest diversity of whatever it is you're looking for. The centre of the greatest diversity, all things being equal, will be the source.* Africa is the most diverse continent on earth in terms of human genetics; it is also the continent on which humans originated. And Taiwan has the greatest diversity of Austronesian languages, meaning that it is probably their source. Outside of Taiwan, there is only one sub-family of Austronesian (sub-families are like 'Germanic' and 'Tibeto-Burman', as opposed to the highest known classifications, like Indo-European and Sino-Tibetan) called Malayo-Polynesian. It is further sub-divided into a bunch of sub-sub-families, but only in Taiwan are there any other sub-families at a similar level of classification as Malayo-Polynesian. These include Atayalic, Paiwanic, and others. Robert Blust, perhaps the most famous Austronesian linguist of recent times, has proposed that there are nine sub-families of Austronesian present on Taiwan and only one found off the island, Malayo-Polynesian (MP). So Taiwan has the greatest diversity of languages and it has languages with the right syntax and morphology to give rise to other Austronesian languages. It seems like a good bet as the place of origin of the Austronesian languages.
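The diversity argument can be sketched as a simple count. The snippet below is a minimal illustration in Python, assuming the figures quoted from Blust above (nine primary sub-families on Taiwan, one off the island). Only Atayalic, Paiwanic, and Malayo-Polynesian are named in this post, so the other labels are placeholders I've made up to pad out the count, and the function names are likewise invented for the example.

```python
from collections import defaultdict

# (sub-family label, region) pairs. Atayalic, Paiwanic and Malayo-Polynesian
# are named in the post; the remaining seven labels are PLACEHOLDERS that
# stand in for the other Taiwanese sub-families in Blust's proposal.
PRIMARY_SUBFAMILIES = (
    [("Atayalic", "Taiwan"), ("Paiwanic", "Taiwan")]
    + [(f"Taiwanese sub-family {n} (placeholder)", "Taiwan") for n in range(3, 10)]
    + [("Malayo-Polynesian", "outside Taiwan")]
)


def subfamilies_by_region(pairs):
    """Group the sub-family labels by the region where they are spoken."""
    grouped = defaultdict(set)
    for subfamily, region in pairs:
        grouped[region].add(subfamily)
    return dict(grouped)


def likely_homeland(pairs):
    """Apply the centre-of-diversity heuristic: pick the region that
    hosts the largest number of top-level sub-families."""
    grouped = subfamilies_by_region(pairs)
    return max(grouped, key=lambda region: len(grouped[region]))


if __name__ == "__main__":
    for region, names in subfamilies_by_region(PRIMARY_SUBFAMILIES).items():
        print(f"{region}: {len(names)} primary sub-families")
    print("Best bet for the homeland:", likely_homeland(PRIMARY_SUBFAMILIES))
```

The count only works if you compare branches at the same level of the family tree, which is why the many sub-sub-families inside Malayo-Polynesian don't tip the balance away from Taiwan.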
What about genetics? Are there any genetic markers - haplogroups - that correlate with the spread of Austronesian languages and could give an indication as to their origin? As it happens, there are. Stephen Oppenheimer, a slightly loopy medical doctor at Oxford, has claimed that Austronesian must have originated in the Moluccas (Maluku, in eastern Indonesia), on the grounds that genetic findings support this. In fact, they don't. Several kinds of genetic studies point to a Taiwanese origin. One of the most interesting comes from a human-specific parasite. If you've had a stomach ulcer at any point, or come close to having one, then you may be familiar with Helicobacter pylori, a bacterial parasite specific to Homo sapiens sapiens. It is carried wherever humans go, and studies of the genes of Helicobacter from humans across the Austronesian-speaking world point to Taiwan as the place where the strains found in Austronesian-speaking populations seem to have come from.
There are also studies of the more typical non-recombining Y-chromosomal DNA and mitochondrial DNA (NRY and mtDNA respectively) that indicate a Taiwanese origin. NRY is passed down the male line and mtDNA down the female line, so agreement between the two constitutes pretty good evidence. The B4a1a mtDNA haplogroup is shared among Taiwanese aborigines, Melanesians, and Polynesians, as well as many Indonesian groups. The major NRY haplogroup in island Melanesia, O-M110, also seems to have originated in Taiwan. So both principal lines are represented by haplogroups that appear to have come from Taiwan. Furthermore, 890 autosomal (non-sex-chromosomal) DNA markers are shared between Polynesians, Melanesians, and Taiwanese aboriginal peoples. DNA from human-specific parasites, from the paternal and maternal lineages of Austronesian-speaking individuals, and from the autosomes of those individuals all points to Taiwan as the origin of an Austronesian-speaking population that spread across Indonesia, the Pacific, and the Indian Ocean. And that backs up the linguistic data, which is convincing in its own right.
In the case of the Austronesian languages, the speakers of the proto-language seem to have spread their genetic material as well, which is pretty helpful to archaeologists. It isn't that Austronesian languages and DNA wiped out non-Austronesian languages and DNA in the Philippines and Indonesia: some clearly very ancient haplogroups are recorded from such places as Timor and Flores (including two mtDNA haplogroups, M42 and N12, found only among indigenous populations in Australia and two islands of eastern Indonesia, Adonara and Flores). But enough genes from an ancestral Austronesian-speaking population were transferred to populations throughout what is now the Austronesian-speaking world to reinforce the proposed Taiwanese origin of the language family.
Now that we know that Taiwan is where the Austronesian languages come from, as well as a significant proportion of the DNA of many Austronesian-speaking groups (but certainly not all of them), we can get on to the issue of when this all happened. The answer archaeologists give is that Austronesian speakers started to leave Taiwan about 5,000 years ago - between 3500 and 3000 BCE. This correlates with the spread of certain types of farming (but definitely not the outright introduction of agriculture) in island southeast Asia, as well as certain pottery styles and the other bits and pieces of archaeological cultures that we can use to find out more about the past. Of course, you can never identify language from pottery remains unless written language is present, but the strata and radiocarbon dates record a migration out of Taiwan that correlates with the spread of Austronesian languages, and they point to those languages leaving the island around 3000 BCE.
Genetics, linguistics, and archaeology all point to an emergence of Proto-Austronesian in Taiwan (or, possibly, southeastern China) around 5,500 years ago, in the Neolithic. Its speakers had sailing canoes, grew plants and raised animals for food, and almost certainly practised headhunting. They lived in tribal societies whose precise structure is impossible to divine, but they appear to have believed strongly in the power of origin places, precedence, and the narration of genealogy through a geographical metaphor, in which a poetic sequence of placenames simultaneously narrates historical and mythological events (this is found throughout the Austronesian world). They probably phrased their 'descent' from common ancestors in terms of ascent, using a biological metaphor - clans are believed to grow like trees and plants in many Austronesian idioms. They probably also believed in ancestral spirits and thought it best to placate them. These spirits may have been called *nitu, and they may have had something to do with both social structure and headhunting (belief in ancestral spirits has an obvious role in bolstering the power of clans and lineages). There may also have been a preference for giving property and power to eldest sons, leading to the splitting of the clan and the expansion of the language and culture over new territory as younger sons sought to realise their ambitions. These are all features found in modern Austronesian-speaking populations or those encountered historically, and they accord with both the linguistic and archaeological records.
The question of Austronesian origins is a popular topic today among speakers of Austronesian languages, just as the study of Indo-European origins is popular among speakers of those languages. Archaeologists, linguists, geneticists, and socio-cultural anthropologists now have tools that can answer the questions Austronesian speakers have about their past and origins, and while such ethnic origin accounts can always be turned to political purposes (à la Hitler), it's still great to have those answers.
Here's a map, stolen from French Wikipedia [user: Maulucioni], illustrating the approximate times of the expansion of Austronesian languages (the Surinam, USA, and Middle East dates are mostly to do with the spread of Javanese, Samoan, Hawaiian, and other common languages of the Austronesian-speaking diaspora):
The green arrow with a question mark pointing towards South America indicates the possibility that some Polynesians made it to South America. This is suggested by the presence of the sweet potato (Ipomoea batatas), a South American plant, in Polynesia before the arrival of Europeans in either area (radiocarbon dates place the sweet potato in Hawai'i from the thirteenth century - I'm not sure about the rest of Polynesia). South American civilisations do not appear to have been sea-faring in any big way (they lacked sails, for instance), while Polynesian navigators were used to travelling vast distances on their ocean-going sailing canoes. Sweet potatoes float, but before long they absorb the water and sink (trust me!), so the most reasonable interpretation is that some Polynesian navigators reached South America before long-distance voyaging became unfashionable. There's no direct evidence of it, but it must have happened, or else we're at a loss to explain Ipomoea batatas in Hawai'i, Rapa Nui, and New Zealand.
As for where the proto-Austronesian language came from - before it arrived in Taiwan, that is - there are reasonably convincing proposals linking Austronesian to the Tai-Kadai family, of which Thai and Lao are the most famous extant languages. The proposed grouping is also often taken to include the Austroasiatic languages - Khmer and Vietnamese are the most famous of those (see my post on Austroasiatic here). I was unconvinced of this possibility until fairly recently. Most of the early papers' arguments depended on long series of sound changes - up to six separate sound changes - and used very small numbers of vocabulary items, which made the claims a little shaky. The time-depth would be so great that this number of sound changes is reasonable, but of course, such a large number makes proof of the relationship impossible. More recent analyses depend on less extreme changes and have used data from previously unstudied languages in their claims. The Wikipedia page gives a good breakdown of a number of the Austro-Tai proposals (Austro-Tai being the name given to the proposed family containing both Austronesian and Tai-Kadai). It seems prima facie unlikely that languages that must have diverged at such an extreme time-depth would show any signs of being related, but there does seem to be a little evidence pointing in that direction. I wouldn't want to reject such a tantalising possibility out of hand. Most scholars, including Robert Blust, place proto-Austro-Tai in the Chang Jiang (Yangtze) valley at the time of the domestication of rice in the region, thousands of years ago.
For a discussion of the motives of the Austronesian expansion, see my post here.
*This isn't true with the Indo-European languages, as successive migrations have done away with the diversity of languages once spoken at the source of Indo-European. But it's a good general principle.