The Return of Darwinius masillae

The Four Stone Hearth is in need of hosts for 12/22/10 and beyond. Please consider hosting.

The team that brought us Darwinius masillae has a new paper out defending the haplorhine status of Darwinius masillae. The paper is a response to Williams et al (you can read about that paper here). It raises an issue that I have blogged about in a previous post, and the Gingerich et al paper is a good example of what Rudolf Raff was talking about.

Allow me to quote part of what I said in the post linked to above:

Rudolf Raff, in The Shape of Life, has an interesting discussion on attempts to iron out the relationships between lungfish, trout, and humans. On the surface it is quite simple. Lungfish are more closely related to humans than trout are. The lungfish has some adaptations to air breathing and one of the questions raised by the above relationship is whether these adaptations are homologous to those of tetrapods, or are they independent solutions to the same problem.

Enter the coelacanth. Morphological analyses of where the coelacanth fits into the above scheme yielded conflicting results, as did various and sundry molecular analyses. One of the keys to solving the problem came in a mitochondrial DNA analysis that indicated a lungfish-tetrapod clade with coelacanths as the next branch and finally ray-finned fish. Assuming this is true, what can the fossil record and morphology tell us? This is where the story gets interesting. According to Raff, the lungfish, the coelacanth, and tetrapods are the few surviving members of a once more diverse rhipidistian clade. Early lungfish were deep sea forms that had gills, while modern forms are air breathers. Getting back to the question above, this means that the adaptations to air breathing are convergent with tetrapods. Raff concludes:

An especially striking demonstration of this conclusion is that the earliest known tetrapod, Acanthostega from the upper Devonian of Greenland has been shown by Coates and Clack to have had functional internal gills. It probably also possessed lungs, which were a primitive feature shared by bony fishes. Tetrapods have lost their gills in becoming more terrestrial. The first tetrapods thus convergently resembled modern lungfishes more than they resembled the earliest lungfishes. [page 162 – afarensis]

The molecular data wasn’t wrong, just incomplete due to missing taxa. In this case the taxa were missing due to extinction and this problem also, one thinks, affected the morphological analyses. I suspect that one could achieve the same effect by simply omitting some species from a morphological analysis. The point to take away from this is that in order to untangle the problem, both molecules and morphology were required. Not to mention more data. All too often, morphology and molecular analysis have been presented as being in some kind of zero-sum conflict where there can be only one.

Most of the paper will sound a little familiar since it is basically a rehash of the PLoS paper; however, there are several new bits that I would like to look at. The first comes in the section called “total evidence”. The total evidence approach uses all evidence, be it skeletal, molecular, or soft tissue, to create the phylogenetic tree (something Gingerich et al’s analysis does not do). Gingerich et al then go on to state:

Comparisons of phylogenetic trees and comparisons of branch lengths and character distributions in a phylogeny are statistical, and both depend on a balanced representation of taxa and characters.

First I’ve heard, but I’m a relative novice when it comes to phylogenetic analysis. Gingerich et al then discuss many-taxa versus few-taxa matrices, and this is where the paper really goes off the rails. They level several criticisms against their critics, first:

We agree with Seiffert et al. (2009), Williams et al. (2010), and others that there is a strepsirrhine-haplorhine dichotomy in primate evolution. We employ the same cladistic methods. We accept that total evidence drawn from many sources is advantageous. Why then do we reach such a different conclusion about the systematic position of Darwinius?
Given that our methods are the same, then our contrasting results can only be explained by differences in the number and balance of taxa chosen for study, the character matrix used to analyze higher-level primate phylogeny, the outgroup chosen to root a phylogenetic network, or some combination of these.
Kay et al. (2004) scored 144 characters for 63 taxa; Bajpai et al. (2008) scored 343 characters for 75 taxa; Seiffert et al. (2009) scored 360 characters for 117 taxa; and we scored 30 characters for 8 taxa. Is a bigger matrix a better matrix? What are the costs and benefits of many-taxa representation? Does adding characters compromise independence? Does adding taxa compromise computation? (bolding mine – afarensis)

There has been an abundance of research on these questions, both in systematic biology and in paleontology, and it is kind of stunning that this literature was not referenced. For example, trees drawn from a small number of taxa (especially if several of them are evolving quickly) can be subject to long branch attraction. Long branch attraction is usually broken up by the addition of more taxa. Research has shown that the inclusion of incomplete taxa (that is, taxa where one set of data, such as soft tissue or molecular characters, is missing) and of fossils where not all characters of interest can be coded can break up long branch attraction. A little further into the paper Gingerich et al say:

This result indicates that what matters is not the number of characters, the choice and independence of characters, the outgroup, or who did the scoring. The critical factor seems to be the number of taxa and the representation of characters (or missing data) in the taxa studied.

Later they say:

Maybe the many-taxa problem is related to missing data? Some 11,949 of 25,725 cells (46%) in the Bajpai et al. (2008) matrix are empty, and 22,260 of 42,120 cells (53%) in the Seiffert et al. (2009) matrix are empty. Maybe over-representation of some characters interacts with underrepresentation of others as missing data, with unknown statistical effects?
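
The cell counts in that passage are easy to check, since a character matrix has one cell per character per taxon. Here is a minimal Python sketch using only the figures quoted from Gingerich et al; the function name matrix_summary is my own and does not come from the paper:

```python
def matrix_summary(n_characters, n_taxa, n_missing=None):
    """Return (total cells, fraction of cells missing) for a character matrix.

    The fraction is None when no missing-cell count is available.
    """
    total = n_characters * n_taxa
    fraction = None if n_missing is None else n_missing / total
    return total, fraction


# (characters, taxa, missing cells) as quoted in the post.
# No missing-cell count is quoted for the 8-taxon matrix, so it stays None.
matrices = {
    "Bajpai et al. (2008)": (343, 75, 11_949),
    "Seiffert et al. (2009)": (360, 117, 22_260),
    "Gingerich et al. (2010)": (30, 8, None),
}

for name, (chars, taxa, missing) in matrices.items():
    total, fraction = matrix_summary(chars, taxa, missing)
    if fraction is None:
        print(f"{name}: {total} cells")
    else:
        scored = total - missing
        print(f"{name}: {total} cells, {scored} scored, {fraction:.0%} missing")
```

The totals and percentages match the quote. Even with roughly half their cells empty, the two large matrices still contain 13,776 and 19,860 scored cells, against 240 cells in total for the 30-character, 8-taxon matrix.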

Missing data has been an intense area of study as well. For example, Wiens (2003) studied the issue via simulation and concluded:

The results of the present study show that the proportion of missing data cells in the incomplete taxa is a poor predictor of their impact on phylogenetic accuracy. A much better predictor is the number of characters that can be scored in the incomplete taxa. The overall accuracy for trees that include incomplete taxa seems to be closely related to accuracy based only on the characters that can be scored in all taxa …
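
To make the distinction in that quote concrete, here is a toy comparison in Python. The two taxa and their numbers are hypothetical, made up for illustration and not drawn from any of the matrices discussed. It only shows that a taxon can have a high proportion of missing cells and still be scored for more characters than a complete taxon in a small matrix:

```python
def scored_and_missing(n_characters, proportion_missing):
    """Return (characters scored, characters missing) for a single taxon."""
    missing = round(n_characters * proportion_missing)
    return n_characters - missing, missing


# Hypothetical taxa: (characters in the matrix, proportion of cells missing)
taxa = {
    "fragmentary fossil in a 360-character matrix": (360, 0.75),
    "complete taxon in a 30-character matrix": (30, 0.0),
}

for label, (n_chars, p_missing) in taxa.items():
    scored, missing = scored_and_missing(n_chars, p_missing)
    print(f"{label}: {scored} scored, {missing} missing ({p_missing:.0%})")
```

Judged by proportion of missing data, the fragmentary fossil looks far worse. Judged by the number of characters that can actually be scored, which is the predictor Wiens found more useful, it has three times as many scored characters as the complete taxon.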

Literature Cited/Recommended Reading

Cobbett et al (2007) Fossils Impact as Hard as Living Taxa in Parsimony Analyses of Morphology, Sys. Biol. 56(5): 753-766

Felsenstein (1978) Cases in which Parsimony or Compatibility Methods will be Positively Misleading, Syst. Zool. 27(4): 401-410

Gauthier et al (1988) Amniote Phylogeny and the Importance of Fossils, Cladistics 4: 105-209

Gingerich et al (2010) Darwinius masillae is a Haplorhine – Reply to Williams et al (2010), Journal of Human Evolution 59: 574-579

Heath et al (2008) Taxon sampling and the accuracy of phylogenetic analyses, Journal of Systematics and Evolution 46(3): 239-257

Kearney and Clark (2003) Problems due to missing data in phylogenetic analysis including fossils: A Critical Review, Journal of Vertebrate Paleontology 23(2): 263-274

Smith and Turner (2005) Morphology’s Role in Phylogeny Reconstruction: Perspectives From Paleontology, Sys. Biol. 54(1): 166-173

Wiens (2003) Missing Data, Incomplete Taxa, and Phylogenetic Accuracy, Sys. Biol. 52(4): 528-538

Wiens (2003) Incomplete Taxa, Incomplete Characters, and Phylogenetic Accuracy: Is There a Missing Data Problem?

Wiens (2005) Can Incomplete Taxa Rescue Phylogenetic Analyses from Long Branch Attraction? Sys. Biol. 54(5): 731-742

Wiens (2006) Missing data and the design of phylogenetic analyses, Journal of Biomedical Informatics 39: 34-42

Wiens and Reeder (1995) Combining Data Sets with Different Numbers of Taxa for Phylogenetic Analysis, Systematic Biology 44(4): 548-558

Zwickl and Hillis (2002) Increased taxon sampling greatly reduces phylogenetic error, Sys. Biol. 51(4): 588-598

3 Responses

  1. I just can’t stop wondering how this paper actually got published in the first place. Shouldn’t this whole peer-review stuff prevent people from publishing such flawed studies?

  2. I suspect it has something to do with the fact that the Williams et al paper was published in the same journal – right of reply and whatnot.
