Showing posts with label Systematics. Show all posts

Wednesday, 15 August 2012

A method for subsetting FASTA files

I got back my first sequences for various Irenimus specimens this past week, and have created nice, clean contigs from the forward and reverse sequences. I've done this using FinchTV and Seaview, saving the results as a FASTA file with all of the forward, reverse and consensus sequences for each specimen. Saving the data in this format has the benefit of being suitable for tracking through version control software, which means that every change I make to the file can be recalled. I'm only using one file for creating the contigs, but I'm using three gene regions, and the sequences from each region will need to be aligned with each other in the future. Thus, I need a method for subsetting my master document into smaller files containing only the sequences from a single gene region.

To do this, I have come up with a convention for naming the sequences I wish to use down the line:

>geneRegion|specimenNumber|speciesCode|otherInformation
From here, all sequences from a certain gene region can be retrieved using a little piece of awk magic. For example, all sequences from the 28S ribosomal RNA region (i.e. those starting with the line >28S|....) can be obtained by running the following code in the terminal:
awk '/>/{p=0};/>28S/{p=1} p' raw_sequences > 28S.fasta
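The flag p is switched off at every header line and switched back on only when the header starts with >28S, so awk prints each matching header together with the sequence lines that follow it. A big thanks to backreference.org for pointing out how this might be achieved.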
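The same idea can be pushed a little further to write one file per gene region in a single pass. The following is a minimal sketch in base R, assuming the header convention described above; the function name split_fasta_by_region() and its arguments are made up for illustration, and headers without a "|" (such as unnamed forward and reverse reads) are simply skipped.

```r
# Hypothetical helper: split a FASTA file into one file per gene region,
# assuming headers of the form >geneRegion|specimenNumber|speciesCode|otherInformation
split_fasta_by_region <- function(infile, outdir = ".") {
  lines <- readLines(infile)
  is_header <- grepl("^>", lines)

  # Drop anything before the first header, then number the records
  keep <- cumsum(is_header) > 0
  lines <- lines[keep]
  is_header <- is_header[keep]
  record <- cumsum(is_header)

  # Gene region = text between ">" and the first "|" of each header;
  # headers that do not follow the convention get NA and are dropped by split()
  headers <- lines[is_header]
  region <- ifelse(grepl("|", headers, fixed = TRUE),
                   sub("^>([^|]*)\\|.*$", "\\1", headers),
                   NA_character_)

  # Every line inherits the region of the record it belongs to
  by_region <- split(lines, region[record])
  for (r in names(by_region)) {
    writeLines(by_region[[r]], file.path(outdir, paste0(r, ".fasta")))
  }
  invisible(names(by_region))
}

# Usage (file name as in the awk example):
# split_fasta_by_region("raw_sequences")
```

Sequences keep their original order within each output file, so the results should still diff cleanly under version control.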

Wednesday, 25 July 2012

Systematics of South Pacific sap beetles

Carpophilus maculatus, Carpophilus cheesmani and Carpophilus oculatus
The species of Carpophilus of particular interest: dorsal habitus above, male parameres below.

Two and a half years ago, I completed my MSc looking at the sap beetles in the genus Carpophilus. In particular, I looked at the C. oculatus species complex from the South Pacific. The species was first described in 1864, before it settled into obscurity. It was only mentioned a few times in the literature until 1993, when Ron Dobson published the results of a study where he looked at a large series of the species. He described three subspecies, two of which were widespread and sympatric, while the third was confined to Vanuatu. Another species, C. maculatus, is rather similar in appearance, to the extent that questions were being raised as to the validity of the taxon complex.

My task was to look at this group using molecular methods. In particular, I used three genes to investigate the relationships between these four taxa, and any other species of Carpophilus I could get my grubby hands on. I found that C. maculatus is indeed a distinct species from C. oculatus, and also found sufficient evidence to warrant raising the subspecies from Vanuatu to a full species. The other two subspecies, while being somewhat distinct, did not form entirely separate groups, which suggests that something interesting has happened in the genetic history of these taxa. It was a successful and enjoyable project, and I am proud to say that I completed my MSc with first class honours.

So far, so good. However, the currency of modern academia is peer-reviewed publications. The preparation of manuscripts is an arduous process, and over the past two years the one describing the aforementioned research has been languishing on various people's desks (mine, mainly). In the past month though, it has been brought into the light of day and has been published in Molecular Phylogenetics and Evolution. Check it out! If you don't have access to it, feel free to email the author.

References:
Brown SDJ, Armstrong KF, Cruickshank RH. 2012. Molecular phylogenetics of a South Pacific sap beetle species complex (Carpophilus spp., Coleoptera: Nitidulidae). Molecular Phylogenetics and Evolution, 64(3), 428–440

Tuesday, 13 December 2011

spider: an R package for species identity and evolution


spider: Species identity and evolution is an R package developed by the Lincoln University molecular ecology lab group to do a range of analyses that various lab members wanted to run that were not yet implemented in R. In particular, the package provides functions for conducting sliding window analyses on DNA sequences, the calculation of identification efficacy of a library of reference DNA sequences, and the segregation of distance matrices into their inter- and intra-specific components.

The above are the main attractions, and the ones that we tend to write about when promoting it in places like the 4th International Barcode of Life Conference. There are a bunch of other neat utilities in there too, though. A couple of the ones that I particularly enjoy are tiporder(), which returns the tip labels in the order in which they appear on the tree; paa(), which conducts population aggregate analysis on a dataset; and rosenberg(), which calculates Rosenberg's probability of monophyly for the nodes on a tree.

Spider is available on CRAN, and R-Forge, the latter providing opportunities to report bugs and to collaborate in the future development of the package should you desire to do so.
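To give a flavour of what separating a distance matrix into its inter- and intra-specific components involves, here is a rough sketch in base R. It is not the package's own code: the function name split_distances() and the toy distances are invented for illustration.

```r
# Hypothetical helper: split pairwise distances into within- and
# between-species sets. `distmat` is a symmetric distance matrix and
# `spp` gives the species of each row.
split_distances <- function(distmat, spp) {
  distmat <- as.matrix(distmat)
  stopifnot(nrow(distmat) == length(spp))
  lower <- lower.tri(distmat)       # use each pair of specimens once
  same <- outer(spp, spp, "==")     # TRUE where both specimens share a species
  list(intra = distmat[lower & same],
       inter = distmat[lower & !same])
}

# Toy data: made-up distances among four specimens of two species
d <- matrix(c(0.00, 0.01, 0.10, 0.12,
              0.01, 0.00, 0.11, 0.13,
              0.10, 0.11, 0.00, 0.02,
              0.12, 0.13, 0.02, 0.00), nrow = 4)
spp <- c("Aus bus", "Aus bus", "Aus cus", "Aus cus")

res <- split_distances(d, spp)
res$intra   # distances within species
res$inter   # distances between species
```

Comparing the two vectors (for example with a histogram of each) is the usual starting point for judging whether a gap separates within-species from between-species variation.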

Tuesday, 16 August 2011

Book review: "Every Living Thing" by Rob Dunn


The subtitle of this book, "Man's Obsessive Quest to Catalog Life, from Nanobacteria to New Monkeys" caught my eye as I was looking through a book sale table. Being one of those who desires to contribute to this quest, I was delighted to find it. The book is an enjoyable overview of selected personalities whose lives and work define (for the author) the growth of our knowledge of biodiversity. Their stories are told with understanding and humour.

Starting with indigenous knowledge of biota, he introduces us to Linnaeus and Leeuwenhoek before describing modern scientists whose work has increased our appreciation for the diversity of life and expanded our understanding of its limits. The journey described is one that progresses from a focus on the species with greatest impact on daily life, to an understanding that "the rest of life does not revolve around us, nor is it like us" (p. 247). Comparing this discovery to the Copernican revolution, he argues that there remains the need for humility in assessing our knowledge and achievements in discovering the natural world.

A major theme of the book is the obsessiveness that drives the scientists who are described. Being one who shares a similar outlook, I can sympathise with the men and women described. Indeed, I find myself wishing I could be (to a certain extent) in their shoes. However, I don't know if someone who doesn't have the same drive and desires would find the portrayals heroic or pathetic. As the author describes,
"If systematists are socializing, it means, to many of them, simply time they are not looking at the organisms they really love. The obscurity of the things on which taxonomists work does not lessen their focus. In fact, it may heighten it. To dig into their subject, they have to dig so far in, focus so intensely, that the rest of the world seems farther and farther away." (p. 101)
Balance is important, and many of the best taxonomists I've met understand that. But it is hard, when there are so many fascinating and beautiful creatures out there, not to succumb to the temptation.

In summary, "Every Living Thing" is an accessible and enjoyable book that tells the story of a few of the personalities who have contributed to the classification and discovery of the organisms we share this world with.


Dunn, R. 2009. Every Living Thing. Man's Obsessive Quest to Catalog Life, from Nanobacteria to New Monkeys. HarperCollins, New York.

Friday, 14 January 2011

Changing phylogeny tip labels in R

During the process of molecular systematic research, specimens are given code names and numbers to keep track of data through the pipeline. These can contain a lot of information of relevance to the researcher, but unfortunately are meaningless to others who aren't as involved with the data. On publication, it is necessary to change the names from the code to a label that is more widely understood. This process can be tedious and fiddly, particularly when it needs to be done multiple times.

The following is a simple R-based solution for changing the tip labels of phylogenetic trees. First, we need to create a tree and a dataframe containing both the specimen codes and the ultimate labels.
library(ape)
tr <- rtree(5)
d1 <- c("t1","t2","t3","t4","t5")
d2 <- c("paste(italic('Aus bus'), ' top')",
        "paste(italic('Aus bus'), ' bottom')",
        "paste(italic('Aus cus'), ' middle')",
        "paste(italic('Aus cus'), ' north')",
        "paste(italic('Dus gus'), ' south')")
# Keep both columns as character strings rather than factors
d <- data.frame(label=d1, nlabel=d2, stringsAsFactors=FALSE)

The nlabel column contains code defining a plottable expression that enables scientific names to be formatted as italics. In my work, I saved this table as a separate file which I read in with read.table("file.txt", header=TRUE, sep="\t", stringsAsFactors=FALSE, quote=""). The quote argument is important as it carries the nested quotes through into the dataframe properly.

The business of actually changing the tip labels is done with the following lines:
tr$tip.label<-d[[2]][match(tr$tip.label, d[[1]])]
tr$tip.label<-sapply(tr$tip.label, function(x) parse(text=x))

The first line enters the expressions for the new labels in the correct order. The second line converts the character string into a printable expression.

Plot the tree and voila!
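To see why match() puts the new labels in the right order, here is a small sketch that uses a plain character vector in place of a tree's tip labels. The shuffled order of tips is made up, and parse() is called once on the whole vector, a small variation on the sapply() call above.

```r
# Lookup table: specimen codes and their publication labels
d <- data.frame(
  label  = c("t1", "t2", "t3"),
  nlabel = c("paste(italic('Aus bus'), ' top')",
             "paste(italic('Aus cus'), ' middle')",
             "paste(italic('Dus gus'), ' south')"),
  stringsAsFactors = FALSE
)

# Tip labels rarely come in the same order as the table (made-up order)
tips <- c("t3", "t1", "t2")

# match() gives, for each tip, the row of the table holding its code
idx <- match(tips, d$label)
new_labels <- d$nlabel[idx]

# Codes missing from the table come back as NA, so check before relabelling
stopifnot(!anyNA(idx))

# parse() turns each string into an expression that plotting functions can render
new_expr <- parse(text = new_labels)
```

The NA check is worth keeping in real use: a single mistyped specimen code would otherwise produce a silently unlabelled tip.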

Sunday, 15 August 2010

Downloading DNA sequences into R

A while ago, a friend of mine needed to download a number of different DNA sequences from GenBank, the online repository for the vast majority of DNA sequences read from all organisms by labs all over the world. This is not a problem. The "ape" package in R has a nifty function, read.GenBank(), that downloads the sequences identified by the accession numbers given to the function into a DNAbin object. Thus, read.GenBank("AY883003") downloads the sequence AY883003, the internal transcribed spacer 2 gene for Anthonomus grandis, the cotton boll weevil. read.GenBank() is able to read a vector of accession numbers, making it easy to download a lot of sequences if you're willing to give it the time.

All well and good. Unfortunately, the base function returns only the accession number as the name of the sequence. My friend was downloading sequences of many different genes from several different species. Understandably, mere accession numbers are not particularly helpful in this situation, and more information is needed for processing datasets such as this. Thankfully, a quick hack of the function ensured that species and gene region info could be downloaded with the sequences, solving the problem. It also extended the function's utility significantly, and in my opinion it is now much more useful for phylogenetics-type work.

The resulting function is read.GB(). It currently reads the "ORGANISM", "DEFINITION", and "ACCESSION" fields of GenBank files, which record the species identity, gene region and accession number respectively. These are stored in the resulting DNAbin object as attributes, and can be returned in the following manner:

a<-read.GB("AY883003")
attr(a, "species")
attr(a, "gene")
attr(a, "accession_num")

The current default names for the sequences are returned in a standard format: accession number|scientific name.

Full credit goes to Emmanuel Paradis who wrote the original function, and who wrote it in such a way that it was fairly painless to extend it in the manner above.

Wednesday, 28 April 2010

Transitions in R redux

Previously, I shared with the world a function to create a pairwise matrix of the number of transitions and transversions between two DNA sequences. Klaus Schliep kindly pointed out the possibility of a bug in the function and offered a faster, more accurate version. Thanks Klaus!

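titv <- function(dat) {
  mat <- as.matrix(dat)
  n <- nrow(mat)
  # rownames(mat) holds the sequence labels whether dat was a list or a matrix
  res <- matrix(NA, ncol = n, nrow = n, dimnames = list(x = rownames(mat), y = rownames(mat)))
  for (i in 1:(n - 1)) {
    for (j in (i + 1):n) {
      # The sum of the two byte codes at each site identifies the pair of bases
      vec <- as.numeric(mat[i, ]) + as.numeric(mat[j, ]) - 8
      res[j, i] <- sum(!is.na(match(vec, c(200, 56))))            # Transitions (lower triangle)
      res[i, j] <- sum(!is.na(match(vec, c(152, 168, 88, 104))))  # Transversions (upper triangle)
    }
  }
  res
}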

The previous version of the function counted the difference between an unknown base (coded as N) and a T as a transition. The new version no longer counts this pair as a transition.
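The numbers matched inside the function come from the way ape stores each base as a single byte. A short sketch of the arithmetic follows, assuming ape's usual byte values (A = 136, G = 72, C = 40, T = 24, N = 240); they are typed in by hand here rather than read from the package, so treat them as an assumption of the example.

```r
# Byte values for each base (typed in by hand; an assumption of this sketch)
codes <- c(A = 136, G = 72, C = 40, T = 24, N = 240)

# Sum of the two codes minus 8 for every pair of bases, as in titv()
pair_sum <- outer(codes, codes, "+") - 8

pair_sum["A", "G"]  # 200: purine <-> purine, a transition
pair_sum["C", "T"]  # 56: pyrimidine <-> pyrimidine, a transition
pair_sum["A", "C"]  # 168, a transversion
pair_sum["A", "T"]  # 152, a transversion
pair_sum["G", "C"]  # 104, a transversion
pair_sum["G", "T"]  # 88, a transversion

# N paired with T gives 256. A regular expression search for "200|56"
# finds "56" inside "256", whereas match() only accepts exact values.
length(grep("200|56", pair_sum["N", "T"]))           # 1: counted as a transition
sum(!is.na(match(pair_sum["N", "T"], c(200, 56))))   # 0: correctly ignored
```

Under these assumed codes, that substring match would account for the N versus T miscount in the earlier version, and exact matching with match() avoids it.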

Friday, 22 January 2010

Highly diverse weevils in northern New Guinea

New Guinea is an amazing place. It is one of the final frontiers of exploration, particularly in the biological realm, with highly diverse rainforests that cover huge areas and a nearly unbelievable range of habitats, from hot, humid mangrove swamp forests to 4,000 m high mountains and glaciers. The diversity of the island astounds everyone who works there and the amount remaining to be discovered absolutely boggles the mind.

A case in point is research on Trigonopterus weevils from the Cyclops Mountains, published late last year. This research was headed up by Alexander Riedel and looked at the congruence between clades revealed by cytochrome c oxidase 1 (COI) DNA sequences and morphological variation. They found 51 morphospecies which were all congruent with COI data. What is incredible though is the genetic distances within this group. Uncorrected distances between species were incredibly high, with a minimum of 16.5% and a mean of 20.5%. Within-species variation ranged from 0% (not too surprising) to a whopping 8.8%. To put this in context, a 2% genetic distance is usually bandied about as the point at which you start thinking that you've got two different species.

This diversity is particularly impressive when one considers that these results are derived from a single transect in a relatively low area in one mountain range. The authors justifiably expect that more extensive sampling will produce many more species.

Not only are they incredibly diverse, these weevils are also tough. Being cryptorhynchine weevils, their rostrum can fold up into a groove in their thorax when they're disturbed. Unlike most other cryptorhynchines, though, their elytra are fused together and to the thorax, making them able to withstand extremely high pressure and ensuring that they are very difficult to dissect. This is a problem when dissections are necessary to fully characterise and identify these beetles.

It's a very interesting paper on a really cool group of weevils. Check out the supporting information for habitus photos of the morphospecies and get an idea of the morphological variation in the group.


References:

Riedel A, Daawia D, Balke M. 2010. Deep cox1 divergence and hyperdiversity of Trigonopterus weevils in a New Guinea mountain range (Coleoptera, Curculionidae). Zoologica Scripta 39(1): 63–74.

Wednesday, 6 January 2010

Transitions and transversions in R

A couple of months ago I wrote the following R function to calculate the number of transitions and transversions between DNA sequences in an alignment. The function is fairly slow (an alignment of ~100 sequences, 800 bp in length takes around 30 seconds to run) thanks to the double for loop; however, in this case I shall plead Uwe's Maxim: "Computers are cheap and thinking hurts".

In other R news, there's a cool site, R-bloggers, that is a portal to a number of other blogs that deal with R. It's great to see what other people manage to do in R and a good way to learn about its capabilities.

Happy New Year!

library(ape)

#Input: dat---an object of class 'DNAbin'

titv<-function(dat){
  mat<-as.matrix(dat)
  res<-matrix(NA, ncol=dim(mat)[1], nrow=dim(mat)[1], dimnames=list(x=names(dat), y=names(dat)))
  for(i in 1:dim(mat)[1]){
    for(j in 1:dim(mat)[1]){
      vec<-as.numeric(mat[i,])+as.numeric(mat[j,])-8
      res[i,j]<-length(grep("200|56",vec)) #Transitions
      res[j,i]<-length(grep("152|168|88|104",vec)) #Transversions
    }
  }
  res
}

#Example

data(woodmouse)

ti<-titv(woodmouse)
tv<-t(ti)

tv[lower.tri(tv)] #Number of transversions
ti[lower.tri(ti)] #Number of transitions

#Saturation plot
dist<-dist.dna(woodmouse)

plot(ti[lower.tri(ti)]~dist)
points(tv[lower.tri(tv)]~dist, pch=20, col="red")


Friday, 9 October 2009

Carpophilus publications


Searching for things on Carpophilus species, I came across a couple of papers by Alexander Kirejtshuk on the things. There's one on the nitidulids of India and one on the African fauna.

I was also very interested to find a short report on nitidulid molecular systematics, as I have been completely unaware of any work that's been going on in that vein other than my own.

On the left is a beautiful picture of Carpophilus oculatus. Isn't it nice!

Monday, 5 October 2009

Melicope: Hawaii's export to the Pacific

Over the years, the general consensus has been that the islands of the Pacific, and particularly the incredibly isolated archipelago we fondly know as Hawaii, have been the passive collectors of fauna and flora that just happened to have swum, flown, drifted, or been blown onto their fair shores. It's generally been thought to be a one-way process: once something has arrived there, it settles down and makes the most of its tropical paradise. That is something those of us stuck in cold climates can relate to very well -- why would you want to leave a place that is extremely amiable and is yours for the taking?

However, recent systematic research on a number of organisms is starting to shake up this tidy story somewhat. It appears that we may have underestimated the ability of these islands to send their biota elsewhere.

The particular paper sparking this post, written by Danica Harbaugh and coauthors, features the shrub Melicope. It's widely distributed across Asia and the South Pacific, but has undergone an "explosive radiation" in Hawaii, with 47 species found in the group. As usual, the authors hypothesised that all Hawaiian species had originated from a single colonisation and formed a monophyletic group restricted to the islands. Data from a number of genes were analysed, and it was found that although it does seem to be the case that all Hawaiian Melicope were derived from a single colonisation, it hasn't remained stuck in the one place. Surprisingly, their data suggested that Hawaii has exported some of their plants to the Marquesas Islands, where they have subsequently speciated.

This data adds to the body of work that suggests that Pacific biogeography is a lot more dynamic and complicated than initially suspected. It is also another example of the very intriguing connection that exists between Hawaii and the Marquesas.

References:
Harbaugh DT, Wagner EI, Allan GJ, Zimmer EA. 2009. The Hawaiian Archipelago is a stepping stone for dispersal in the Pacific: an example from the plant genus Melicope (Rutaceae). Journal of Biogeography 36: 230-241.

Sunday, 27 September 2009

Fruit bats going the wrong way

A couple of years back, Jeremy Pulvers and Don Colgan published an interesting paper on the intriguing fruit bat genus Melonycteris, that is restricted to the Solomon Islands and the Bismarck Archipelago. The fascinating thing about this bat is that it is believed to be placed right at the base of the Megachiroptera (flying foxes and their ilk). Why it is that this supposedly old lineage is restricted to these isolated island groups is still unknown, but it is not alone in this pattern. In the birds, a number of the more ancient groups are found in and around New Guinea and the Australasian region.

This, however, is not the thrust of the Pulvers and Colgan paper. What they did is look at the genetic systematics and variation within the genus, particularly the Solomon Island species. To summarise, they found that the Solomon species are a group separate from the single Bismarck species. What was more interesting was the pattern of relationships within the Solomon Islands population. They found that the species on Makira (San Cristobal) was sister to the rest, followed by the Malaitan species, then the species found in the New Georgia group. Choiseul, Isabel and Guadalcanal populations composed a single group and were the most derived.

What is interesting about this pattern is that it is the opposite of what would be expected from a simple dispersal model originating in the Bismarcks. If that were the case, you would expect the sequence to be essentially the reverse: New Georgia; Choiseul, Isabel and Guadalcanal; Malaita; then Makira.

There has been increasing evidence from birds that the "Dispersal from New Guinea" model of the makeup of the Solomon Island fauna is not the only story, but as far as I'm aware, this is the first publication of evidence in vertebrates other than birds.

Reference:
Pulvers JN, Colgan DJ. 2007. Molecular phylogeography of the fruit bat genus Melonycteris in northern Melanesia. Journal of Biogeography 34:713-723.

Saturday, 12 September 2009

The bizarre family of the Silktail

The silktail (Lamprolia victoriae) is a small bush bird, restricted to the Fijian islands of Vanua Levu and Taveuni. From its first description in 1874 its systematic position has been debated, with suggested closest relatives ranging from the Australian robins (Petroicidae) and the monarch flycatchers (Monarchidae) to the birds of paradise (Paradisaeidae). The late, great Ernst Mayr famously called the silktail "One of the most puzzling birds of the world". Last year, a group of European, American and South African scientists headed up by Martin Irestedt brought DNA evidence to the party to shed further light on the subject. Their results were published here.

What they discovered was totally unexpected. Their data suggests that the closest living relative to the silktail is the Papuan mountain drongo (PMD, Chaetorhynchus papuensis), a little-known bird of the New Guinea highlands. The PMD has traditionally been grouped with the drongos (Dicruridae), but in the Irestedt study, both the silktail and PMD are sister to the fantail family (Rhipiduridae).

The authors discuss at length the biogeographic implications of their finding, suggesting either long distance dispersal or a vicariant metapopulation origin, but are unable to come to a conclusion either way. Unfortunately, they don't suggest ways of testing these hypotheses. I suggest it may be a little premature to speculate too seriously about this single result, interesting though it is. Future work on the geology of the region and further systematic research on the silktail and the remainder of the avifauna of Melanesia may reveal other potential explanations.

References:
Irestedt M., Fuchs J., Jonsson K., Ohlson J. I., Pasquet E., Ericson P. G. P. (2008) The systematic affinity of the enigmatic Lamprolia victoriae (Aves: Passeriformes) - An example of avian dispersal between New Guinea and Fiji over Miocene intermittent land bridges? Molecular Phylogenetics and Evolution 48: 1218-1222

Picture courtesy of Birdlife International