Wednesday, 6 January 2010

Transitions and transversions in R

A couple of months ago I wrote the following R function to calculate the number of transitions and transversions between DNA sequences in an alignment. The function is fairly slow because of the double for loop (an alignment of ~100 sequences, 800 bp in length, takes around 30 seconds to run). In this case, however, I shall plead Uwe's Maxim: "Computers are cheap and thinking hurts".

In other R news, there's a cool site, R-bloggers, that is a portal to a number of other blogs that deal with R. It's great to see what other people manage to do in R and a good way to learn about its capabilities.

Happy New Year!


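#Input: dat---an object of class 'DNAbin'
library(ape)

titv<-function(dat){
mat<-as.matrix(dat)
res<-matrix(NA, ncol=dim(mat)[1], nrow=dim(mat)[1], dimnames=list(x=names(dat), y=names(dat)))
for(i in 1:dim(mat)[1]){
for(j in 1:dim(mat)[1]){
vec<-as.numeric(mat[i,])+as.numeric(mat[j,])-8 #Sum of the byte codes at each site
res[i,j]<-length(grep("200|56",vec)) #Transitions
res[j,i]<-length(grep("152|168|88|104",vec)) #Transversions
}
}
res #Lower triangle: transitions; upper triangle: transversions
}

ti<-titv(dat)
tv<-t(ti)
tv[lower.tri(tv)] #Number of transversions
ti[lower.tri(ti)] #Number of transitions

#Saturation plot
dist<-dist.dna(dat)
plot(ti[lower.tri(ti)]~dist, pch=20, ylim=range(ti), xlab="Genetic distance", ylab="Number of substitutions")
points(tv[lower.tri(tv)]~dist, pch=20, col="red")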
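
The numbers in the grep() patterns look arbitrary, but they appear to come from adding the byte codes of the two bases at a site and subtracting 8. The sketch below assumes ape's usual 'DNAbin' encoding (A = 136, G = 72, C = 40, T = 24). The names codes and sums are only for illustration and are not part of the function above.

```r
# Byte codes assumed for the four bases in a 'DNAbin' object
# (A = 0x88, G = 0x48, C = 0x28, T = 0x18)
codes <- c(A = 136, G = 72, C = 40, T = 24)

# Pairwise sums minus 8, labelled by base
sums <- outer(codes, codes, "+") - 8

# A<->G and C<->T are transitions; the other four mixed pairs are transversions
transitions   <- c(sums["A", "G"], sums["C", "T"])
transversions <- c(sums["A", "T"], sums["A", "C"], sums["G", "T"], sums["G", "C"])

print(sums)
```

Under that assumption every unordered pair of different bases gets its own sum, which is why a single number per site is enough to tell the two kinds of substitution apart.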


Tal Galili said...

Hi there,
Thank you for the post, I would like to suggest two things:

1) Please consider adding the link to R Bloggers:

2) Would you like to add your blog (the posts tagged with R) to R-bloggers?
If so, you could e-mail me about it or fill in the form on:

Best :)

Samuel Brown said...

This version of titv() was found to have a minor bug: it counted the difference between an N and a T as a transition. A faster version of the function that doesn't give this result has been generously given to me by Klaus Schliep and can be found here.
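
A likely cause of the N versus T problem is that grep() matches substrings. Assuming N is coded as 240 and T as 24, their sum minus 8 is 256, which contains "56". The sketch below is not Klaus Schliep's version. It only illustrates exact matching with %in%, using a hypothetical helper count_titv() that works on plain vectors of byte codes.

```r
# Count transitions and transversions between two sequences given as
# DNAbin-style byte codes, matching the summed codes exactly
count_titv <- function(x, y) {
  vec <- as.numeric(x) + as.numeric(y) - 8
  c(ti = sum(vec %in% c(200, 56)),
    tv = sum(vec %in% c(152, 168, 88, 104)))
}

# Toy sequences: A, C, N, A against G, T, T, T (N assumed to be coded as 240)
x <- c(136, 40, 240, 136)
y <- c(72, 24, 24, 24)

exact <- count_titv(x, y)
# Substring matching, as in the original patterns, also picks up 256 (N vs T)
substring_ti <- length(grep("200|56", x + y - 8))

print(exact)
print(substring_ti)
```

With exact matching the N/T site is ignored, whereas the grep() count of transitions is one higher.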