The Praise of Insects: Two R functions for working with DNA alignments

Tuesday, 29 March 2011

Two R functions for working with DNA alignments

I recently wrote a couple of small functions that grew out of work by me and others in my lab group. The first determines which sites in a sequence alignment are ambiguous (i.e. contain something other than A, G, C or T).

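library(ape)
data(woodmouse)

# Flag alignment columns (sites) that contain anything other than A, G, C or T.
is.ambig <- function(x){
   x <- as.matrix(x)
   # ape's byte values for A, G, C and T
   bases <- c(136, 72, 40, 24)
   # Count the non-ACGT entries in each column
   ambig <- apply(x, 2, FUN=function(site) sum(!as.numeric(site) %in% bases))
   ambig > 0
}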

is.ambig(woodmouse)

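This function relies on the bit-level coding scheme that Emmanuel Paradis developed for encoding sequences in R. The unambiguous bases A, G, C and T have the numeric values 136, 72, 40 and 24 respectively. The function checks each site (column) of the alignment for values other than these four and returns a logical vector with one element per site: TRUE if at least one sequence has something other than A, G, C or T at that site, FALSE otherwise. Because anything outside those four values is flagged, gaps and missing data count as ambiguous too.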
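To see why those four numbers identify the unambiguous bases, it helps to look at them as bits. The sketch below uses base R only. The helper to_bits and the extra codes for R, N and the gap (192, 240 and 4) are my own additions for illustration; the extra codes are taken from my reading of the ape documentation, so treat them as assumptions and check them against your installed version.

```r
# Byte values for the four unambiguous bases, as quoted above.
bases <- c(a = 136L, g = 72L, c = 40L, t = 24L)

# Hypothetical helper: write a value as 8 bits, most significant bit first.
to_bits <- function(v) paste(rev(as.integer(intToBits(v))[1:8]), collapse = "")
sapply(bases, to_bits)

# Assumed codes for some other states: R (A or G), N and the alignment gap.
other <- c(r = 192L, n = 240L, gap = 4L)

# A, G, C and T all share the bit with value 8; the other codes do not.
known <- bitwAnd(c(bases, other), 8L) == 8L
known
```

If that assumption holds, testing the single bit with value 8 picks out the same entries as comparing against the four values in is.ambig.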

The second function is an implementation of Tajima's K, published as equation A3 in Tajima (1983).

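tajima.K <- function(x, prop = TRUE){
   x <- as.matrix(x)  # so that dim(x)[2] is the alignment length
   # model="N" gives the number of differing sites for each pair of sequences
   res <- mean(dist.dna(x, model="N"))
   if(prop) res <- res/dim(x)[2]
   res
}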

tajima.K(woodmouse)

This function calculates the mean number of sites that differ between two sequences, averaged over all pairs of sequences in the alignment. By default it returns the result as a proportion of the alignment length. Setting prop = FALSE returns the mean number of differing sites instead.
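As a check on what that mean represents, here is a small worked example in base R that counts the pairwise differences by hand. The three-sequence alignment aln is made up for illustration and contains no gaps or ambiguous bases, so this is a sketch of the calculation rather than a replacement for dist.dna.

```r
# Toy alignment: three sequences, six sites (made-up data).
aln <- rbind(
  s1 = c("a", "c", "g", "t", "a", "c"),
  s2 = c("a", "c", "g", "t", "a", "t"),
  s3 = c("a", "t", "g", "t", "c", "t")
)

# Every pair of sequences: (1,2), (1,3), (2,3).
pairs <- combn(nrow(aln), 2)

# Number of differing sites for each pair.
n_diff <- apply(pairs, 2, function(p) sum(aln[p[1], ] != aln[p[2], ]))
n_diff

# Mean number of pairwise differences, then as a proportion of the length.
K <- mean(n_diff)
K
K / ncol(aln)
```

The pairs differ at 1, 3 and 2 sites, so K is 2, or 2/6 of the alignment length. These are the quantities tajima.K returns with prop = FALSE and prop = TRUE respectively.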

References:
Tajima F. 1983. Evolutionary relationship of DNA sequences in finite populations. Genetics 105: 437-460.
