The Praise of Insects: A method for subsetting FASTA files

Wednesday, 15 August 2012

A method for subsetting FASTA files

I got back my first sequences for various Irenimus specimens this past week, and have created nice, clean contigs from the forward and reverse sequences. I've done this using FinchTV and Seaview, saving the results as a FASTA file with all of forward, reverse and consensus sequences for each specimen. Saving the data in this format has the benefit of being suitable for tracking through version control software, which means that every change I make to the file can be recalled. I'm only using one file for creating the contigs, but I'm using three gene regions, which will then need to be aligned with each other in the future. Thus, I need to have a method for subsetting my master document into smaller files with only those sequences from the same gene regions.

To do this, I have come up with a convention for naming the sequences I wish to use down the line:

>geneRegion|specimenNumber|speciesCode|otherInformation

From here, all sequences from a certain gene region can be retrieved using a little piece of awk magic. For example, all sequences from the 28S ribosomal RNA region (i.e. those starting with the line >28S|....) can be obtained by running the following code in the terminal:

awk '/>/{p=0};/>28S/{p=1} p' raw_sequences > 28S.fasta

A big thanks to backreference.org for pointing out how this might be achieved.

No comments:

Post a Comment

Statio

He is the image of the invisible God, the firstborn over all creation. For by Him all things were created that are in heaven and that are on earth, visible and invisible, whether thrones or dominions or principalities or powers. All things were created through Him and for Him. And He is before all things, and in Him all things consist. And He is the head of the body, the church, who is the beginning, the firstborn from the dead, that in all things He may have the preeminence. Colossians 1:15–18

The Praise of Insects

Wednesday, 15 August 2012

A method for subsetting FASTA files

No comments:

Blog Archive

Who's responsible....

Statio

Very good places to visit

Labels