Tuesday, 28 August 2012

PhD week 25: Databasing

Gratuitous image: Bizarre oriental brentid weevils. From left to right:
Arrenodes xiphias, Calodromus mellyi female, C. mellyi male, Ceocephalus forcipatus.
Modified from BioDivLibrary's Flickr Photostream.

The raw data for taxonomic research comes from specimens—parts of individual organisms preserved and held in collections as a perpetual record. In the case of entomological taxonomy, we tend to deal with whole organisms. This is not the case for everyone. It is not particularly convenient to preserve whole whales or trees, for example. Also, because insects are small, common and don't have vertebrae, large numbers can be collected and stored. This means that I am in the fortunate position of having many hundreds of specimens to look at, which will give me an appreciation of the variation that exists within and between species. The downside is that I have many hundreds of specimens to look at and manage.

A range of data can be obtained from these specimens, including geographic coordinates, details of morphological features and DNA sequences. To manage everything, I've given each specimen a unique number, which serves as a data identifier. I enter the geographic and morphological data for each specimen into a spreadsheet, and the DNA sequences are stored in a FASTA-formatted file.
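As a rough illustration of how the specimen number can tie the two files together, here is a minimal Python sketch. It assumes the spreadsheet has been exported to CSV with a column called specimen_id, and that each FASTA header begins with the specimen number. Those names, the measurement column and all the values are made up for the example; they are not my actual data.

```python
import csv
import io

# Hypothetical sample data: column names and header layout are assumptions.
SPREADSHEET_CSV = """specimen_id,latitude,longitude,rostrum_length_mm
1001,-43.53,172.63,2.1
1002,-41.29,174.78,2.4
"""

FASTA_TEXT = """>1001 made-up sequence
ACGTACGT
ACGT
>1002 made-up sequence
TTGACCA
"""


def parse_fasta(handle):
    """Return {specimen_id: sequence}; the ID is the first word of each header."""
    sequences = {}
    current_id = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current_id = line[1:].split()[0]
            sequences[current_id] = ""
        elif current_id is not None:
            # Sequence lines may be wrapped, so join them back together.
            sequences[current_id] += line
    return sequences


def link_records(csv_handle, fasta_handle):
    """Attach each specimen's sequence (or None) to its spreadsheet row."""
    sequences = parse_fasta(fasta_handle)
    records = {}
    for row in csv.DictReader(csv_handle):
        row["sequence"] = sequences.get(row["specimen_id"])
        records[row["specimen_id"]] = row
    return records


records = link_records(io.StringIO(SPREADSHEET_CSV), io.StringIO(FASTA_TEXT))
print(records["1001"]["sequence"])
```

With real files, the two io.StringIO objects would be replaced by open file handles. A specimen with no sequence simply gets None, which is handy because not every specimen will have been sequenced.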

While some might argue that a relational database may be more suited for this sort of thing, I am content with the system at present. Because the focus is on specimens, as opposed to collecting events or other aspects involving multiple specimens, the spreadsheet is suitable. Having the unique specimen number also means that it should be fairly straightforward to migrate the data into a relational database if necessary.
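To show why the unique specimen number should make such a migration straightforward, here is a small sketch using Python's built-in sqlite3 module. The table names, column names and values are all hypothetical, and SQLite is just a convenient stand-in for whichever database might eventually be used. The specimen number becomes the primary key of one table and a foreign key in the other.

```python
import sqlite3

# Hypothetical rows, as they might be read from the spreadsheet and FASTA file.
specimens = [(1001, -43.53, 172.63), (1002, -41.29, 174.78)]
sequences = [(1001, "ACGTACGTACGT"), (1002, "TTGACCA")]

conn = sqlite3.connect(":memory:")  # in-memory database, nothing written to disk
conn.execute("PRAGMA foreign_keys = ON")

# The unique specimen number is the primary key ...
conn.execute(
    """CREATE TABLE specimen (
           specimen_id INTEGER PRIMARY KEY,
           latitude    REAL,
           longitude   REAL)"""
)
# ... and the sequences refer back to it.
conn.execute(
    """CREATE TABLE dna_sequence (
           specimen_id INTEGER NOT NULL REFERENCES specimen(specimen_id),
           sequence    TEXT NOT NULL)"""
)

conn.executemany("INSERT INTO specimen VALUES (?, ?, ?)", specimens)
conn.executemany("INSERT INTO dna_sequence VALUES (?, ?)", sequences)
conn.commit()

# Join the two tables on the specimen number.
rows = conn.execute(
    """SELECT s.specimen_id, s.latitude, d.sequence
       FROM specimen AS s
       JOIN dna_sequence AS d ON d.specimen_id = s.specimen_id
       ORDER BY s.specimen_id"""
).fetchall()
print(rows)
```

Because every row already carries the specimen number, no new identifiers need to be invented during the move; collecting events or other multi-specimen groupings could later be added as further tables.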


Read:
Psalms 102–104

Websites:
Public Domain Review
Inkscape books
A guide to Inkscape
Geometry and Postscript

Listened:
Leo Tolstoy—War and Peace Book 2 LibriVox audiobook

Watched:
Star Trek: Deep Space Nine Season 5

1 comment:

Rupert said...

Hi Sam. The other day I installed an Apache web server, wrote a script to convert FASTA files into CSV (it would be nice to do this in R, but it looked a bit complicated), imported those into MySQL, and then wrote some PHP to interactively search for taxa and output the data back as FASTA text via the web browser. It didn't actually take that long. If you put it under Subversion control too, that would be nice eh ...

I can send you some code if you fancy. Rupert.