The Praise of Insects: computer


Tuesday, 28 August 2012

PhD week 25: Databasing

Gratuitous image: Bizarre oriental brentid weevils. From left to right:
Arrenodes xiphias, Calodromus mellyi female, C. mellyi male, Ceocephalus forcipatus.
Modified from BioDivLibrary's Flickr Photostream.

The raw data for taxonomic research comes from specimens—parts of individual organisms preserved and held in collections as a perpetual record. In the case of entomological taxonomy, we tend to deal with whole organisms. This is not the case for everyone. It is not particularly convenient to preserve whole whales or trees, for example. Also, because insects are small, common and don't have vertebrae, large numbers can be collected and stored. This means that I am in the fortunate position of having many hundreds of specimens to look at, which will give me an appreciation of the variation that exists within and between species. The downside is that I have many hundreds of specimens to look at and manage.

A range of data can be obtained from these specimens, including geographic coordinates, details of morphological features and DNA sequences. To manage everything, I've given each specimen a unique number which serves as a data identifier. I have a spreadsheet into which I enter the geographic and morphological data for each specimen; and the DNA sequences are stored as a FASTA formatted file.

While some might argue that a relational database may be more suited for this sort of thing, I am content with the system at present. Because the focus is on specimens, as opposed to collecting events or other aspects involving multiple specimens, the spreadsheet is suitable. Having the unique specimen number also means that it should be fairly straightforward to migrate the data into a relational database if necessary.
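As a rough illustration of how the unique specimen number can tie the spreadsheet and the FASTA file together, here is a minimal Python sketch. It assumes the spreadsheet has been exported as CSV with a column called specimen (a hypothetical name), and that FASTA headers follow the geneRegion|specimenNumber|speciesCode|otherInformation convention described in the FASTA subsetting post. The coordinates, specimen numbers and species code in the example are made up.

```python
import csv
import io


def read_fasta(handle):
    """Return {header: sequence} from a FASTA-formatted text stream."""
    records = {}
    header = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records


def sequences_by_specimen(records):
    """Group sequences by specimen number (second '|'-separated header field)."""
    grouped = {}
    for header, sequence in records.items():
        fields = header.split("|")
        if len(fields) < 2:
            continue  # header does not follow the naming convention
        gene, specimen = fields[0], fields[1]
        grouped.setdefault(specimen, {})[gene] = sequence
    return grouped


def join_specimens(csv_handle, grouped, id_column="specimen"):
    """Attach the sequences for each specimen to its spreadsheet row."""
    rows = []
    for row in csv.DictReader(csv_handle):
        row = dict(row)
        row["sequences"] = grouped.get(row[id_column], {})
        rows.append(row)
    return rows


# Made-up example data standing in for the real files
fasta = io.StringIO(">28S|1001|IRE|x\nACGT\nGG\n>COI|1001|IRE|x\nTTAA\n")
table = io.StringIO("specimen,latitude,longitude\n1001,-43.6,172.5\n1002,-45.0,170.1\n")
for row in join_specimens(table, sequences_by_specimen(read_fasta(fasta))):
    print(row["specimen"], sorted(row["sequences"]))
```

Because everything hangs off the specimen number, the same key would serve as the primary key if the data were ever moved into a relational database.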

Read:
Psalms 102–104

Websites:
Public Domain Review
Inkscape books
A guide to Inkscape
Geometry and Postscript

Listened:
Leo Tolstoy—War and Peace Book 2 LibriVox audiobook

Watched:
Star Trek: Deep Space Nine Season 5

Wednesday, 15 August 2012

A method for subsetting FASTA files

I got back my first sequences for various Irenimus specimens this past week, and have created nice, clean contigs from the forward and reverse sequences. I've done this using FinchTV and Seaview, saving the results as a FASTA file containing the forward, reverse and consensus sequences for each specimen. Saving the data in this format has the benefit of being suitable for tracking through version control software, which means that every change I make to the file can be recalled. I'm only using one file for creating the contigs, but I'm using three gene regions, which will then need to be aligned with each other in the future. Thus, I need to have a method for subsetting my master document into smaller files with only those sequences from the same gene region.

To do this, I have come up with a convention for naming the sequences I wish to use down the line:

>geneRegion|specimenNumber|speciesCode|otherInformation

From here, all sequences from a certain gene region can be retrieved using a little piece of awk magic. For example, all sequences from the 28S ribosomal RNA region (i.e. those starting with the line >28S|....) can be obtained by running the following code in the terminal:

awk '/>/{p=0};/>28S/{p=1} p' raw_sequences > 28S.fasta

A big thanks to backreference.org for pointing out how this might be achieved.
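For anyone who would rather do the same thing in Python, the following is a minimal sketch of the same idea as the awk one-liner: switch a flag on at each header line that belongs to the wanted gene region, switch it off at every other header, and keep lines while the flag is on. The function name subset_fasta is my own, and this version is slightly stricter than the awk pattern because it requires the header to start with the gene region followed by a pipe.

```python
def subset_fasta(lines, gene_region):
    """Yield the lines of every record whose header starts with '>gene_region|'."""
    prefix = ">" + gene_region + "|"
    keep = False
    for line in lines:
        if line.startswith(">"):
            # A new record begins: keep it only if it is from the wanted region
            keep = line.startswith(prefix)
        if keep:
            yield line


# Possible usage, mirroring the file names in the awk example:
# with open("raw_sequences") as src, open("28S.fasta", "w") as out:
#     out.writelines(subset_fasta(src, "28S"))
```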

Friday, 4 May 2012

PhD week 9: I have become precise

This week, I gave my laptop a complete overhaul by deleting the Windows Vista partition I still had from when I purchased the thing, and upgrading my operating system to Ubuntu 12.04 Precise Pangolin. It was a straightforward, painless process that went very smoothly. The only niggles are the things you don't do very often, and which would've been easier had you remembered to back them up first. The .bash_history file is one such example.

The big change that I have encountered, coming from Ubuntu 10.04 Lucid Lynx, is the Unity desktop environment. I'm actually rather enjoying it. I find the behaviour of Alt + Tab takes a bit of getting used to (and the icons could be smaller), but generally I am happy with the changeover.

One of the things that took me the most time though, was setting up a keyboard shortcut to my text editor. I was easily able to make my way to System settings > Keyboard > Shortcuts and create a new shortcut, but to my dismay it told me it was disabled. After far too long searching on the internet and getting confused about the gconf editor, I finally find a bug that seems to deal with the issue. I prepare to comment on it, and then read the workaround proposed.

Click the + button and add a name to remember the shortcut and the specific command you want to execute. Click Apply. Find the name in the list and click the RIGHT HALF of the row. "Disabled" should change to "New accelerator". (My emphasis)

Man, I felt stupid.

All in all, it's been a great week. As well as the successful computer cleanup, I learned this week that two papers on which I am an author have been accepted! Always good news to receive, and it's handy heading into a PhD with a few publications up one's sleeve. I also had a pleasant afternoon on Tuesday looking at a number of Irenimus specimens, and starting to think carefully about useful anatomical features for their identification, and how it might be best to record them.

Read:
   Fleming CA. 1962. New Zealand biogeography—A paleontologist's approach. Tuatara 10(2): 53–108
   Ray ET. 2003. Learning XML. Cambridge, Mass.: O'Reilly
   MacCulloch D. 2010. A History of Christianity: The First Three Thousand Years. London: Penguin
   Psalms 48–51

Websites:
Marcus Brown's blog
Entrez Programming utilities help
Ubuntu's #1 priority for bug fixing
APE website

Listened:
21 Guns—Friends and Family
The Devil Wears Prada—With Roots Above and Branches Below

Watched:
Brick vs Face—Warriors Live at Hamtown Smakdown 2007
Stavesacre—It's Beautiful (Once You're Out Here) Music Video

Sunday, 26 February 2012

Python programming for dividing PDF files

Recently, a project I was involved with required me to split a PDF into multiple files. While the pdfpages package of LaTeX could be used to do this, it would be fairly unwieldy. What I wanted was a program that could be used from the command line. A scout around revealed a StackOverflow question that solved a similar problem using Python. It was fairly straightforward to modify this code to create a program that did exactly what I needed. The result is hosted on GitHub.

This was my first practical experience with Python, and I guess it's a testament to the quality of the language's documentation that it didn't take long before I got something useful.

Tuesday, 13 December 2011

spider: an R package for species identity and evolution

spider: Species identity and evolution is an R package developed by the Lincoln University molecular ecology lab group to do a range of analyses that various lab members wanted to run that were not yet implemented in R. In particular, the package provides functions for conducting sliding window analyses on DNA sequences, the calculation of identification efficacy of a library of reference DNA sequences, and the segregation of distance matrices into their inter- and intra-specific components.

The above are the main attractions, and the ones that we tend to write about when promoting it in places like the 4th International Barcode of Life Conference. There's a bunch of other neat utilities in there also though. A couple of the ones that I particularly enjoy are tiporder(), which returns the tip labels in the order in which they appear on the tree; paa() which conducts population aggregate analysis on a dataset; and rosenberg() which calculates Rosenberg's probability of monophyly for the nodes on a tree.

Spider is available on CRAN, and R-Forge, the latter providing opportunities to report bugs and to collaborate in the future development of the package should you desire to do so.

Tuesday, 20 September 2011

Beamer themes

I enjoy using Beamer for my presentations. Initially, the limitations it imposes (particularly on graphics placement) are irksome, but after a while you realise that they actually help you create better presentations faster. I've tried 'quickly' putting together something in PowerPoint since starting to use Beamer and found it so fiddly that I went straight back and did it all in Beamer. However, I still find the themes and colour schemes provided in the base distribution of Beamer to be less than ideal, and they don't follow the university's guidelines for presentations. As I garner the skills to start preparing my own themes, a list of customised Beamer themes will provide inspiration and guidance. In particular, I like the look of the Torino theme.

Thursday, 15 September 2011

Extracting comments from PDFs

I received a reviewer's response from one of my submitted papers a while ago, and have delayed working on it because they had written their comments in the PDF using "sticky notes". Unfortunately, these notes don't print very well. I like to be able to read things off the computer, so this presented a problem. Thankfully, PDFs encode their sticky note comments in ASCII-formatted text, which meant that I was able to extract the comments using the beautiful linux command line:

grep -o --text '/Contents([^/]*' review.pdf | tee comments.txt

This single line resulted in a nice text file for me to print as I please.
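The same trick can be sketched in Python. This is only a rough equivalent of the grep command, and it shares its limitations: it assumes the sticky notes are stored uncompressed as literal strings directly after /Contents, and any comment containing a forward slash will be cut short. The function name extract_comments is my own.

```python
import re

# Same idea as the grep pattern: '/Contents(' followed by everything up to the next '/'
PATTERN = re.compile(rb"/Contents\(([^/]*)")


def extract_comments(pdf_bytes):
    """Return the text of sticky-note comments found in raw PDF bytes."""
    comments = []
    for match in PATTERN.findall(pdf_bytes):
        text = match.decode("latin-1").strip()
        if text.endswith(")"):
            text = text[:-1]  # drop the parenthesis that closes the PDF string
        comments.append(text)
    return comments


# Possible usage, mirroring the file names in the grep example:
# with open("review.pdf", "rb") as pdf, open("comments.txt", "w") as out:
#     out.write("\n".join(extract_comments(pdf.read())))
```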

Monday, 12 September 2011

post-installation script error solved

Had some problems today with aptitude getting its knickers in a knot, returning the error:

dpkg: error processing install-info (--configure):
subprocess installed post-installation script returned error exit status 1

This blog post helped to solve the problem simply and easily with the commands:

sudo rm /var/lib/dpkg/install-info.postinst
sudo aptitude reinstall install-info

Many thanks to azimout!

Monday, 8 August 2011

tlmgr not available for Ubuntu

One of the hardest things about LaTeX is the way it manages packages. Doing it manually is (in a word) annoying. When I was on Windows I loved MiKTeX for the ease with which it downloaded and installed extra packages, and I was disappointed when this functionality wasn't available on Ubuntu. Lately though I discovered that TeX Live had a similar package manager called tlmgr, and I started getting excited. When installing TeX Live though, I was dismayed to find that tlmgr did not work. A bit of a Google search later I found that this was the reason:

There is no way that a second package manager independent of the normal packaging infra structure (apt here, or rpm, or whatever) can work, because it will break the main system.

TeX Live Manager is currently only for system trees. There is a patch in the dev repository for activating user mode, so that tlmgr can be used to manage TEXMFHOME, but it has not been worked on since quite some time (Norbert Preining on launchpad.net)

I take this to mean that there's only room in Ubuntu-town for one package manager, and synaptic is it. This is fair enough I guess, but it is still unfortunate.

Norbert goes on to say "get your hands dirty and help coding!" Unfortunately, my Perl is non-existent, so I'll have to give it a miss until such a time as I actually have some idea what I'm doing.

Friday, 22 April 2011

Subversion

To start a new project on R-Forge, I've had to start coming to grips with Subversion, the widely used program for source code versioning. I've found this tutorial very helpful for learning the "Subversion lifecycle" and the general use of the program. The Subversion book is a wonderful resource, but my knowledge of the thing is not yet at the level required to understand it properly.

Sunday, 27 February 2011

Freely available Digital Elevation Models for New Zealand

For people wanting to view aspects of New Zealand's amazing topography, a number of Digital Elevation Models are freely available from Geographx, a company specialising in producing New Zealand geographic information and atlases. Thanks Geographx!

Thursday, 13 January 2011

Ubuntu translations

A key part of the Ubuntu philosophy is their emphasis on making computers usable in one's language of choice:

We believe that every computer user should be able to use their software in the language of their choice.

The wiki page provides an entry point for those interested in contributing translations to the project. The process is managed through a project launchpad. Of most interest to me are the Samoan, Maori and Marshallese translations.

Thursday, 16 December 2010

Setting up a mail server on Ubuntu

Yesterday I bought a cheap computer as a phylogenetic workhorse. Today I installed PhyLiS, and have been working on getting a mail server configured so that I can get the thing to automatically send me results of analyses over the coming weeks. This blog post gave me the general gist of the commands required to send mail. After a few unsuccessful attempts at sending mail, I learned from this site that I needed to set myself up as a mail server. Thanks to the community documentation for Postfix, I was able to set it up fairly painlessly and now have the ability to send messages from the linux command line.
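Once a local mail server such as Postfix is accepting mail, a script can post results to it directly. The following is a minimal Python sketch using only the standard library. It assumes the server is listening on localhost port 25, and the function names and addresses are placeholders of my own rather than anything from the posts linked above.

```python
import smtplib
from email.message import EmailMessage


def build_report(sender, recipient, subject, body):
    """Build a plain-text email message holding the results of an analysis."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_report(msg, host="localhost", port=25):
    """Hand the message to the local mail server (assumed to be on localhost:25)."""
    with smtplib.SMTP(host, port) as server:
        server.send_message(msg)


# Possible usage at the end of an analysis script (placeholder addresses):
# report = build_report("workhorse@example.org", "me@example.org",
#                       "Analysis finished", "The run completed without errors.")
# send_report(report)
```

Keeping the message building separate from the sending makes it possible to check the content of a report without a mail server running.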

Thursday, 18 November 2010

HTML helps

I am not a particularly on-to-it web designer (as you can probably guess by my having a blogger blog), having a rudimentary knowledge of HTML and not having either the time or the incentive to enter the brave new world of cascading style sheets and the like. While there are a good many websites that help with learning and remembering HTML tags, I've found the w3schools.com page to be particularly useful. The categories are not always aligned with my intuition, but it's good nonetheless.

Wednesday, 10 November 2010

Using pdfpages to rotate odd pages only

Today I scanned a document to PDF. It was a large document, and I scanned it in stages. No worries about putting the pieces together—that's what the LaTeX package pdfpages is for. What did cause a bit of a problem was that the odd and even pages were oppositely orientated. When I scanned the pages I had to turn the book around, meaning that all the odd pages were upside down.

To rectify this problem I had to delve into the dark world of LaTeX programming. It was an adventure, but thankfully it wasn't too difficult. What I came up with was the following tex file:

\documentclass[a4paper,twoside]{memoir}

% Counter tracking the page currently being included
\newcounter{number@}

\usepackage{pdfpages}

\begin{document}

\setcounter{number@}{0}

% Loop over every page of document.pdf
\loop\ifnum \value{number@} < 6 %CHANGE for each document
   \stepcounter{number@}
   \ifodd \value{number@}
      % Odd pages were scanned upside down, so rotate them by 180 degrees
      \includepdf[pages=\arabic{number@}, angle=180]{document}
   \else
      \includepdf[pages=\arabic{number@}]{document}
   \fi
\repeat

\end{document}

To illustrate, I've made an example PDF file to test it on. This test file is an open access paper from Zootaxa, the original of which is available here.

Do remember to change the number that \value{number@} is being compared to. This number is the total pages in "document.pdf", and I haven't yet figured out how to automatically retrieve it. Doing so would've consumed more time than I can afford just now.

Particularly helpful in this adventure was the Tralics site that contains documentation on all TeX commands, and this site on counters.

Tuesday, 26 October 2010

Google maps latitude/longitude bookmarklet

Bookmarklets are little strings of JavaScript that reside in your internet bookmark list and can do useful things. In particular, the one I find most useful is this one that retrieves the coordinates of the point at the centre of a Google Maps window:

javascript:alert(window.gApplication.getMap().getCenter())

Thanks to liquidx for writing it!

Thursday, 9 September 2010

Image-stacking software for Linux

Back in the day, when I was still Windows-based, I was able to get some pretty decent focus-stacked ("automontage") photos of insects using the brilliant freeware programs DeepFocus and PrepareStack written by Stuart Ball. Unfortunately, I can't find the download anywhere, though his detailed manual is still available. While commercial applications are available, I have not yet found an open-source version that will suffice. Internet searches indicate that ImageMagick's "combine" might be suitable, when given a suitable stack of photos. Preparing that stack is a little trickier. There are suggestions that GIMP might be suitable, however as far as I can see, there are no published scripts or tutorials that make it easier beyond tedious manual adjustments. I will continue to look around and see if I can work out some sorta fix.

Monday, 30 August 2010

Biomolecular Graphics

A recently published article in PLoS Computational Biology by Cameron Mura and colleagues discusses the great potential held by biomolecular graphics. It covers the terminology, the tools and how to go about teaching yourself the basics. While it is very biochemistry-focussed, the highlight of the paper, "Box 2: Nine Simple Rules for Biomolecular Graphics", presents some very useful hints to guide any scientific illustrator.

Reference:

Mura C, McCrimmon CM, Vertrees J, Sawaya MR. (2010). An Introduction to Biomolecular Graphics. PLoS Computational Biology 6(8): e1000918. doi:10.1371/journal.pcbi.1000918

Friday, 27 August 2010

Speech recognition in Linux

Whenever I'm looking at specimens under the microscope and noticing differences, I find it very difficult to stop what I'm doing, look at a bit of paper, and write it down. I'd much prefer to talk about it while looking at the specimen.

The first solution is to record yourself while talking. Audacity is a free, open-source music editing program that is pretty decent. I don't know how useful hard-core sound engineers would find it, but it's not bad for the application that I'm wanting to use it for, namely, recording my voice while I waffle on about what a specimen looks like. I could then listen to the recording repeatedly and transcribe what I say. More efficiently though, I'd be keen for some sorta sound recognition software a la Dragon NaturallySpeaking. NaturallySpeaking is the biggest-selling voice recognition software, and by all accounts it's pretty good. It does require some coin though, so I'm looking for less expensive, preferably open-source programs.

Unfortunately, it doesn't seem like there's much out there. A wikipedia page is a good entry point to the problem. Apparently one of the biggest issues is the lack of a voice database to test algorithms on. To solve this issue, VoxForge has been set up to encourage people to upload recordings and to work on the problem. The Ubuntu Wiki also has a page giving a bit of a road map of what Ubuntu want to see done. It looks like it might be a good project to start working on.

Friday, 20 August 2010

Phylogenetic trees online

The other day, an article was published in PLoS One describing a newly developed JavaScript library to visualise phylogenetic trees online: jsPhyloSVG. It's pretty nifty, and there's some pretty cool functionality that you can build into the trees. It's all based on the PhyloXML standard for describing phylogenetic trees and networks, but can display trees stored as other formats, in particular the common NEWICK format. The resulting files are viewable in any web browser, though Internet Explorer is dragging the chain a bit and does not yet support the full interactivity that other browsers are capable of.

It would be real cool to be able to export trees made and manipulated in R into PhyloXML format, and subsequently into jsPhyloSVG. Might be a fun project to work on when I've scraped some other things off my plate...

Reference:

Smits SA, Ouverney CC. (2010). jsPhyloSVG: A Javascript library for visualizing interactive and vector-based phylogenetic trees on the web. PLoS ONE 5(8): e12267. doi:10.1371/journal.pone.0012267