Category: Bioinformatics

Sanger Sequel

   Sam Nicholls    No Comments yet    Sanger-QC

In a change to scheduled programming, days after touching down from my holiday (which needs a post of its own), I moved to spend the next few weeks back at the Wellcome Trust Sanger Institute in Cambridgeshire. I interned here previously in 2012 and it’s still like working at a science-orientated Google thanks to the […]

`rapsearch` Returns

   Sam Nicholls    No Comments yet    AU-PhD

Following completion of my most recent side-quest to find out a little more about who the protozoa actually are and where they live in the context of UniProt, I now had a starting point to append to my archive of hydrolase records. I had already shown that around 1,500 Ciliophora-associated hydrolases could be extracted from UniProt, […]

Playing Phylogenetic Hide and Seek with Protozoa

   Sam Nicholls    No Comments yet    Bioinformatics, Mysteries

Amanda suggested that alongside archaeal, bacterial and fungal associated hydrolases, we should also look at protozoans. No problem, I’ll just get the taxonomy ID for protozoa and extract another database from UniProtKB as before. Simple! Or so I thought… The rabbit hole is pretty deep on this one. Feel free to skip my multi-day exploration […]

Raiding `rapsearch` Results

   Sam Nicholls    No Comments yet    AU-PhD

Finally. After all the trouble I’ve had trying to scale BLAST, running out of disk space, database accounting irregularities and investigating an archive_exception, we have data. Thanks to the incredible speed of rapsearch, what I’ve been trying to accomplish over the past few months with BLAST has been done in mere hours without the hassle […]

What am I doing?

   Sam Nicholls    No Comments yet    AU-PhD

A week ago I had a progress meeting with Amanda and Wayne, who make up the supervisory team for the computational face of my project. I talked about how computers are terrible and where the project is heading. As Wayne had been away from meetings for a few weeks, I began with a roundup of […]


   Sam Nicholls    No Comments yet    System Administration, Tools

As a curious and nosy individual who likes to know everything, I wrote a script dubbed memblame which is responsible for naming and shaming authors of “inefficient” jobs at our cluster here in IBERS. It takes patience, often days and sometimes longer, to see large-input jobs executed on a node on the compute cluster […]


   Sam Nicholls    No Comments yet    Bioinformatics, Mysteries

Something appears amiss with TrEMBL: millions of sequences are “missing”. Where did they go? At the end of last month, to build a database of bacterial sequences with known hydrolase activity, I extracted around 2.9 million sequences from UniProtKB/TrEMBL, a popular database which contains sequences that have been automatically annotated and are awaiting manual curation […]

The Story so Far: Part I, A Toy Dataset

   Sam Nicholls    No Comments yet    AU-PhD

In this somewhat long and long overdue post, I’ll attempt to explain the work done so far, give an overview of the many issues encountered along the way, and offer an insight into why doing science is much harder than it ought to be. This post got a little longer than anticipated, so I’ve sharded […]