Sanger Sequel
In a change to scheduled programming, days after touching down from my holiday (which needs a post of its own), I moved to spend the next few weeks back at the Wellcome Trust Sanger Institute in Cambridgeshire. I interned here previously in 2012 and it’s still like working at a science-orientated Google thanks to the […]
`rapsearch` Returns
Following completion of my most recent side-quest to find a little more about who the protozoa actually are and where they live in the context of UniProt, I now had a starting point to append to my archive of hydrolase records. I had already shown that around 1,500 Ciliophora-associated hydrolases could be extracted from UniProt, […]
Playing Phylogenetic Hide and Seek with Protozoa
Amanda suggested that alongside archaeal, bacterial and fungal associated hydrolases, we should also look at protozoans. No problem, I’ll just get the taxonomy ID for protozoa and extract another database from UniProtKB as before. Simple! Or so I thought… The rabbit hole is pretty deep on this one. Feel free to skip my multi-day exploration […]
Raiding `rapsearch` Results
Finally. After all the trouble I’ve had trying to scale BLAST, running out of disk space, database accounting irregularities and investigating an archive_exception, we have data. Thanks to the incredible speed of rapsearch, what I’ve been trying to accomplish over the past few months with BLAST has been done in mere hours without the hassle […]
Aligned Annihilation II: Dumpster Diving
I tried to extract a single integer from a core dump and instead fell into an abyss and learned how to be a computer.
Aligned Annihilation
This afternoon, in a coffee-fueled fugue, I nuked every directory containing output for any attempt to align the limpet contigs to any form of database so far. Here’s why, and what I did next.
What am I doing?
A week ago I had a progress meeting with Amanda and Wayne, who make up the supervisory team for the computational face of my project. I talked about how computers are terrible and where the project is heading. As Wayne had been away from meetings for a few weeks, I began with a roundup of […]
`memblame`
As a curious and nosy individual who likes to know everything, I wrote a script dubbed memblame which is responsible for naming and shaming authors of “inefficient” jobs at our cluster here in IBERS. It takes time, often days, sometimes longer, of patience to see large-input jobs executed on a node on the compute cluster […]
TrEMBLing
Something appears amiss with TrEMBL: millions of sequences are “missing”. Where did they go? At the end of last month, to build a database of bacterial sequences with known hydrolase activity, I extracted around 2.9 million sequences from UniProtKB/TrEMBL, a popular database which contains sequences that have been automatically annotated and are awaiting manual curation […]
The Story so Far: Part I, A Toy Dataset
In this somewhat long and long overdue post, I’ll attempt to explain the work done so far, give an overview of the many issues encountered along the way, and offer an insight into why doing science is much harder than it ought to be. This post got a little longer than anticipated, so I’ve sharded […]