Bioinformatics – Page 3

Sanger Sequel

Sam Nicholls | 24th June 2015 (updated 18th January 2016) | Sanger-QC

In a change to scheduled programming, days after touching down from my holiday (which needs a post of its own) I moved to spend the next few weeks back at the Wellcome Trust Sanger Institute in Cambridgeshire. I interned here previously in 2012 and it’s still like working at a science-orientated Google thanks to the […]

quality control, samtools, sanger

`rapsearch` Returns

Sam Nicholls | 1st June 2015 (updated 18th January 2016) | AU-PhD

Following completion of my most recent side-quest to find out a little more about who the protozoa actually are and where they live in the context of UniProt, I now had a starting point to append to my archive of hydrolase records. I had already shown that around 1,500 Ciliophora-associated hydrolases could be extracted from UniProt, […]

blast, performance, rapsearch, uniprot

Playing Phylogenetic Hide and Seek with Protozoa

Sam Nicholls | 18th May 2015 (updated 18th January 2016) | Bioinformatics, Mysteries

Amanda suggested that alongside archaeal, bacterial and fungal associated hydrolases, we should also look at protozoans. No problem, I’ll just get the taxonomy ID for protozoa and extract another database from UniProtKB as before. Simple! Or so I thought… The rabbit hole is pretty deep on this one. Feel free to skip my multi-day exploration […]

protozoa, taxonomy

Raiding `rapsearch` Results

Sam Nicholls | 9th May 2015 (updated 18th January 2016) | AU-PhD

Finally. After all the trouble I’ve had trying to scale BLAST, running out of disk space, database accounting irregularities and investigating an archive_exception, we have data. Thanks to the incredible speed of rapsearch, what I’ve been trying to accomplish over the past few months with BLAST has been done in mere hours without the hassle […]

blast, rapsearch, uniprot

Aligned Annihilation II: Dumpster Diving

Sam Nicholls | 4th May 2015 (updated 26th October 2015) | AU-PhD

I tried to extract a single integer from a core dump and instead fell into an abyss and learned how to be a computer.

Aligned Annihilation

Sam Nicholls | 1st May 2015 (updated 26th October 2015) | AU-PhD

This afternoon, in a coffee-fueled fugue, I nuked every directory containing output for any attempt to align the limpet contigs to any form of database so far. Here’s why, and what I did next.

What am I doing?

Sam Nicholls | 27th April 2015 (updated 13th January 2016) | AU-PhD

A week ago I had a progress meeting with Amanda and Wayne, who make up the supervisory team for the computational face of my project. I talked about how computers are terrible and where the project is heading. As Wayne had been away from meetings for a few weeks, I began with a roundup of […]

introduction, phd, project

`memblame`

Sam Nicholls | 26th April 2015 (updated 1st November 2015) | System Administration, Tools

As a curious and nosy individual who likes to know everything, I wrote a script dubbed memblame, which is responsible for naming and shaming authors of “inefficient” jobs at our cluster here in IBERS. It takes patience, often days and sometimes longer, to see large-input jobs executed on a node on the compute cluster […]

TrEMBLing

Sam Nicholls | 24th April 2015 (updated 1st November 2015) | Bioinformatics, Mysteries

Something appears amiss with TrEMBL: millions of sequences are “missing”. Where did they go? At the end of last month, to build a database of bacterial sequences with known hydrolase activity, I extracted around 2.9 million sequences from UniProtKB/TrEMBL, a popular database that contains sequences that have been automatically annotated and are awaiting manual curation […]

The Story so Far: Part I, A Toy Dataset

Sam Nicholls | 21st April 2015 (updated 26th October 2015) | AU-PhD

In this somewhat long and long-overdue post, I’ll attempt to explain the work done so far, give an overview of the many issues encountered along the way, and offer an insight into why doing science is much harder than it ought to be. This post got a little longer than anticipated, so I’ve sharded […]

fastq, fastqc, introduction, limpet, quality control

Category: Bioinformatics