Duplicate definition error with GATK PrintReads and MalformedReadFilter
Samposium: The Exciting Adventures of Sam (https://samnicholls.net)
Thu, 07 Jan 2016 19:27:17 +0000
https://samnicholls.net/2016/01/07/gatk-printreads-malformedreadfilter/

This afternoon I wanted to quickly check[1] whether some reads in a BAM would be filtered out by the GATK MalformedReadFilter. As you can’t invoke the filter alone, I figured one of the quickest ways to do this would be to use GATK PrintReads, which pretty much parses and spits out input BAMs, while also letting you specify filters and the like to apply to the parser as it dutifully goes about its job of taking up all your cluster’s memory. I entered the command, taking care to specify MalformedRead for the -rf read filter option, feeling particularly pleased with myself for finally being capable of using a GATK command from memory:

java -jar GenomeAnalysisTK.jar -T PrintReads -rf MalformedRead -I <INPUT> -R <REFERENCE>

GATK, wanting to teach me a lesson for not consulting documentation, quickly dumped a stack trace to my terminal and wiped the smile off my face.

##### ERROR ------------------------------------------------------------------------------------------
##### ERROR stack trace 
org.broadinstitute.gatk.utils.exceptions.ReviewedGATKException: Duplicate definition of argument with full name: filter_reads_with_N_cigar
        at org.broadinstitute.gatk.utils.commandline.ArgumentDefinitions.add(ArgumentDefinitions.java:59)
        at org.broadinstitute.gatk.utils.commandline.ParsingEngine.addArgumentSource(ParsingEngine.java:150)
        at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:207)
        at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
        at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:106)
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A GATK RUNTIME ERROR has occurred (version 3.5-0-g36282e4):
##### ERROR
##### ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
##### ERROR If not, please post the error message, with stack trace, to the GATK forum.
##### ERROR Visit our website and forum for extensive documentation and answers to 
##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: Duplicate definition of argument with full name: filter_reads_with_N_cigar
##### ERROR ------------------------------------------------------------------------------------------

At this point I felt somewhat hopeless. I was actually trying to use the MalformedReadFilter to debug something else, and now I was stuck two errors deep, surrounded by more Java than I could stomach. Before having a full breakdown about whether bioinformatics really is broken, I remembered I am a little familiar with the filter in question. Indeed, I recognised the filter_reads_with_N_cigar argument from the error as one that can be supplied to the MalformedReadFilter itself. This seemed a little odd: where could it be getting a duplicate definition from?

Of course, from my own blog post and the PrintReads manual page, I should have recalled that the MalformedReadFilter is automatically applied by PrintReads. Specifying the same filter on top with -rf presumably registers its arguments a second time, hence the duplicate definition and the parsing upset. So there you have it: if you want to check whether your reads will be discarded by the MalformedReadFilter, you can just use PrintReads:

java -jar GenomeAnalysisTK.jar -T PrintReads -I <INPUT> -R <REFERENCE>
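
To actually see whether anything was thrown away, one option is to write the PrintReads output to a file and compare read counts before and after. What follows is only a rough sketch of that idea, not something from the GATK manual: the function names reads_dropped and check_malformed are made up for illustration, and it assumes a GATK 3.x GenomeAnalysisTK.jar in the working directory and samtools on your PATH.

```shell
# Hypothetical helper (not part of GATK or samtools):
# prints the difference between two read counts.
reads_dropped() {
    echo $(( $1 - $2 ))
}

# Hypothetical wrapper: run PrintReads with its default filters, then
# compare the number of reads in the input and output BAMs.
# Assumes java, GenomeAnalysisTK.jar (GATK 3.x) and samtools are available.
check_malformed() {
    input="$1"; reference="$2"; output="$3"
    # -o writes the filtered reads to a BAM instead of stdout
    java -jar GenomeAnalysisTK.jar -T PrintReads \
        -I "$input" -R "$reference" -o "$output" || return 1
    # samtools view -c counts the records in each BAM
    before=$(samtools view -c "$input")
    after=$(samtools view -c "$output")
    echo "reads dropped: $(reads_dropped "$before" "$after")"
}
```

You would call it as something like check_malformed in.bam ref.fa out.bam, where the file names are placeholders. Any difference reflects everything PrintReads filtered on the way through, so treat a non-zero count as a hint to go and look at the GATK log rather than as proof that the MalformedReadFilter alone was responsible.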

tl;dr

  • GATK PrintReads applies the MalformedReadFilter automatically
  • Specifying -rf MalformedRead to PrintReads is not only redundant but problematic
  • Always read the fucking manual
  • Read your own damn blog
  • GATK is unforgiving

  1. It’s about time I realised that in bioinformatics, nobody has ever successfully “quickly checked” anything. 