Bioinformatics is a disorganised disaster and I am too. So I made a shell.
https://samnicholls.net/2016/11/16/disorganised-disaster/
Wed, 16 Nov 2016 17:50:59 +0000

If you don’t want to hear me wax lyrical about how disorganised I am, you can skip ahead to where I tell you about how great the pseudo-shell that I made and named chitin is.

Back in 2014, about half way through my undergraduate dissertation (Application of Machine Learning Techniques to Next Generation Sequencing Quality Control), I made an unsettling discovery.

I am disorganised.

The discovery was made after my supervisor asked a few interesting questions regarding some of my earlier discarded analyses. When I returned to the data to try and answer those questions, I found I simply could not regenerate the results. Despite the fact that both the code and each “experiment” were tracked by a git repository and I’d written my programs to output (what I thought to be) reasonable logs, I still could not reproduce my science. It could have been anything: an ad-hoc, temporary tweak to a harness script, a bug fix in the code itself masking a result, or any number of other possible untracked changes to the inputs or program parameters. In general, it was clear that I had failed to collect all pertinent metadata for an experiment.

Whilst it perhaps sounds like I was guilty of negligent book-keeping, it really wasn’t for lack of trying. Yet when dealing with many interesting questions at once, it’s so easy to make ad-hoc changes, or perform undocumented command line based munging of input data, or accidentally run a new experiment that clobbers something. Occasionally, one just forgets to make a note of something, or assumes a change is temporary but for one reason or another, the change becomes permanent without explanation. These subtle pipeline alterations are easily made all the time, and can silently invalidate swathes of results generated before (and/or after) them.

Ultimately, for the purpose of reproducibility, almost everything (copies of inputs, outputs, logs, configurations) was dumped and tar’d for each experiment. But this approach brought problems of its own: just tabulating results was difficult in its own right. In the end, I was pleased with that dissertation, but a small part of me still hurts when I think back to the problem of archiving and analysing those result sets.

It was a nightmare, and I promised it would never happen again.

Except it has.

A relapse of disorganisation

Two years later, and I’ve continued to be capable of convincing a committee to allow me to progress towards adding the title of doctor to my bank account. As part of this quest, I was recently inspecting the results of a harness script responsible for generating trivial haplotypes and corresponding reads, and attempting to recover them with Gretel. “Very interesting, but what will happen if I change the simulated read size?”, I pondered; shortly before making an ad-hoc change to the harness script that clobbered the input alignment file passed as a parameter to Gretel, inadvertently destroying the integrity of the results I had just finished inspecting.

Argh, not again.

Why is this hard?

Consider Gretel: she’s not just a simple standalone tool that one can execute to rescue haplotypes from the metagenome. First, one must go through the motions of pushing their raw reads through some form of pipeline (pictured below) to generate an alignment (essentially giving those reads a co-ordinate system) and to discover the variants (the positions in that co-ordinate system that relate to polymorphisms on reads); together these form the required inputs for the recovery algorithm.

This is problematic for one who wishes to be aware of the provenance of all outputs of Gretel, as those outputs depend not only on the immediate inputs (the alignment and called variants), but on the entirety of the pipeline that produced them. Thus we must capture as much information as possible regarding all of the steps that occur from the moment the raw reads hit the disk, up to Gretel finishing with extracted haplotypes.

But as I described in my last status report, these tools are themselves non-trivial. bowtie2 has more switches than an average spaceship, and its output depends on its complex set of parameters and inputs (that also have dependencies on previous commands), too.

[Image: the pipeline that produces Gretel’s inputs]

bash scripts are all well and good for keeping track of a series of commands that yield the result of an experiment, and one can create a nice new directory in which to place such a result at the end – along with any log files and a copy of the harness script itself for good measure. But what happens when future experiments use different pipeline components, with different parameters, or we alter the generation of log files to make way for other metadata? What’s a good directory naming strategy for archiving results anyway? What if parts (or even all) of the analysis are ad-hoc and we are left to reconstruct the history? How many times have you made a manual edit to a malformed file, or had to look up exactly what combination of sed, awk and grep munging you did that one time?

One would have expected me to have learned my lesson by now, but I think meticulous digital lab book-keeping is just not that easy.

What does organisation even mean anyway?

I think the problem is perhaps exacerbated by conflating the meaning of “organisation”. There are a few somewhat different, but ultimately overlapping problems here:

  • How to keep track of how files are created
    What command created file foo? What were the parameters? When was it executed, by whom?
  • Be aware of the role that each file plays in your pipeline
    What commands go on to use file foo? Is it still needed?
  • Assure the ongoing integrity of past and future results
    Does this alignment have reads? Is that FASTA index up to date?
    Are we about to clobber shared inputs (large BAMs, references) that results depend on?
  • Archiving results in a sensible fashion for future recall and comparison
    How can we make it easy to find and analyse results in future?

Indeed, my previous attempts at organisation address some but not all of these points, which is likely the source of my bad feeling. Keeping hold of bash scripts can help me determine how files are created, and the role those files go on to play in the pipeline; but results are merely dumped in a directory. Such directories are created with good intent, and named something that was likely useful and meaningful at the time. Unfortunately, I find that these directories become less and less useful as archive labels as time goes on… For example, what the fuck is ../5-virus-mix/2016-10-11__ref896__reg2084-5083__sd100/1?

This approach also had no way to assure the current and future integrity of my results. Last month I had an issue with Gretel outputting bizarrely formatted haplotype FASTAs. After chasing my tail trying to find a bug in my FASTA I/O handling, I discovered this was actually caused by an out of date FASTA index (.fai) on the master reference. At some point I’d exchanged one FASTA for another, assuming that the index would be regenerated automatically. It wasn’t. Thus the integrity of experiments using that combination of FASTA+index was damaged. Additionally, the integrity of the results generated using the old FASTA were now also damaged: I’d clobbered the old master input.
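For what it’s worth, the check that would have saved me here is tiny; a sketch in Python (the function name is mine, this is not any real tool’s code):

```python
import os

def fasta_index_is_stale(fasta_path):
    """Return True if the .fai index is missing, or older than the FASTA itself."""
    fai_path = fasta_path + ".fai"
    if not os.path.exists(fai_path):
        return True
    # An index written before the FASTA was last modified cannot be trusted
    return os.path.getmtime(fai_path) < os.path.getmtime(fasta_path)
```

Two stat calls, and a month of chasing my tail avoided.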

There is a clear need to keep better metadata for files, executed commands and results, beyond just tracking everything with git. We need a better way to document the changes a command makes in the file system, and a mechanism to better assure integrity. Finally we need a method to archive experimental results in a more friendly way than a time-sensitive graveyard of timestamps, acronyms and abbreviations.

So I’ve taken it upon myself to get distracted from my PhD to embark on a new adventure to save myself from ruining my PhD2, and fix bioinformatics for everyone.

Approaches for automated command collection

Taking the number of post-its attached to my computer and my sporadically used notebooks as evidence enough to outright skip over the suggestion of a paper based solution to these problems, I see two schools of thought for capturing commands and metadata computationally:

  • Intrusive, but data is structured with perfect recall
    A method whereby users must execute commands via some sort of wrapper. All commands must have some form of template that describes inputs, parameters and outputs. The wrapper then “fills in” the options and dispatches the command on the user’s behalf. All captured metadata has uniform structure and nicely avoids the need to attempt to parse user input. Command reconstruction is perfect but usage is arguably clunky.
  • Unobtrusive, best-effort data collection
    A daemon-like tool that attempts to collect executed commands from the user’s shell and monitor directories for file activity. Parsing command parameters and inputs is done in a naive, best-effort fashion. The context of parsed commands and parameters is unknown: we don’t know what a particular command does, and cannot immediately discern between inputs, outputs, flags and arguments. But, despite the lack of structured data, the user does not notice our presence.
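To make the trade-off concrete: the best an unobtrusive collector can do with a raw command string is guess. A minimal sketch of such best-effort parsing (my own illustration, not any tool’s actual code):

```python
import os
import shlex

def guess_paths(command_line):
    """Naively split a shell command and flag any token that exists on disk.

    We cannot tell inputs from outputs, or flags from arguments; we only
    know which tokens currently resolve to real paths."""
    tokens = shlex.split(command_line)
    return [t for t in tokens if os.path.exists(t)]
```

Anything more (which file is the output? which flag was the k-mer size?) needs context the daemon simply doesn’t have.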

There is a trade-off between usability and data quality here. If we sit between a user and all of their commands, offering a uniform interface to execute any piece of software, we can obtain perfectly structured information and are explicitly aware of parameter selections and the paths of all inputs and desired outputs. We know exactly where to monitor for file system changes, and can offer user interfaces that not only merely enumerate command executions, but offer searching and filtering capabilities based on captured parameters: “Show me assemblies that used a k-mer size of 31”.

But we must ask ourselves, how much is that fine-grained data worth to us? Is exchanging our ability to execute commands ourselves worth the perfectly structured data we can get via the wrapper? How many of those parameters are actually useful? Will I ever need to find all my bowtie2 alignments that used 16 threads? There are other concerns here too: templates that define a job specification must be maintained. Someone must be responsible for adding new (or removing old) parameters to these templates when tools are updated. What if somebody happens to misconfigure such a template? More advanced users may be frustrated at being unable to merely execute their job on the command line. Less advanced users could be upset that they can’t just copy and paste commands from the manual or Biostars. What about smaller jobs? Must one really define a command template to run trivial tools like awk, sed, tail, or samtools sort through the wrapper?

It turns out I know the answer to this already: the trade-off is not worth it.

Intrusive wrappers don’t work: a sidenote on sunblock

Without wanting to bloat this post unnecessarily, I want to briefly discuss a tool I’ve written previously, but first I must set the scene3.

Within weeks of starting my PhD, I made a computational enemy in the form of Sun Grid Engine: the scheduler software responsible for queuing, dispatching, executing and reporting on jobs submitted to the institute’s cluster. I rapidly became frustrated with having an unorganised collection of job scripts, with ad-hoc edits that meant I could no longer re-run a job previously executed with the same submission script (does this problem sound familiar?). In particular, I was upset with the state of the tools provided by SGE for reporting on the status of jobs.

To cheer myself up, I authored a tool called sunblock, with the goal of never having to look at any component of Sun Grid Engine directly ever again. I was successful in my endeavour and to this day continue to use the tool on the occasion where I need to use the cluster.

[Screenshot: sunblock]

However, as hypothesised above, sunblock does indeed require an explicit description of an interface for any job that one would wish to submit to the cluster, and it does prevent users from just pasting commands into their terminal. This all-encompassing wrapping feature, which allows us to capture the best, structured information on every job, is also the tool’s complete downfall. Despite the useful information that could be extracted using sunblock (there is even a shiny sunblock web interface), its ability to automatically re-run jobs, and its superior reporting on job progress compared to SGE alone, the tool still failed to gain user traction in our institute.

For the same reason that I think more in-the-know bioinformaticians don’t want to use Galaxy, sunblock failed: because it gets in the way.

Introducing chitin: an awful shell for awful bioinformaticians

Taking what I learned from my experimentation with sunblock on board, I elected to take the less intrusive, best-effort route to collecting user commands and file system changes. Thus I introduce chitin: a Python-based tool that (somewhat) unobtrusively wraps your system shell, keeping track of commands and file manipulations to address the problem of not knowing how any of the files in your ridiculously complicated bioinformatics pipeline came to be.

I initially began the project with a view to creating a digital lab book manager. I envisaged offering a command line tool with several subcommands, one of which could take a command for execution. However, as soon as I tried out my prototype and found myself prepending the majority of my commands with lab execute, I wondered whether I could do better. What if I just wrapped the system shell and captured all entered commands? This might seem a rather dumb and roundabout way of getting one’s command history, but consider this: if we wrap the system shell as a means to capture all the input, we are also in a position to capture the output for clever things, too. Imagine a shell that could parse the stdout for useful metadata to tag files with…
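The core of that idea fits in a handful of lines; a toy illustration (nothing like chitin’s actual implementation) of a loop that records each command alongside its captured stdout:

```python
import subprocess

def toy_shell(history):
    """Read commands, run them via the system shell, and keep
    (command, stdout) pairs around for later metadata parsing."""
    while True:
        try:
            cmd = input("chitin> ")
        except EOFError:
            break
        if cmd.strip() in ("exit", "quit"):
            break
        result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
        print(result.stdout, end="")
        # Having stdout in hand is the whole payoff of wrapping the shell
        history.append((cmd, result.stdout))
```

A real pseudo-shell would of course stream output as it arrives rather than buffering it, but the principle is the same.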

I liked what I was imagining, and so, despite my best efforts to get even just one person to convince me otherwise, I wrote my own pseudo-shell.

chitin is already able to track executed commands that yield changes to the file system. For each file in the chitin tree, there is a full modification history. Better yet, you can ask what series of commands needs to be executed in order to recreate a particular file in your workflow. It’s also possible to tag files with potentially useful metadata, so chitin takes advantage of this by adding the runtime4 and current user to all executed commands for you.
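That “how do I recreate this file” query is, at heart, a backwards walk over a graph of commands and files. Sketched here with plain dicts standing in for chitin’s actual store (all the names are hypothetical):

```python
def recreation_plan(target, produced_by, inputs_of, plan=None):
    """Return the ordered list of commands needed to recreate `target`.

    `produced_by` maps a file to the command that created it;
    `inputs_of` maps a command to the files it consumed."""
    if plan is None:
        plan = []
    cmd = produced_by.get(target)
    if cmd is None or cmd in plan:
        return plan  # a raw input, or a command already scheduled
    for dependency in inputs_of.get(cmd, []):
        recreation_plan(dependency, produced_by, inputs_of, plan)
    plan.append(cmd)
    return plan
```

Given produced_by = {"a.sam": "align", "a.bam": "sort"} and inputs_of = {"align": ["reads.fq"], "sort": ["a.sam"]}, asking for a.bam yields ["align", "sort"].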

Additionally, I’ve tried to find my own middle ground between the sunblock-esque configurations that yielded superior metadata, and not getting in the way of our users too much. So one may optionally specify handlers that can be applied to detected commands, and captured stdout/stderr. For example, thanks to my bowtie2 configuration, chitin tags my out.sam files with the overall alignment rate (and a few targeted parameters of interest), automatically.

[Screenshot: chitin tagging an out.sam with bowtie2 metadata]
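Such an output handler need not be clever. bowtie2’s run summary ends with a line like 92.31% overall alignment rate, so a regex is enough; this is my illustration of the idea, not chitin’s actual handler interface:

```python
import re

# bowtie2 finishes its summary with e.g. "92.31% overall alignment rate"
ALIGNMENT_RATE = re.compile(r"([\d.]+)% overall alignment rate")

def bowtie2_handler(captured_output):
    """Pull the overall alignment rate out of a bowtie2 summary, if present."""
    match = ALIGNMENT_RATE.search(captured_output)
    return {"alignment_rate": float(match.group(1))} if match else {}
```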

chitin also allows you to specify handlers for particular file formats to be applied to files as they are encountered. My environment, for example, is set up to count the number of reads inside a BAM, and associate that metadata with that version of the file:

[Screenshot: chitin counting the reads in a BAM]
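A file handler like that read-counter can simply shell out to samtools (assuming it is on your PATH); a sketch, not chitin’s real code:

```python
import subprocess

def count_bam_reads(bam_path):
    """Ask `samtools view -c` for the number of records in a BAM.

    Returns None if samtools is unavailable or the file is unreadable."""
    try:
        proc = subprocess.run(["samtools", "view", "-c", bam_path],
                              capture_output=True, text=True)
    except FileNotFoundError:
        return None  # samtools itself is not installed
    if proc.returncode != 0:
        return None
    return int(proc.stdout.strip())
```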

In this vein, we are in a nice position to check on the status of files before and after a command is executed. To address some of my integrity woes, chitin allows you to define integrity handlers for particular file formats too. Thus my environment warns me if a BAM has 0 reads, is missing an index, or has an index older than itself. Similarly, an empty VCF raises a warning, as does an out of date FASTA index. Coming shortly will be additional checks for whether you are about to clobber a file that is depended on by other files in your workflow. Kinda cool, even if I do say so myself.
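Under some assumed interface, those per-format integrity handlers could be little more than a mapping from extension to checker functions; a sketch of the checks described above (all function names are mine):

```python
import os

def check_bam(path):
    """Warn on a missing BAM index, or one older than the BAM itself."""
    warnings = []
    index = path + ".bai"
    if not os.path.exists(index):
        warnings.append("BAM has no index")
    elif os.path.getmtime(index) < os.path.getmtime(path):
        warnings.append("BAM index is older than the BAM itself")
    return warnings

def check_vcf(path):
    """An empty VCF (headers but no records) almost always signals a mistake."""
    with open(path) as vcf:
        if any(line.strip() and not line.startswith("#") for line in vcf):
            return []
    return ["VCF contains no records"]

INTEGRITY_HANDLERS = {".bam": check_bam, ".vcf": check_vcf}

def check_file(path):
    """Run the integrity handler registered for this file's extension, if any."""
    handler = INTEGRITY_HANDLERS.get(os.path.splitext(path)[1])
    return handler(path) if handler else []
```

Running checks like these before and after each command is what lets the shell shout before a stale index quietly poisons a result set.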

Conclusion

Perhaps I’m trying to solve a problem of my own creation. Yet from a few conversations I’ve had with folks in my lab, and frankly, anyone I could get to listen to me for five minutes about managing bioinformatics pipelines, there seems to be sympathy for my cause. I’m not entirely convinced myself that a “shell” is the correct solution here, but it does seem to place us in the best position to get commands entered by the user, with the added bonus of getting stdout to parse for free. Though, judging by the flurry of Twitter activity on my dramatically posted chitin screenshots lately, I suspect I am not so alone in my disorganisation and there are at least a handful of bioinformaticians out there who think a shell isn’t the most terrible solution to this either. Perhaps I just need to be more of a wet-lab biologist.

Either way, I genuinely think there’s a lot of room to do cool stuff here, and to my surprise, I’m genuinely finding chitin quite useful already. If you’d like to try it out, the source for chitin is open and free on GitHub. Please don’t expect too much in the way of stability, though.


tl;dr

  • A definition of “being organised” for science and experimentation is hard to pin down
  • But independent of such a definition, I am terminally disorganised
  • Seriously what the fuck is ../5-virus-mix/2016-10-11__ref896__reg2084-5083__sd1001
  • I think command wrappers and platforms like Galaxy get in the way of things too much
  • I wrote a “shell” to try and compensate for this
  • Now I have a shell, it is called chitin

  1. This is a genuine directory in my file system, created about a month ago. It contains results for a run of Gretel against the pol gene on the HIV genome (2084-5083). Off the top of my head, I cannot recall what sd100 is, or why reg appears before the base positions. I honestly tried. 
  2. Because more things that are not my actual PhD is just what my PhD needs. 
  3. If it helps you, imagine some soft jazz playing to the sound of rain while I talk about this gruffly in the dark with a cigarette poking out of my mouth. Oh, and everything is in black and white. It’s bioinformatique noir
  4. I’m quite pleased with this one, because I pretty much always forget to time how long my assemblies and alignments take. 
A pretend biologist’s guide to making an EDTA chelating (“buffer”) solution
https://samnicholls.net/2016/06/26/edta-buffer-protocol/
Sun, 26 Jun 2016 22:32:25 +0000

What the fuck is an EDTA chelating (“buffer”) solution?

EDTA buffer is a chelating agent that inhibits enzymatic degradation of DNA and RNA in a solution.
EDTA buffer is an agent that steals the ions that molecular machines need, to stop them making the DNA or RNA in a solution go bad.

What is it for?

Enzymes that modify, degrade and synthesize DNA and RNA usually require magnesium ions. EDTA buffer inhibits such metal-dependent enzymes by sequestering metal ions (primarily magnesium and calcium) from the solution. Thus EDTA buffer is a widely used component in buffers and solutions where there are biological products you wish to maintain the integrity of and/or reactions you may wish to suppress.

EDTA buffer is a component of TE Buffer (Tris/EDTA), a protective storage medium for DNA and RNA, and of TAE Buffer (Tris/Acetic Acid/EDTA) and TBE Buffer (Tris/Boric Acid/EDTA), both used for gel electrophoresis. EDTA buffer is also used as an anticoagulant for the storage of blood and preventing clumping of cells in liquid suspension.

Why do you say “buffer”?

Strictly speaking (as a pretend chemist for less than 24 hours), I thought a buffer was a solution designed to maintain a particular pH, whereas this buffer’s primary purpose is to sequester magnesium to suppress enzymes involved in degrading DNA. I find the term buffer a little confusing in this regard and feel chelator would be a better term. However, I’m horrendously unqualified and all the literature seems to refer to this solution as an “EDTA buffer”, but I just wanted to register my discontent1.

What do I need?

Reagents

  • EDTA disodium salt2 (FW 372.24)
  • Sodium hydroxide (FW 40) pellets
  • Milli-Q water (at least the volume of solution you wish to make)

Equipment

  • A suitable weighing scale (an analytical scale is preferred, but its maximum weight may be too low and you’ll require a top pan balance instead)
  • Realtime pH probe
  • Magnetic stirrer and flea

Glassware

The volume of your required glassware will depend on the volume of EDTA buffer you wish to produce. The values in brackets are the containers suggested for a recipe producing 200ml of 0.5M buffer.

  • A volumetric flask capable of holding the exact target volume of your recipe (200ml)
  • A beaker capable of holding about 50-75% of the target volume of your recipe (250ml)
  • A bottle (Duran Flask) capable of holding the target volume of your recipe (250ml)
  • A bottle (Duran Flask) to store a suitable volume of MQ water (if not already available)

Bits and pieces3

  • Lab spatulas
  • Foil
  • Measuring cylinder (that can contain the volume of solution you wish to make)
  • Funnel
  • Access to an autoclave, and autoclave tape
  • Patience

What are those things?

EDTA

Ethylenediaminetetraacetic acid (widely referred to as EDTA because nobody can spell or say ethylenediaminetetraacetic) is a chelating agent, capable of sequestering metal ions (including calcium, iron and magnesium). The molecule is hexadentate (“six-toothed”): a cool word describing its claw-shaped structure that is capable of binding very strongly and very effectively to a single metal ion in six places.

This feature of EDTA makes it rather ubiquitous and useful in many industries. For example, EDTA softens water to allow ingredients of soaps, shampoos and laundry detergents to work more efficiently. EDTA is also used to preserve and stabilise cosmetics, eye drops and skin care products in the presence of air, prevent discolouration of dyed fabrics in the production of textiles and can be used as a preservative for food (especially to prevent oxidative decolouration). EDTA can be used to treat many instances of heavy metal poisoning (lead, mercury and others) via chelation therapy, binding to heavy metals in the blood for safe excretion through urine. EDTA can also chelate excess iron from the blood, which can reduce the complications of blood transfusions. EDTA is used extensively in the analysis of blood, primarily as an anticoagulant.

EDTA is an essential medicine according to the World Health Organisation.

Sodium hydroxide

CAUSTIC
Sodium hydroxide is caustic to both metals and skin, and can cause serious eye damage.
In the event of exposure to skin, irrigate with water for 10-15 minutes.

Sodium hydroxide (familiarly named caustic soda) is a commonly used alkali with wide industrial applications including: pulping wood in the production of paper, refinement of bauxite ore to aluminium oxide for the production of aluminium, and the manufacture of soaps and detergents. Sodium hydroxide is also widely used in the preparation of foods, including chemical removal of skins from fruit and vegetables, processing of cocoa, poultry and soft drinks. According to Wikipedia, the unique crust of German pretzels and flavour of Chinese noodles are down to their preparation in sodium carbonate and lye-water respectively.

Sodium hydroxide is highly effective as an industrial cleaning agent, capable of dissolving grease, oils and fats. It is a common component of strong oven cleaner, glass and steel degreasers and drain openers – capable of hydrolysing hard to break down proteins in hair.

It is often used in the laboratory as a means to raise the pH of solutions.

It is also used by serial killers to dissolve bodies4.

Milli-Q Water

Distilled, deionised and filtered water. An ion mass spectrometer would have trouble picking up more than a few parts per million of anything else. Really, just, really fucking pure water.

pH probe

An instrument for measuring the pH of a solution via the electric potential between two electrodes: a glass electrode and a reference electrode. The glass electrode sits inside a glass bulb containing a metal-salt solution of known pH; free ions in the sample solution (or the lack thereof) cause a differential in charge across the inner and outer surfaces of the bulb. As the pH on the inside of the bulb is known, the pH of the sample can be quantified by measuring the potential difference across the glass membrane between the inner (glass) and outer (reference) electrodes.

It is important to calibrate the pH meter before every use (or at the start of the day, if it is to be used) to ensure accurate readings, as the calibration of the glass electrode will drift through use and time. Calibration should be done with at least two buffers (either side of the range of interest). Fancier models will also note the temperature during calibration to correct for variation in pH caused by temperature during actual use later. After use, ensure you follow the instructions on cleaning the probe, as the electrodes must be kept free of contamination. Typically, after use a probe will be rinsed with deionised water, blotted dry and returned to its storage buffer (some form of neutral buffer that does not encourage ions to diffuse out of the electrode).

Solutions like EDTA buffer are only as good as the pH meter that they are buffered with, so it is important this instrument is well cared for. Ensure to follow product guidance for correct cleaning and storage of pH probes in both short and long-term, as these parameters vary between model and manufacturer.

Volumetric flask

A piece of glassware calibrated to contain a precise volume (at a particular temperature), used for precise dilutions and measures of stock solutions. I asked my supervisor why we have volumetric flasks when glass beakers and bottles are typically graduated with volume markings:

“Are they not that accurate then?”
“Awh jesus christ you may as well be measuring everything with your eyes closed!”

How do I make EDTA buffer happen?

Calculate recipe mass

The volume and concentration of EDTA buffer you want to make will depend on what you intend to use it for. For me, I was preparing an EDTA buffer as a component of TAE (Tris/Acetic Acid/EDTA) buffer for gel electrophoresis. A 50x TAE recipe requires 100ml of EDTA buffer at a concentration of 0.5M; a 10x TAE buffer will thus require 20ml. I settled for 200ml as a reasonable volume of EDTA buffer to make; not so much that it will sit in the lab for the next decade5, and not so little that the rather laborious effort will need to be repeated any time soon.

To determine the mass of EDTA required for the recipe, we work out how many moles of EDTA should be dissolved in our buffer to obtain the desired concentration (0.5M) in a given volume (200ml). You can adjust the equation below for your own recipe by altering the desired volume and concentration:

mol_{\text{EDTA in buffer}} = vol_{\text{buffer}} \times conc_{\text{buffer}}
mol_{\text{EDTA in buffer}} = 0.2\,\text{L} \times 0.5\,\text{M}
mol_{\text{EDTA in buffer}} = 0.2\,\text{L} \times \frac{0.5\,\text{mol}}{1\,\text{L}}
mol_{\text{EDTA in buffer}} = 0.1\,\text{mol}

A 200ml solution of EDTA with a concentration of 0.5M will contain 0.1 moles. We can now derive the number of physical grams of EDTA that will yield this many moles using its molar mass6 as printed on the label7:

mass_{\text{EDTA for buffer}} = mol_{\text{EDTA in buffer}} \times \text{mol mass}_{\text{EDTA}}
mass_{\text{EDTA for buffer}} = 0.1\,\text{mol} \times 372.24\,\text{g}\cdot\text{mol}^{-1}
mass_{\text{EDTA for buffer}} = 0.1\,\text{mol} \times \frac{372.24\,\text{g}}{1\,\text{mol}}
mass_{\text{EDTA for buffer}} = 37.224\,\text{g}

If you house a distrust for equations, we can also confirm this value with some empirical thinking given the molar mass of EDTA. Considering the molar mass of EDTA is equal to the number of grams required to make a 1 mole per litre (1M) solution, we can derive that the mass for half the concentration (0.5M) will be half the molar mass: 186.12g. For our recipe however, we do not wish to produce a litre, but 200ml. To maintain the desired 0.5 moles per litre concentration in a smaller volume (200ml) we require just a fifth (1L / 200ml = 5) of the already halved mass: independently verifying our value of 37.224g. Mathematics works! It’s important to note that 37.224g will produce a 0.1 moles per 200ml solution, which is another way of expressing 0.5 moles per litre, or 0.5M.
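If you would rather let a machine do the arithmetic, the recipe generalises to any volume and concentration; a quick sanity check in Python (the FW of 372.24 is for the disodium salt, as above):

```python
def edta_mass_grams(volume_l, conc_mol_per_l, formula_weight=372.24):
    """Grams of EDTA disodium salt needed for a given volume and molarity."""
    moles = volume_l * conc_mol_per_l   # mol = L x (mol/L)
    return moles * formula_weight       # g = mol x (g/mol)

print(round(edta_mass_grams(0.2, 0.5), 3))  # 37.224 (g, for 200ml at 0.5M)
```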

Pre-prep

  • Translate hieroglyphs on the Milli-Q water dispenser and dispense Milli-Q water into a Duran flask

Prepare the EDTA solution

Before you begin weighing, check that the sum of the mass of the beaker (or container) and the required mass of EDTA for your recipe is not likely to be greater than the maximum weight supported by the scale, because that totally didn’t happen to me.
  • Turn on your scale, wait until it is ready (it may beep, or just show zeroes)
  • Place your beaker on the scale, wait for the value to stabilise and tare the scale
Try not to disturb the surface that your scale is on. This includes writing in your lab book, or leaning against or touching the bench. These devices are incredibly sensitive! Analytical scales typically have glass doors to mitigate the effects of air currents (including your breathing), ensure these are closed when tare-ing the device, and reviewing the final weight.
  • Carefully spatula out the required amount of EDTA for your recipe and realise how futile attempting to measure out a mass correct to four decimal places is
Carefully is the operative word here. Any substance spilt or otherwise off-target from your beaker or weighing boat, will still be counted by your scale. To maintain accuracy you should clear up any loose substance once you’ve inevitably made a mess of this.
  • Move the beaker off (or out of) your scale, turn the scale off, and transport the beaker to your magnetic stirrer
It’s good practice to cover the top of the beaker with foil if you are going to wander around the lab with it, or abandon it briefly while you work out how a magnetic stirrer works.
  • Add between half and three quarters of your target volume of Milli-Q water (e.g. ~150ml for our 200ml recipe) to the EDTA
The accuracy does not matter here, we’re just providing a solvent to dissolve the EDTA in. We’ll be topping up the solution accurately in a volumetric flask later.
  • Place your beaker on a magnetic stirrer, slip the “flea” into the solution, turn on the magnet and fetch your pH meter
Don’t turn the stirrer on too high, it will cause the flea to stop spinning smoothly and instead bounce around uncontrollably, potentially splashing your solution out of the beaker (or damaging the container).
Don’t panic when your EDTA does not begin to dissolve; EDTA is almost insoluble in water until the pH is increased to around 8.0

Buffer the solution to pH 8.0

Doing this right takes fucking ages. Don’t start this at the end of the morning because you’ll miss all of the best sandwiches for lunch. Screwing this part up means starting over, so be patient.
  • Remove your pH meter from its storage buffer and calibrate it if necessary (see above)
Your buffers are only as good as their pH, look after your meter!
  • With your solution still stirring, insert the pH meter probe and wait for it to stabilise
  • Begin adding sodium hydroxide pellets one by one; each time, wait for the pH to plateau and stabilise before adding another
You’ll need approximately 18-20g of sodium hydroxide pellets per litre (so ~3.5-4g for 200ml)
Sodium hydroxide pellets are hygroscopic: they absorb moisture from their surroundings. Do not leave the container open to the air. Weigh out perhaps half as many pellets as you think you may need, to prevent them all degrading too quickly. Do not worry about the pellets you are adding to the solution becoming wet and sticky, but do be aware that this small amount of moisture compounds across all the pellets and increases the overall volume of the solution.
Sodium hydroxide is caustic, handle with care, especially once the pellets have absorbed moisture.
Do not just dump all your pellets into the solution. If you overshoot pH 8.0, you can’t just add an acid or more EDTA to bring the pH back the other way – you’ll have “used” some of the solution’s buffering capability, and you’ll have to throw it down the waste sink. I totally didn’t do this either.
EDTA buffer is classed as non-hazardous waste and can be disposed of down the waste sink. Check the rules for your lab, and don’t assume this is the case for other chemicals, as that’ll probably get you fired, or kicked out of the lab.
As you add sodium hydroxide, the pH of the solution will increase, only for more EDTA to dissolve and reduce the pH again. Your solution is trapped in this knife-edge, two-steps-forward-one-back pH push-pull game, which is why the process takes so long.
You can also add sodium hydroxide solution with a pipette, particularly if you want more control over the process as you near the goal of pH 8.0. However, don’t forget that both sodium hydroxide solution and the water absorbed by the sodium hydroxide pellets will add to the overall volume of your solution. This volume must not exceed (and ideally should be comfortably below) the volume of buffer you intend to make in total.
  • As you add more sodium hydroxide pellets, the pH should slowly edge towards pH 8.0, at which point almost all of the EDTA should have dissolved
  • Once you have finally stabilised the solution as close to pH 8.0 as you dare, leave the stirrer and pH meter on for a few more minutes to verify your work
  • If you are sure and happy with the pH, turn off the pH meter, clean the probe (as per instructions) and return to its storage buffer solution
  • Turn off the stirrer, retrieve the flea, rinse, and return it to the flea storage box before it gets misplaced
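
The 18-20g-per-litre rule of thumb above scales trivially to other recipe volumes; here’s a quick sketch in Python (the function name and return format are mine, not from any protocol):

```python
# Estimate how many grams of NaOH pellets to weigh out for the pH 8.0
# adjustment, using the rule of thumb of ~18-20g per litre of 0.5M EDTA.
# Remember to weigh out only around half of this at first (hygroscopy!).

def naoh_estimate_g(target_volume_ml, g_per_litre=(18.0, 20.0)):
    """Return a (low, high) estimate in grams for the given buffer volume."""
    low, high = g_per_litre
    return (low * target_volume_ml / 1000.0, high * target_volume_ml / 1000.0)

low, high = naoh_estimate_g(200)  # the 200ml recipe used in this post
print(f"weigh out roughly {low:.1f}-{high:.1f}g of NaOH pellets")
```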

Accurately raise to target volume

  • Pour the contents of the beaker into the volumetric flask
  • Rinse the beaker with a small volume of Milli-Q water, swirl and add this to the volumetric flask (repeat until the volumetric flask is filled to the graduation mark)

Sterilise

  • Transfer the contents of the volumetric flask to a suitably sized Duran flask
  • Give the Duran flask a suitable label (0.5M EDTA Buffer, your name, lab, today’s date, etc.)
  • Add a strip of autoclave indicator tape
  • Loosen the lid of the flask and sterilise the solution in the autoclave
Your vessel can shatter under the pressure of autoclaving if you leave the lid on tight. Once removed from the autoclave, do not seal until the solution is cool, as this may also cause your vessel to shatter.

What do I do now?

  • Tighten the lid only once the solution has cooled after autoclaving (to prevent the flask shattering)
  • Store at room temperature
  • Open and pour inside a laminar flow hood only; it’s a sterile product now!
  • Put a picture of the bottle on Twitter, well done, you are a god damn alchemist now

How do I fuck it up?

  • Waste time using a scale whose maximum weight limit is exceeded by your container and EDTA
  • Impatiently dump all your sodium hydroxide pellets into the EDTA+Milli-Q solution, ruin the buffer by overshooting pH 8.0 and have to start over
  • Use a poorly maintained or uncalibrated pH meter: the pH is probably the most important property of this product!
  • Blow up your hard work by leaving the lid on tight and exposing it to multiple atmospheres in the autoclave

  1. I suppose “EDTA chelator” makes it sound like it might chelate EDTA, which would only serve to cause more confusion. 
  2. EDTA seems to be most commonly distributed in a disodium salt form. Apparently it can be quite difficult to achieve full dissolution of EDTA in its disodium form and the tri- and tetra- salts are more readily dissolved. 
  3. These parts eventually become obvious, apparently. 
  4. See also: hydrofluoric acid, occasionally used to dissolve bathtubs. 
  5. 200ml should provide enough EDTA chelator for 10L of 10x TAE buffer. Each litre of which will yield 10L of 1x TAE, which is in turn enough to fill a reasonably sized gel tank and make at least ten small gel slabs. This is a lot for just me, but in a communal lab, I’m sure we’ll find a use for my homemade TAE. 
  6. Which, rather helpfully, can be represented by the same fucking symbol as molar: M
  7. To confuse you more, it may be under Formula Weight, FW, Molecular Weight or MW. I am under the impression there are subtle differences between these terms that I haven’t quite got to the bottom of yet. 
A pretend biologist’s guide to running a PCR (polymerase chain reaction)
https://samnicholls.net/2016/06/21/pcr-protocol/ Mon, 20 Jun 2016 23:39:02 +0000

What the fuck is PCR?

PCR is a laboratory protocol for generating significant numbers of copies of a subsequence of a DNA template, via repeated exposure to an enzyme capable of synthesizing molecules of DNA.

How does it work?

  • Short nucleotide sequences called primers are made to order, designed to complement (and so bind to) the start and end of a sequence of interest (within some extracted DNA)
  • DNA (the template) is prepared with those primers, dNTPs (basically loose nucleotides) and a polymerase enzyme
  • Prepped solutions of 20-50ul are placed in a thermal cycler: a machine that alters the temperature to drive exponential DNA amplification via repeated cycles (typically around 30), each consisting of three stages:
    • Denaturing: template DNA is melted at high temperature to yield single strands (exposing the strands to primers)
    • Annealing: the temperature is lowered1 to anneal primers to the single-stranded template
    • Extending2: polymerase binds at the primers and begins to elongate a new strand with the free dNTPs
  • Each cycle creates two new double-stranded DNAs from each molecule of DNA present; theoretically, the DNA product doubles with each cycle3
  • Typically can amplify subsequences of template up to 10kbp
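
That “theoretical doubling” is worth a sanity check: under perfect conditions, n cycles turn one template molecule into 2^n copies. A toy calculation (the efficiency parameter is my own addition, to show why a small per-cycle shortfall compounds):

```python
# Idealised PCR amplification: each cycle (denature, anneal, extend)
# doubles the number of double-stranded copies, so n cycles give
# (1 + efficiency)^n copies per starting molecule. Real reactions are
# less than 100% efficient, which is why the early cycles matter so much.

def copies_after(cycles, starting_molecules=1, efficiency=1.0):
    return starting_molecules * (1 + efficiency) ** cycles

print(copies_after(30))                  # 2^30: over a billion copies per template
print(copies_after(30, efficiency=0.9))  # a per-cycle shortfall compounds badly
```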

What do I need?

Reagents

  • Ice box containing:
    • Taq polymerase
    • Taq buffer
    • Template DNA
    • dNTP mix
    • Forward and reverse primers
  • Fridge box containing:
    • HPLC-grade water

Equipment

  • Vortex
  • Centrifuge
  • Thermal Cycler
  • Access to HPLC-grade water dispenser and UV Crosslinker

Bits and pieces4

  • P2, P10, P100, P200, P1000 pipettes (and appropriate tips)
  • Eppendorf tubes (500ul, 1.5ml)
  • PCR tubes (200ul) + caps (if not attached)
  • Multi-format tube rack
  • Waste bin
  • Gloves

What are those things?

Taq polymerase

A highly thermostable polymerase enzyme (a molecular machine for assembling long chains of nucleic acids) isolated from (and named after) the Thermus aquaticus bacterium; an extremophile capable of thriving in high temperature environments (favouring 70°C, but tolerating anything between 60-80°C). Polymerase drives the elongation or extension process of PCR. In the late 1980s, it was discovered that polymerase isolated from Thermus aquaticus could actually withstand the temperatures involved in the denaturation step, where DNA is melted into its two strands. The polymerase was refined and mass produced for commercial sale; now PCR could be completed without re-adding a polymerase at the end of every cycle!

Taq buffer

PCR buffers attempt to maintain optimal conditions for polymerase activity during PCR. Various ingredients can chelate the ions required for unwanted enzymatic activity, reducing degradation of reagents and other side reactions.

Template DNA

Your already extracted and purified DNA sample that contains some sequence that you desire to amplify.

dNTP Mix

Named so because deoxynucleoside triphosphate doesn’t roll off the tongue so well. dNTP mix is essentially a grab bag of the four nucleotides. During the elongation stage of PCR, polymerases use free dNTPs to synthesize new chains of nucleic acids, creating complementary strands.

Primers

A pair of short sequences (15-30bp) of nucleic acids designed to complement two ends of a target subsequence of interest on your template DNA. Good primers are 40-60% GC-content, have similar annealing temperatures and should not be self-complementary, or complementary to another primer in the mix.

HPLC-grade water

High-performance liquid chromatography (HPLC) is a technique to identify and separate individual components of a mixture. HPLC-grade water is deionized, filtered, UV-filtered and in general, pretty fucking clean. The goal is to prevent contamination of reagents with nucleases.

How do I make the PCR happen?

Pre-prep

  • Gather equipment, ensure your reagents are not depleted, check whether someone has stolen the power lead for the thermal cycler
  • Place tube racks in freezer to keep them cold (this helps maintain the integrity of reagents)
  • Collect HPLC water (if necessary) and run through UV crosslinker to denature any residual proteins

Prep

  • Retrieve tube racks from freezer
  • Move dNTPs, primers and template from ice box onto tube rack to thaw; it is essential that these are returned to the ice box as soon as possible once fully thawed

NEVER allow Taq Polymerase to reach room temperature.

Ensure reagents have fully thawed to avoid aspirating solutes of incorrect concentrations.

  • Briefly vortex and centrifuge (a few seconds at ~5-10Krpm) Taq buffer and dNTP mix

The buffer must be vortexed to ensure its components are mixed thoroughly.

Prepare a working dNTP mix (if required)

dNTP mix is often shipped at a high concentration (100mM) and in such cases must be diluted to a more practical “working mix” before it can be accurately pipetted into PCR tubes. Keeping a working mix also saves you from repeatedly freeze-thawing your master mix.

  • Calculate the volume of master dNTP mix required to create a more practical solution; say 250ul at a concentration of 2mM:
    vol_from_master = (vol_desired × conc_desired) / conc_of_master = (250ul × 2mM) / 100mM = 500/100 = 5ul
  • Aspirate and dispense the solvent first (it is easier to pipette a small volume into a larger one). For our 250ul working mix that contains 5ul of the master mix, we must dispense 245ul of HPLC water into a 1.5ml Eppendorf tube.
  • Vortex and centrifuge the dNTP mix briefly if you have not already done so
  • Add 5ul of master mix to the new working mix tube
  • Aspirate and dispense repeatedly and carefully to wash the pipette tip and mix the new solution
  • Return the 100mM master mix and new suitably labelled 2mM working mix to the ice box
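
Every dilution sum in this post is the same rearrangement of C1·V1 = C2·V2; if, like me, you don’t trust your bench arithmetic, a two-line helper does it for you (the function name is mine):

```python
# C1*V1 = C2*V2, rearranged to give the volume of stock to aspirate.
def volume_from_stock(desired_vol_ul, desired_conc, stock_conc):
    """Concentrations in matching units (e.g. both mM), volume in ul."""
    return desired_vol_ul * desired_conc / stock_conc

stock = volume_from_stock(250, 2, 100)  # 250ul of 2mM working mix from a 100mM master
water = 250 - stock                     # top the rest up with HPLC water
print(stock, water)                     # 5.0 245.0
```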

Preparing the PCR tubes

  • Lay out the necessary number of required PCR tubes on a cold rack
  • Calculate all necessary dilutions before you begin pipetting (consider your protocol parameters; reaction size, desired dilutions of template, primer and dNTP mix):
    • dNTPs
      vol_from_working = (vol_PCR_reaction × conc_for_PCR) / conc_of_working = (50ul × 0.2mM) / 2mM = 10/2 = 5ul
    • Primers (forward and reverse)
      vol_of_primer = (vol_PCR_reaction × conc_for_PCR) / conc_of_primer = Xul

      Although the protocol specification requires a final concentration of 0.1-1.0uM of each primer, it seems that in general (your mileage may vary) people tend to add an excess, giving a final concentration of up to 2uM. For example, 2ul of a 50uM (50pmoles/ul) working primer solution.
    • Template
      vol_of_DNA = (vol_PCR_reaction × conc_for_PCR) / conc_of_DNA = Zul
  • Remove Taq Buffer from ice box and pipette the volume required by your protocol (my protocol stated 10ul) into all tubes, return the temperature-sensitive Taq Buffer to ice (or the freezer) as soon as possible

It is highly recommended that Taq Buffer be the first reagent added. As a buffer, it is responsible for preventing unwanted enzymatic activity, such as denaturing (or early annealing) of template DNA and primers.

  • The rest of the reagents can be added in no particular order, with the exception of Taq Polymerase, which comes later:
    • dNTPs
    • Primers (ensure both forward and reverse primers are added)
    • Template DNA
  • For each tube, sum the volumes of its reagents (don’t forget the polymerase, which is not yet in the tube) and subtract this total from the target volume required by your protocol (again, here, 50ul)
  • Add those amounts of HPLC water (the volume to add may differ between tubes if differing volumes of primer or template were added) to bring up the total volume of each sample tube to the target volume (less the polymerase)
  • Ready (switch on and program) the thermal cycler before adding Taq Polymerase (reactions begin as soon as it is added, albeit at room temperature)
  • Remove Taq Polymerase from ice, vortex gently and centrifuge briefly to remove excess from the walls of its tube5
  • Add between 0.5–2.0 units of polymerase per 50ul reaction (our protocol recommended 1.25u; I added slightly more to make it easier to aspirate with a pipette), and return the polymerase to ice (or the freezer) as soon as possible
  • Seal PCR tubes (close lids or seal caps6)
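
The per-tube bookkeeping above can be sketched programmatically too. The concentrations below are illustrative placeholders (check your own kit and protocol), but the water top-up is exactly the subtraction described in the steps:

```python
# Work out each reagent volume via C1*V1 = C2*V2, then top up with HPLC
# water so the total (including the polymerase, which is added last)
# reaches the 50ul reaction volume. All concentrations are examples only.

REACTION_UL = 50.0

def reagent_ul(final_conc, stock_conc):
    return REACTION_UL * final_conc / stock_conc

volumes = {
    "taq_buffer": 10.0,                 # volume stated by my protocol
    "dntp":     reagent_ul(0.2, 2.0),   # 0.2mM final from the 2mM working mix
    "primer_f": reagent_ul(2.0, 50.0),  # e.g. 2uM final from a 50uM stock
    "primer_r": reagent_ul(2.0, 50.0),
    "template": 2.0,                    # depends entirely on your DNA concentration
    "polymerase": 0.5,                  # not in the tube yet, but it counts!
}
volumes["hplc_water"] = REACTION_UL - sum(volumes.values())
print(volumes["hplc_water"])  # the water needed to reach 50ul
```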

Thermal Cycling

  • Load tubes into thermal cycler immediately (your reaction has begun!)

If you are feeling particularly prepared, you could preheat the lid of your cycler to ensure the hot-start PCR begins more quickly.

  • Ensure lid is as tight as possible (if it has a lid that needs manual tightening to push the heated block7 against the tops of tubes)
  • Load and check program schedule (does it have a sensible run time? Has someone in your lab accidentally sabotaged it in the last five minutes?)

Ensure there is an infinite store step at less than 5°C following the end of the final cycle of your program, unless you want all of your work destroyed at room temperature. This is especially important if you are running PCR before going home for 12 hours.

  • Run program!
  • Watch in horror as you allow the machines to take over everything and probably ruin your experiment
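
While you wait, you can at least estimate how long the machine will hold your experiment hostage, using the footnotes’ 1min/kbp extension rule of thumb. All the times here are illustrative placeholders, not a validated program:

```python
# Rough run time for a three-stage cycling program. The extension time
# follows the ~1 minute per kbp of target rule of thumb; everything else
# should be replaced with your own protocol's values.

program = {
    "initial_denature_s": 120,
    "cycles": 30,
    "denature_s": 30,
    "anneal_s": 30,    # annealing temperature/time depend on your primers
    "extend_s": 60,    # ~1min per kbp: 60s for a ~1kbp target
    "final_extend_s": 300,
}

def run_time_min(p):
    per_cycle_s = p["denature_s"] + p["anneal_s"] + p["extend_s"]
    total_s = p["initial_denature_s"] + p["cycles"] * per_cycle_s + p["final_extend_s"]
    return total_s / 60.0

print(run_time_min(program))  # 67.0 minutes (plus ramping time, in practice)
```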

What do I do now?

  • PCR product must be kept in the fridge or freezer
  • Verify fragments of the expected size (or anything at all) were amplified with gel electrophoresis

How do I fuck it up?

There are a multitude of ways that PCR can fail. Due to the number of reagents required in each tube, and the amount of pipetting involved, it is quite easy to make a mistake. Helpfully, it is typically not possible to establish the cause, so the whole process must be repeated. Lots of attention to detail is required, especially if there is more than one template, or more than one set of primers, making up individual reactions.

  • The easiest way to fuck up PCR is to forget something: each PCR tube needs all of the components, and even the smallest distraction can cause you to skip, forget or duplicate a step
  • Applying the wrong template or primer to a particular reaction tube
  • Fucking up your mathematics for dilutions (usually by forgetting to ensure all the molarity or volume units are the same), yielding a low concentration of a reagent (in general it seems PCR is more prone to failure when reagents are at concentrations that are too low rather than too high, with the exception of buffer8)
  • Contaminating your reagents by forgetting to change your pipette tip; this is particularly bad as it destroys your expensive reagent to boot
  • Suboptimal annealing temperatures were used during thermal cycling (consult the documentation for your primer set)
  • Your template DNA is damaged (poor storage, bad preparation)
  • Your primers suck (or are damaged, or perhaps one of the pair binds to another location)
  • You might have forgotten to vortex your Taq buffer before use and it happened to have separated during storage
  • Adding polymerase first might have caused the reaction to start too early
  • Water was not sterile
  • The day ends in a ‘y’

  1. Specific annealing temperatures depend on the primers and a recommended temperature would be provided by the manufacturer. 
  2. The rule of thumb is the elongation step should last 1min/kbp of desired target sequence. 
  3. The first five cycles are critical for this reason. 
  4. Also known as “the fucking obvious” to your typical microbiologist. However, I have not even held a pipette before, so it seems nice to have an explicit list of everything that one needs. 
  5. This is pretty important as Taq polymerase is pretty fucking expensive, you want to minimize the amount of it that gets caught on the outer wall of your pipette tip. 
  6. If using PCR tubes that require caps, ensure each of them click into place and are tightly sealed. Any gaps will cause the contents of the PCR tube to evaporate into the cycler, which is somewhat problematic as you can’t run a gel without any product. On the light side, you at least know the cause of an empty (or near empty) tube. 
  7. The heated lid prevents your sample evaporating and condensing on the lid (leaving your sample with less water). Older thermal cyclers lack this feature and require you to add a small layer of oil on top of your reaction mix instead. 
  8. In particular, adding too much buffer (by not diluting enough) will leave too much magnesium in your reaction and cause your polymerases to be “promiscuous”; binding to primers that are not specifically bound to template and elongating strands that are undesired. 