As part of a PhD it is anticipated[1] that you will share your science with various audiences: fellow PhD students, peers in the field and the various publics. Every year, the university celebrates British Science Week with a Science Fair, inviting possibly the most difficult public to engage with: children. Over three days the fair serves to educate and entertain 1700 pupils from over 30 schools based across Mid Wales, and this year I volunteered[2] to run a stand.
How to explain assembly?
I was inspired by Amanda’s activity for prospective students at a visiting day a few weeks prior. To describe the problem of DNA sequence assembly and alignment in a friendly (and quick) way, Amanda had hundreds of small pieces of paper representing DNA reads. The read set was generated with Titus Brown’s shotgunator tool, slicing a few sentences about the problem (meta!) into k-mers, with a few errors and omissions for good measure. Visitors were asked to help us assemble the original sequence (the sentences) by exploiting the overlaps between reads.
I like this activity as it gives a reasonable intuition for how assembly of genomes works, using just scraps of paper. The key is that the DNA is abstracted into something more tangible to newcomers – English words building sentences – which is far simpler to explain and understand, especially in a short time. It’s also quite easy to describe some of the more complicated issues of assembly, namely errors and repeats, via misspellings and repeated words or phrases.
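For the curious, the "exploit the overlaps" idea can be sketched in a few lines of Python. This is a minimal, hypothetical sketch and not how the paper activity or the shotgunator tool works: the function names `overlap` and `greedy_assemble` and the `min_len` threshold are assumptions made for illustration, and real assemblers use far more robust graph-based methods. It simply keeps gluing together the two reads that share the longest matching end.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            return reads  # nothing left that overlaps enough
        merged = reads[i] + reads[j][n:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]


# Three overlapping "reads" cut from one short sentence.
reads = ["assembly", "mbly is ", " is hard"]
print(greedy_assemble(reads))  # ['assembly is hard']
```

Misspellings and repeated phrases are exactly what trips up a greedy approach like this one: an error hides a true overlap, and a repeat offers a false one.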
— Sam Nicholls (@samstudio8) February 10, 2016
A problem with pigeonholing college students?
Yet to my surprise, the majority of the compscis-to-be were quite apprehensive about taking on the task at the mere mention of this being a biological problem, despite the fact that sequence alignment can be easily framed as a text manipulation problem. Their apprehension only increased when introduced to Amanda’s genome game: a fun web-based game that generates a small population with a short binary genome whose rules must be guessed before the time runs out. A few puzzled visitors offered various flavours of “…but I’m not here to do biology!”, and one participant backed out of playing with “…but biology is scary and too hard!”. In general the activities had a reasonable reception but visitors appeared more interested in the Arduinos, web games and robots – their comfort zone, presumably.
One need not necessarily be an expert in biology (I’m certainly not) to be able to contribute to the study of computationally framed questions in that field. As mentioned, DNA alignment is effectively string manipulation and those strings could be anything! Indeed this is even demonstrated by our activity using English sentences rather than the alphabet ACGT.
From experience, undergraduates (and apparently college students) appear keen to pigeonhole themselves early (“…dammit Jim I’m a computer scientist not a bioinformatician”) via their prior beliefs about the meaning of “computing”, and their module/A-level choices. I think it is at this stage that subjects outside one’s choices become “scary” and fall outside one’s scope of interest: “…if I wanted to learn biology why would I be doing compsci?”. Yet most jobs, from finance to game development, will require some domain-specific knowledge and reading outside computing, whether it’s economics, physics or even art and soundscape design.
This is why it is important as a computer science department that we introduce undergraduates to other potential applications of the field. It’s not that we should push students to study bioinformatics over robotics, but that many students can easily go on unaware that computing can be widely applicable to research endeavours in different fields in the first place. To combat the “this is not my area” issue, many assignments in our department have a real-world element, often just tidbits of domain-specific knowledge that force students to recognise the need for a basic understanding of something outside their comfort zone.
Lego: a unicorn-like universal engagement tool
College students aside, I needed to work out how to engage schoolchildren between the ages of 10 and 12 with this activity. Scraps of paper would be unlikely to hold the attention of my target age group for long. I needed something more tangible and less fiddly than strips of paper. It was while describing the problem of introducing these “building blocks of nature” to kids in a simple way that the perfect metaphor popped into mind: Lego.
Yes! A 2×2 brick can represent an individual nucleotide, and we can use different coloured bricks to colour code the four nucleotides (and maybe another for “missing” if we’re feeling mean). A small stack of bricks builds a short string of DNA to represent a read. The colour code effectively abstracts away the potentially-confusing ACGT alphabet, making the alignment game easier to play (matching just colours, rather than symbols that need parsing first) and also quite aesthetically pleasing.
The hard part was sourcing enough Lego. I returned to my parents’ home to dig through my childhood and retrieve years’ worth of collected pieces, but once back in Aberystwyth I was surprised to find that after sorting through two whole boxes I did not own more than some 100 2×2 bricks (and most were not in colours I wanted). Bricks, it appears, are actually quite hard to come by! I put out a request for help on the Aber Comp Sci Facebook group and a lecturer kindly performed the same sort with his children’s collections. Their collection must have been more substantial, as it yielded 150-200 bricks in a mix of four colours, saving my stand.
The activity itself is simple and needs nothing other than some patter, the Lego and a surface for kids to align the pieces on. I spent more time than I would like to admit covering a cardboard box with tinfoil to create the SAMTECH SEQUENCER 9000 (described by Illumina as “shiny”), a prop to contextualise the problem: we can’t look at whole genomes, only short pieces of it that need assembly.
Of course, we’d need some read sets. To make these, I divided the available bricks into two piles. Nathan and I then each ad-libbed sliding k-mers of length 5 (i.e. each stack would have neighbouring stacks with overlaps of 4, 3, 2 and 1 coloured bricks, which each had their own overlaps…) to build up an arbitrary genome to recover. Simple!
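The read sets at the fair were improvised by hand, but the same "sliding k-mer" idea can be written down as a short Python sketch. Everything here is an assumption for illustration: the colour names, the 12-brick genome length, the seed, and the helper name `sliding_reads` are all hypothetical.

```python
import random

# Hypothetical colour code: the post does not say which colours were used.
COLOURS = ["red", "blue", "yellow", "green"]


def sliding_reads(genome, k=5):
    """Return every window of k consecutive bricks, moving one brick at a time."""
    return [tuple(genome[i:i + k]) for i in range(len(genome) - k + 1)]


rng = random.Random(9000)  # fixed seed so the same genome is built each run
genome = [rng.choice(COLOURS) for _ in range(12)]
reads = sliding_reads(genome, k=5)

# Neighbouring reads share 4 bricks, reads two apart share 3, and so on.
assert reads[0][1:] == reads[1][:4]
assert reads[0][2:] == reads[2][:3]

print(len(reads), "reads,", len(reads) * 5, "bricks needed")
```

Even this tiny 12-brick genome produces 8 reads and needs 40 bricks, which hints at why a couple of hundred bricks only stretch to two small read sets.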
Running the activity
Once doors opened, there was no shortage of children wanting to try out the stand. I think the mystery of the tinfoil box and the allure of playing with Lego was enough to grab attention, though Nathan (my lovely assistant) and I would flag down passers-by if the table was free. Pupils were encouraged to visit as many activities as possible by means of a questionnaire, on which each stand posed a scientific question that could be answered by completing that particular stand’s activity. Unfortunately for us, our stand’s question was not included on the questionnaire (I guess we submitted it too late) but luckily, we found pupils were keen to write down and find an answer to our “bonus question” after all.
We quickly developed a double-act routine, opening by quizzing our aligners on what they knew about DNA, which was typically not much, though it was nice to hear that the majority were aware that “it’s inside us”. Interestingly, of the pupils who responded in the positive to being asked what DNA was, their exposure was primarily from television – specifically when used for identification of criminals. Nathan would then explain that if we wanted to look at somebody’s DNA, we would take a sample from them and process it with the shiny tinfoil sequencer. This special machine would apply some magic science and produce short DNA reads that had to be pieced back together to recover the whole genome.
At this point we’d invite participants to open the lid of the sequencer and take out a batch of reads (of a possible two sets) for assembly. We’d explain the rules and show some examples of a correct alignment: sequences of matching runs of colour between two or more Lego stacks. Once they got the hang of it, we’d leave them to it for a little while. The two sets meant that we could split larger groups into pairs or triplets to ensure that everybody had a chance to make some successful alignments.
As the teams came to finishing alignment of the most obvious motifs (Nathan and I both accidentally made a few triplets of colours that resembled well known flags in our read sets – which was handy), progress would begin to slow and a few more difficult or red-herring reads would be left over. Nathan or I would then start narrating the problem, asking teams if this had been more difficult than expected. I don’t think any team said that the activity had been easy! We used this as an opportunity to interrupt the game to frame how complicated assembly is for real sequences and reveal the answer to our question.
This was my favourite part: I’d hold up one of the Lego stacks and pull it apart. “Each of these bricks is a single base; stacked together they make this read, which tells us what a small part of a much longer genome looks like”. I’d then ask how long they imagined a whole human genome might be. Answers most frequently ranged between 100 and 1000; a minority guessed between 4 and 15. No pupil ventured a guess beyond a million. For the very small guesses, I’d assemble a Lego stack of that length and ask if they still thought the differences between us all could be explained by such a short genome – nobody changed their mind[3].
The look on their faces when I revealed it was actually three billion made the entire activity worth it. If we had enough Lego to build a genome, it would be 28,800km tall and stretch into space far beyond where global positioning satellites are in orbit. I’d explain that when we do this for real, the stacks aren’t five bases long, but more like a hundred, and instead of the handful of reads we had in our tinfoil sequencer, there are millions of reads to align and assemble. They’d gasp and look around at each other’s faces, equally stunned. We even had some teachers dumbfounded by this reveal. “This is why computers are now so important in biology, this would be impossible otherwise!”. We’d clear up any last questions or confusions and thank them for playing.
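The height of the Lego genome is easy to check with a few lines of Python. This sketch assumes a standard Lego brick is 9.6 mm tall (excluding the stud) and that GPS satellites orbit at roughly 20,200 km; neither figure is stated in the post itself, and the constant names are made up for illustration.

```python
GENOME_LENGTH = 3_000_000_000  # bases, i.e. one brick per base
BRICK_HEIGHT_MM = 9.6          # assumed height of one stacked brick
GPS_ALTITUDE_KM = 20_200       # assumed approximate GPS orbital altitude

# Millimetres to kilometres: divide by 1,000,000.
tower_km = GENOME_LENGTH * BRICK_HEIGHT_MM / 1_000_000

print(f"Lego genome height: {tower_km:,.0f} km")
print("Taller than GPS orbit:", tower_km > GPS_ALTITUDE_KM)
```

Under those assumptions the stack comes out at about 28,800 km, comfortably above the GPS constellation.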
I would not consider our first group a resounding success. I was not ready for how difficult assembly of a set of unique 5-mers would be. The group had significant trouble recovering the genome and as it turned out, Nathan and I did too. The situation had not been helped by the fact that the group had also taken a mix of reads from both batches in the tinfoil sequencer. As it turns out, even trivial assembly is really hard. I could tell the kids were somewhat disappointed and the difficulty of the game had hampered their enjoyment. We recovered by wowing them with facts about the human genome and they asked some good questions too. Once they left the table, Nathan began the patter with the next group as I hurriedly worked to reduce the number of red-herring reads and recycle the bricks to create duplicate reads, which allowed groups to make progress more quickly at the beginning (and effectively turned the difficulty into a ramp, rather than the game being uniformly hard to play). This improved subsequent games considerably.
I was surprised how happy the pupils were to append our fairly long question to an already quite lengthy questionnaire, and how keen they were to find the answer, too. Not a single pupil was put off from our activity at the mention of biology, DNA or even unfamiliar terminology like “sequencer”, or “read”. Fascinatingly, Amanda also ran the aforementioned genome game and it was a hit. I guess primary school students are just open to a very wide definition of science and are yet to pigeonhole themselves? Activities like this at an early age have the potential to massively influence how our next generation of scientists see science: as a large collaborative effort in which skills can be transferred and shared to solve important and interesting questions. The pupils simply had no idea that computers could be used like this, for science, let alone biologically inspired questions.
In general the activity went down very well. The kids seemed to get the concept very quickly and also understood the (albeit naive) parallel to DNA. I think they genuinely learned a thing or two (the human genome is big!) and enjoyed themselves. I’m pleased that we managed to draw and keep attention to our stand, given we were wedged between a bunch of old Atari consoles and a display of unmanned aerial vehicles.
I was definitely surprised at how much I enjoyed running the stand too. I’m not overly fond of children and was expecting to have to put on a brave face to deal with tiny uninterested people in assorted bright sweaters all day. Yet all but one or two pupils were happy to be there, incredibly enthusiastic to learn, asked great questions (sometimes incredibly insightful questions) and genuinely had a nice time and thanked us for it. Enjoyment aside, I took the second day off as I’d also found running the activity over and over oddly draining.
If I were to run this again, I’d like to make it a little more interactive and ideally give players a chance to actually use Lego for its intended purpose: building something. Thankfully at our stand, students were not particularly disappointed when our rules stated that they couldn’t take the reads apart, or put them together (i.e. couldn’t actually play with the Lego…). To improve, my idea would be to get participants to construct a short genome out of Lego pieces that can be truly “sequenced” by pushing it through some sort of colour sensor or camera apparatus attached to an Arduino inside a future iteration of the trusty SAMTECH Sequencer range. Some trivial software would then give the player some sort of monster to name[4], print off and call their own.
To run the activity again in its current form, I think I’d need to have more Lego. However, it turns out that packs of 2×2 bricks in one colour are widely available on eBay and Amazon, though they aren’t actually that much cheaper than ordering via the “Pick a Brick” service on the canonical Lego website. I’ve ordered a few packs (at an astonishing £0.12 per brick) as I would like to try and run this activity at other events to spread the sheer joy that bioinformatics can bring to one’s afternoon.
To give the current version of the game a little more of a goal, it would have been ideal to explain the concept of a genomic reference and have the players align the reads to that (as well as each other). In effect this would have been like solving the edges of a jigsaw: it would have given a sense of quick progress (which means fun) and also afforded us the opportunity to explain more of the “real science” behind the game. To make the game more difficult, we could have properly employed “missing bases” and the common issues that plague assembly including repeats (which are easier to explain with a reference), as well as errors. After the first group at the Science Fair, I quickly removed the majority of sneaky errors as they made the game too “mean” (where Nathan or I had to explain “No that one doesn’t go there!” too frequently).
Some proof what I did public engagement[5]
— Hannah Dee (@handee) March 15, 2016
— Aber Uni Comp. Sci. (@AberCompSci) March 15, 2016
- Actual Lego bricks are hard to come by (unless you just buy them)
- Typical ten-year-olds are not as dumb or as apathetic to science as one might expect
- Assembly is actually pretty hard
- Engaging children with science is exhausting but surprisingly rewarding
- Acquire more Lego
- It’s very hard to tinfoil a cardboard box nicely
[1] Read, required.
[2] Read, was coerced.
[3] With a single Lego brick in hand, one kid looked me dead in the eye and said “Yeah!” when asked if this single base could explain the differences between every human on Earth.
[4] Genome McGenface?
[5] Absolutely not using this to pass my public engagement module.