I recently developed a module for teaching some basic ideas about genome assembly for an assignment in “Teaching College Biology”. The module includes a short intro to genome assembly and to the sequencing platforms currently available (I group them into three categories: short, long, and ultra-long).

The students then form small groups and attempt to assemble a small sequence in the form of a famous speech (MLK’s “I have a dream” or Winston Churchill’s “We shall…”). Each speech has a repetitive element that makes assembly difficult, illustrating the effect of input data on assembly success.
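To make the repeat problem concrete, here is a minimal Python sketch (not part of the module itself). It measures what fraction of length-`k` windows in a text occur exactly once. The function name `unique_fraction` and the toy string are my own illustrative assumptions; the idea is simply that a read falling entirely inside a repeat cannot be placed unambiguously, while a longer read that spans the repeat can.

```python
from collections import Counter


def unique_fraction(text, k):
    """Return the fraction of length-k windows of `text` that occur exactly once."""
    if k < 1 or k > len(text):
        raise ValueError("k must be between 1 and len(text)")
    # Every possible "read" of length k, one per start position.
    kmers = [text[i:i + k] for i in range(len(text) - k + 1)]
    counts = Counter(kmers)
    unique = sum(1 for kmer in kmers if counts[kmer] == 1)
    return unique / len(kmers)


# Toy text (an assumption for illustration): "abc" is repeated.
toy = "abcXabcY"
short_k = unique_fraction(toy, 3)  # "abc" occurs twice, so 4 of 6 windows are unique
long_k = unique_fraction(toy, 5)   # every 5-character window spans past the repeat
print(short_k, long_k)
```

Running the same function on one of the speeches with a short and a long `k` should show the same pattern: the longer the read, the fewer ambiguous placements.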

On my GitHub you can find a few slides, the two text files mentioned above, and a script to randomly generate a given number of Illumina or PacBio “reads” from the text files.
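For readers who want to see the idea without downloading anything, the read generation can be sketched roughly as follows. This is not the script from the repository, just a minimal illustration under my own assumptions: the function name `sample_reads`, the read lengths (12 characters for a "short" read, 40 for a "long" one), and the stand-in sentence are all hypothetical.

```python
import random


def sample_reads(text, n_reads, read_len, seed=None):
    """Draw n_reads substrings of length read_len from uniformly random start positions."""
    if read_len < 1 or read_len > len(text):
        raise ValueError("read_len must be between 1 and len(text)")
    rng = random.Random(seed)  # a seed makes the handout reproducible
    last_start = len(text) - read_len
    reads = []
    for _ in range(n_reads):
        start = rng.randint(0, last_start)  # randint includes both endpoints
        reads.append(text[start:start + read_len])
    return reads


# Stand-in text with a repeated phrase (hypothetical; swap in a speech file).
text = "the cat sat on the mat and the cat sat on the hat by the door"

short_reads = sample_reads(text, n_reads=20, read_len=12, seed=1)  # assumed "short" length
long_reads = sample_reads(text, n_reads=4, read_len=40, seed=1)    # assumed "long" length

for read in short_reads[:3]:
    print(repr(read))
```

Sampling start positions uniformly means coverage is uneven by chance, so some groups will get gaps as well as repeats to deal with.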