
Genomics in the News

Genomics has (once again) upended a well-established notion in human evolution.

Jeffrey Gordon and his lab at Washington University in St. Louis have just published a study in Science examining the role of gut microbiota in malnourishment. It's just the opening salvo in figuring out what's going on, but it hinges on a very interesting question: How can one twin be malnourished when the other, with the same diet, is healthy?

Sequencing the genomes of a wide variety of pigeon breeds is beginning to give some interesting insights into the genetic processes behind the evolution of morphological variation (and the pictures of pigeons are cool, too).

As we get ever closer to "cheap" genome sequencing (sequencing a human genome for less than $1000 is still an entirely reasonable goal), issues of analysis and privacy are motivating companies to come up with solutions.

More genomics in history: DNA testing has confirmed that Richard III, king of England in the 15th century, was buried in Leicester beneath what is now a parking lot. This is rapidly becoming a commonplace, if not trivial, application of the technology, but still: we're talking about one of the more (in)famous English kings here!

Studies of the genomes of various peoples have provided insights into a range of questions about early man's migrations. Recently a team of researchers at the Max Planck Institute for Evolutionary Anthropology published a study supporting a hypothesis that sailors from India arrived in Australia about 4000 years ago and interbred with the native Aborigines. This also suggests that the Australian dingo, which exterminated the native Australian thylacine, may have descended from Indian dogs brought to the island by those sailors.

This isn't exactly genomics, but it's very cool: Drs. Goldman and Birney of the European Bioinformatics Institute have proposed a way to store data in the form of DNA which would enable us to "fit all the world's digital information into the back of a lorry." Why do this? As the successful sequencing of DNA from Neanderthals and mammoths has shown, DNA is an incredibly stable storage medium even under very harsh conditions.

A pair of genome researchers at Harvard have published a study on a new class of molecules called lincRNAs, suggesting that these play a key role in evolution and speciation. These RNAs differ between species significantly more than the mRNAs coding for proteins, and this higher degree of species specificity has some intriguing implications for evolutionary theorists.


As DNA sequencing technology becomes cheaper and more efficient, the amount of sequencing data available to biologists is exploding. Not only has this allowed us opportunities to compare gene structure in many diverse organisms, but also to study and compare the structure of entire genomes. The sheer volume of information we are now coping with is unprecedented, and new tools and ways of thinking are not only possible, but increasingly necessary. This course will be an exploration of the techniques used to create genomic DNA libraries, to sequence the resulting DNA fragments, and to analyze the sequences of these fragments, at both the gene and genome levels. Students will gain familiarity with the computer programs used to assemble and annotate genomic sequence data as they use them to analyze their own raw data from the Washington University Genome Sequencing Center. This course will be extensively computer-based. We will be working with large (ca. 40-kb) sections of genomic DNA in silico: by the end of the semester, each student will have finished improving the sequence quality of one of these 40-kb clones to a publishable level and extensively annotated another, indicating the locations of genes, repeat sequences, and other sequence motifs.

"Finishing" in the field of genomics means to take a partially assembled sequence and correct errors in assembly and sequence to a set level of accuracy; we will be improving our sequences to the NHGRI "mouse standard" (fewer than 1 error per 1000 basepairs). This process will involve:

  1. learning to use Consed, a software package used by professional sequence finishers
  2. examining and evaluating sequence fragments obtained from The Genome Institute at Washington University
  3. correcting computer-generated assembly errors, reassembling as appropriate
  4. determining which regions are of inadequate quality and need to be re-sequenced
  5. selecting reaction parameters and informing The Genome Institute, which will carry out those reactions and return the results to you
  6. incorporating the new data and returning to step 3

As you can see, this is an iterative process, and will take several cycles to complete. By the end of it you will have improved your chosen sequence significantly, will have become familiar with the logic and methodology of sequence improvement, and will have prepared both a written and an oral presentation of your finishing work.
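To make the accuracy target concrete: an error rate of 1 in 1000 bases corresponds to a Phred quality score of 30, since Q = -10 × log10(P). The short Python sketch below illustrates that conversion and a simple way to flag runs of low-quality bases, in the spirit of step 4. It is only an illustration under stated assumptions, not a description of how Consed works internally; the function names, the example quality values, and the minimum run length are all hypothetical.

```python
import math

def phred_from_error_rate(p):
    """Convert an error probability to a Phred quality score: Q = -10 * log10(p)."""
    return -10 * math.log10(p)

def low_quality_regions(quals, threshold=30, min_len=3):
    """Return (start, end) index pairs (0-based, end exclusive) for runs of at
    least min_len consecutive bases whose quality is below threshold.
    The threshold and min_len defaults are illustrative assumptions."""
    regions = []
    start = None
    for i, q in enumerate(quals):
        if q < threshold:
            if start is None:
                start = i          # a low-quality run begins here
        else:
            if start is not None and i - start >= min_len:
                regions.append((start, i))
            start = None           # the run (if any) has ended
    # Handle a run that extends to the end of the sequence
    if start is not None and len(quals) - start >= min_len:
        regions.append((start, len(quals)))
    return regions

# 1 error per 1000 bases expressed as a Phred score
target_q = phred_from_error_rate(1 / 1000)

# Made-up per-base quality values for a short stretch of sequence
example_quals = [40, 42, 12, 9, 15, 38, 41, 25, 44, 20, 18, 11, 10]
flagged = low_quality_regions(example_quals)
print(target_q, flagged)
```

Regions flagged this way would be candidates for re-sequencing; an isolated low-quality base (such as the single value of 25 above) is ignored here because it falls short of the minimum run length.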

Annotation is the process of determining the positions of every significant structure in a particular region of DNA. We will review eukaryotic gene structure and the structures and functions of repeated sequence elements before examining individual projects, sections of unannotated genomic sequence roughly 40,000 bases in size (a "contig"). Your goal will be to accurately identify the location of every open reading frame, every intron/exon splice junction, and every repeat sequence in your contig, as well as homologies to genes in other species. You will evaluate patterns of synteny (are homologous genes found in the same order and arrangement in other species, or have they become "scrambled" over evolutionary time?) and construct a phylogenetic tree for one of the genes in your contig as well.
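As a rough illustration of what "identifying open reading frames" means computationally, the Python sketch below scans all six reading frames of a DNA string for stretches that begin with ATG and end at the first in-frame stop codon. It is a deliberately simplified sketch: it ignores introns (so it cannot by itself find complete eukaryotic genes), and the function names and the minimum-length cutoff are assumptions made for this example rather than part of any program used in the course.

```python
# Translation table for complementing DNA bases
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOPS = {"TAA", "TAG", "TGA"}

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def find_orfs(seq, min_codons=2):
    """Return (strand, start, end) tuples for ATG...stop ORFs in six frames.

    Coordinates are 0-based, end exclusive, and relative to the strand that
    was scanned (for "-", that is the reverse-complemented sequence).
    min_codons counts codons before the stop; its default is an arbitrary
    illustrative cutoff.
    """
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i      # first start codon opens a candidate ORF
                elif start is not None and codon in STOPS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3))
                    start = None   # look for the next ORF in this frame
    return orfs

# A tiny made-up sequence containing one ORF: ATG AAA TTT TAG
example = "CCATGAAATTTTAGGG"
orfs = find_orfs(example)
print(orfs)
```

In real annotation work, candidate ORFs like these are only a starting point; splice junctions, homology to known genes, and repeat content all have to be weighed before a gene model is accepted.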

Over the second half of the semester, you will learn several software programs to carry out various aspects of this work. In that time you will complete the annotation of your contig and prepare both a written and an oral presentation of your annotation work.


Classes are held in Memorial 201 (the Mac lab) for the first half of the semester; for the second half we'll be in PPHAC 331.

We meet Monday and Wednesday evenings, 6:30 pm to 9:30 pm.

Accessing the Lab

Feel free to work on your projects whenever you'd like outside of class hours. Memorial 201 is accessible until the building is locked up by Campus Safety at approximately 9 pm. (If you need to get in after they've locked up, call Campus Safety and they will let you in.) Do note, however, that there is a class held in Memorial 201 on Thursdays from 4 pm to 7 pm.


The text for this course is Introduction to Genomics, 2nd ed., by Arthur Lesk (Oxford University Press). Online resources available for the book can be found here. In addition, readings from the primary research literature will be assigned as the semester progresses.

Finishing fosmid map

Here's a map of the contigs I've selected for us to work on this semester; they're the red ones (of course) in the circles:

[Image: finishing contigs, spring 2013]