Draft Genome Annotation

Annotating a draft de novo genome assembly of a previously uncharacterized species with gene function information is a challenging problem. The genome sequence by itself is not biologically informative until you know what the sequence does, in terms of which genes are transcribed and which proteins are translated. Genome, transcriptome, and protein sequence information is collected experimentally and must be integrated. Some pipelines also use annotated genomes, transcriptomes, and proteomes from closely related species. There is no single agreed-upon set of methods, and multiple pipeline tools are available. Which is the best to use? For now, we should try multiple methods and see which combination of results makes the most sense.
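One rough way to start comparing results from different methods is to ask how many gene models from one tool are supported by an overlapping gene model from another. The sketch below assumes each tool's predictions have been exported as GFF3 with `gene` features, which may require conversion for some tools. The function names `read_genes` and `supported_fraction`, the scaffold name, and the example records are all hypothetical and only for illustration.

```python
from collections import defaultdict


def read_genes(gff3_lines):
    """Collect (start, end) intervals of 'gene' features, keyed by (seqid, strand)."""
    genes = defaultdict(list)
    for line in gff3_lines:
        # Skip blank lines and GFF3 comment/directive lines.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        # GFF3 has 9 tab-separated columns; column 3 is the feature type.
        if len(cols) < 9 or cols[2] != "gene":
            continue
        seqid, strand = cols[0], cols[6]
        genes[(seqid, strand)].append((int(cols[3]), int(cols[4])))
    return genes


def supported_fraction(genes_a, genes_b):
    """Fraction of genes in A overlapping at least one gene in B (same sequence and strand)."""
    total = 0
    supported = 0
    for key, intervals in genes_a.items():
        others = genes_b.get(key, [])
        for start, end in intervals:
            total += 1
            # GFF3 coordinates are 1-based and inclusive, so touching ends count as overlap.
            if any(start <= o_end and o_start <= end for o_start, o_end in others):
                supported += 1
    return supported / total if total else 0.0


# Hypothetical predictions from two tools (made-up records, not real output).
tool_a = [
    "##gff-version 3",
    "scaffold1\ttoolA\tgene\t100\t500\t.\t+\t.\tID=a1",
    "scaffold1\ttoolA\tgene\t900\t1200\t.\t-\t.\tID=a2",
]
tool_b = [
    "scaffold1\ttoolB\tgene\t450\t800\t.\t+\t.\tID=b1",
]

fraction = supported_fraction(read_genes(tool_a), read_genes(tool_b))
print(fraction)  # 0.5: a1 overlaps b1, a2 has no partner on the minus strand
```

Overlap at the gene level is a coarse measure. It says nothing about whether exon boundaries agree, so it is only a first pass before looking at the models more closely.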


1. CEGMA (Core Eukaryotic Genes Mapping Approach), which takes a set of 458 proteins found to be highly conserved among eukaryotes and maps them to the genome assembly.

2. EuGene, gene prediction software that promises integration with RNA-seq or EST data. It is not straightforward to learn: the parameters are not easy to follow and require integration with plugins.


3. The PASA pipeline is another method to try:



About Lisa Johnson

PhD candidate at UC Davis in Molecular, Cellular, and Integrative Physiology