UC Davis, Genome Center
“Using C. elegans to discover functions of conserved unknown human genes.”
Titus: Erich is “not content with just a description of a genome”, goes deep into the conservation of gene families
How much of the human genome is conserved, deeply across metazoans? Are there genes with unknown functions that are doing something important that we’re not aware of?
- What genes are conserved?
- What genes are unknown?
We need practical definitions of both sets to answer these questions.
******* Evolution is giving us a glimpse of what is important.
Summary of literature going into this (summarized on one slide, not many studies).
Pandey et al. 2014, “Ignorome”: genes that show quantitatively significant activity but are not well-studied. Genes count as well-studied when they have more papers and more people studying them! (basically no relationship between how under-studied a gene is and its biological importance…)
Categories of ways to characterize gene functions (no perfect method):
- sequence similarity
- guilt by association
- metabolic modeling (orphan enzymes)
- chemical proteomics
(bold to indicate importance in future as technologies become easier)
C. elegans as model metazoan. Under 1,000 cells.
Methods for characterizing unknowns:
Used existing human-worm homolog sets: Pfam (domain-oriented), PantherDB (precomputed mappings, HMM adaptability), TreeFam (trees)
Surprising findings with deep divergences.
Gene nomenclature is erratic and hard to map. Protein databases have unique, stable protein IDs. “You would think genes would be this easy, but they’re not.” Solution is to connect your genes to UniProt.
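A minimal sketch of what “connect your genes to UniProt” could look like in practice: collapse erratic gene names and synonyms onto one stable protein accession, and keep track of whatever fails to map. The alias table, the gene names, and the accessions here are all made-up placeholders for illustration, not real UniProt entries or anything shown in the talk.

```python
from collections import defaultdict

# Hypothetical alias table: (species, gene name or synonym) -> protein accession.
# Names and accessions are placeholders, not real database entries.
ALIAS_TO_ACCESSION = {
    ("human", "GENE1"): "ACC0001",
    ("human", "GENE1-OLD"): "ACC0001",  # an older synonym for the same gene
    ("worm", "gen-1"): "ACC0002",
}


def to_accession(species, name, table=ALIAS_TO_ACCESSION):
    """Return the stable accession for a gene name, or None if it is unmapped."""
    return table.get((species, name.strip()))


def group_by_accession(records, table=ALIAS_TO_ACCESSION):
    """Group (species, name) records by accession; return unmapped names separately."""
    grouped = defaultdict(list)
    unmapped = []
    for species, name in records:
        accession = to_accession(species, name, table)
        if accession is None:
            unmapped.append((species, name))
        else:
            grouped[accession].append((species, name))
    return dict(grouped), unmapped
```

Keeping the unmapped names in their own list, rather than dropping them, makes it easy to see how much of a gene set never reached a stable ID.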
Quantifying unknowns. Define “known” by the sum of annotations per gene (or family) = annotation density
Most characterized set of genes is protein kinases. Least characterized are genes with no names.
If there is an unknown in humans, the trick is to see whether it is mentioned in any other organism.
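One way to read the “annotation density” idea is as a per-gene sum of annotation counts, rolled up per family. The sketch below assumes that reading; what counts as an annotation (papers, GO terms, database entries) was not specified in these notes, and the function names and inputs are hypothetical.

```python
from collections import defaultdict


def annotation_density(annotations):
    """Sum annotation counts per gene.

    `annotations` is an iterable of (gene_id, count) pairs, where each count
    comes from some annotation source (an assumption; the source is not fixed).
    """
    density = defaultdict(int)
    for gene_id, count in annotations:
        density[gene_id] += count
    return dict(density)


def family_density(gene_density, gene_to_family):
    """Sum per-gene densities over each family; genes with no record count as 0."""
    totals = defaultdict(int)
    for gene_id, family in gene_to_family.items():
        totals[family] += gene_density.get(gene_id, 0)
    return dict(totals)
```

Under this reading, a family whose members span several organisms picks up annotations from all of them, so a human gene with no annotations of its own can still sit in a family that scores above zero.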
Provides richness and expansion to what people think they already know.
“Cinderella” genes: start out in obscurity, then become famous.
They’re there. They’ve been there since the Cambrian. What are they??????????
Found ~30 gene families in humans conserved and unknown. What are these proteins?
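A filter for “conserved and unknown” families might look like the sketch below: keep families with members in every required species whose family-level annotation score is at or below a cutoff. The input shapes, the species labels, and the cutoff are assumptions for illustration; the criteria actually used in the talk were not recorded in these notes.

```python
def conserved_unknown_families(family_species, family_score,
                               required=("human", "worm"), max_score=0):
    """Return sorted family IDs that are conserved across `required` and poorly annotated.

    family_species: dict of family ID -> set of species with a member gene.
    family_score:   dict of family ID -> summed annotation score (missing = 0).
    """
    needed = set(required)
    return sorted(
        family
        for family, species in family_species.items()
        if needed <= set(species) and family_score.get(family, 0) <= max_score
    )
```

For example, with hypothetical families `famA` (human and worm, score 12), `famB` (human, worm and fly, score 0) and `famC` (human only, score 0), only `famB` passes both the conservation and the low-annotation test.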
TM sequence (transmembrane), ligands, cell surface, coiled-coils, etc.
Expression patterns scattered across worm. Specific phenotypes elicit specific unknowns.
Behavioral and other assays to discover functions of some unknowns in C. elegans by knocking out unknown genes. Striking phenotypes vs. hours of exposure. Could knock in human sequence to see if phenotype is rescued?
Discussion about using the conserved-unknown methods and the “annotation density” metric when annotating a new non-model organism genome.