Finding coding sequences in DNA

From the UCSD “Bioinformatics Algorithms” class, programming assignments week 2.

Given a DNA sequence and an amino acid peptide sequence:


I want to find every substring of the DNA, on either strand (that is, including the reverse complement), that codes for that amino acid peptide sequence. Example output:


Because the genetic code is degenerate, there may be multiple possible coding sequences for the amino acid peptide in a strand (sequence) of DNA.

Python coding steps:

1. From a Python dictionary “aminos” with amino acids (aa) as keys and lists of RNA codons as values, make a new dictionary with the aa position in the peptide (1 up to the length of the aa sequence given) as keys and a list of possible RNA codons as values:

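def aminoacid_codons(aapeptide):
    # aminos: dictionary mapping each amino acid to its list of RNA codons
    codon_dict = {}
    for position, aa in enumerate(aapeptide, start=1):
        codon_dict[position] = aminos[aa]
    return codon_dict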


Output from the above function:

{1: ['AUG'], 2: ['GCU', 'GCC', 'GCA', 'GCG']}
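
The “aminos” dictionary itself is not shown above. As a minimal sketch, assuming the standard genetic code, it could be built from the compact 64-letter form of the codon table (bases ordered U, C, A, G, with '*' marking stop codons). The names BASES, AMINO_LETTERS and count_coding_sequences are made up for this illustration.

```python
from math import prod

# Standard genetic code in compact form: codons are enumerated with the
# bases in the order U, C, A, G at each position; '*' marks a stop codon.
BASES = "UCAG"
AMINO_LETTERS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Assumed shape of "aminos": amino acid letter -> list of RNA codons.
aminos = {}
codons = (a + b + c for a in BASES for b in BASES for c in BASES)
for codon, aa in zip(codons, AMINO_LETTERS):
    aminos.setdefault(aa, []).append(codon)

def count_coding_sequences(aapeptide):
    """Number of distinct RNA sequences that translate to aapeptide."""
    return prod(len(aminos[aa]) for aa in aapeptide)

print(aminos["A"])                   # ['GCU', 'GCC', 'GCA', 'GCG']
print(count_coding_sequences("MA"))  # 4 (one codon for M times four for A)
```

The count grows quickly with peptide length, because the number of candidates is the product of the codon counts at each position.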

2. Construct every candidate sequence by taking one codon from each position (2 codons each in this example). The reverse complement should also be considered.

How do I randomly select an item from a list in Python? See Python's random module (pseudorandom number generation), for example random.choice.

import random

How many dictionary keys are there?

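def aa_DNA(codon_dict):
    codon_positions = len(codon_dict)  # number of dictionary keys
    for position in range(1, codon_positions + 1):
        listofcodons = codon_dict[position]  # candidate codons at this position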

3. Randomly select an item from “listofcodons” so that when it is selected it should be removed from the list and can’t be selected again:

This is where I'm stuck.
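
One possible way through this step, as a minimal sketch: copy “listofcodons”, then repeatedly pop a random index until the copy is empty, so each codon is selected exactly once. The generator name draw_without_replacement is made up for this illustration.

```python
import random

def draw_without_replacement(listofcodons):
    """Yield every codon exactly once, in random order."""
    remaining = list(listofcodons)  # copy, so the caller's list is untouched
    while remaining:
        index = random.randrange(len(remaining))
        yield remaining.pop(index)  # a popped codon cannot be selected again

listofcodons = ['GCU', 'GCC', 'GCA', 'GCG']
drawn = list(draw_without_replacement(listofcodons))
print(drawn)  # the same four codons, in a random order
```

If all that is needed is the whole list in random order, random.sample(listofcodons, k=len(listofcodons)) returns a shuffled copy in one call.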

4. Return a list of all RNA combinations.
5. Convert to DNA
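
Steps 4 and 5 could also be done without any random selection. As a sketch, assuming the codon dictionary has the shape shown in step 1, itertools.product enumerates one codon per position in every combination, and replacing U with T converts each RNA string to DNA. The function names all_rna_sequences and rna_to_dna are made up for this illustration.

```python
from itertools import product

def all_rna_sequences(codon_dict):
    """Every RNA string formed by taking one codon from each position."""
    positions = sorted(codon_dict)             # 1, 2, ..., peptide length
    choices = [codon_dict[p] for p in positions]
    return ["".join(combo) for combo in product(*choices)]

def rna_to_dna(rna):
    """Convert an RNA coding sequence to DNA by replacing U with T."""
    return rna.replace("U", "T")

codon_dict = {1: ['AUG'], 2: ['GCU', 'GCC', 'GCA', 'GCG']}
rna_candidates = all_rna_sequences(codon_dict)
dna_candidates = [rna_to_dna(rna) for rna in rna_candidates]
print(rna_candidates)  # ['AUGGCU', 'AUGGCC', 'AUGGCA', 'AUGGCG']
print(dna_candidates)  # ['ATGGCT', 'ATGGCC', 'ATGGCA', 'ATGGCG']
```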


6. Deal with reverse complement.
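
A minimal sketch of the reverse complement, assuming the DNA uses only the uppercase letters A, C, G and T: swap each base for its partner with str.translate, then reverse the string. The names COMPLEMENT and both_strands are made up for this illustration.

```python
# Translation table that swaps each base for its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna):
    """Complement every base, then reverse the string."""
    return dna.translate(COMPLEMENT)[::-1]

dna_candidates = ["ATGGCT", "ATGGCC", "ATGGCA", "ATGGCG"]
# Candidates as they would read on either strand of the input DNA.
both_strands = set(dna_candidates) | {reverse_complement(s) for s in dna_candidates}

print(reverse_complement("ATGGCC"))  # GGCCAT
```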

7. Input DNA sequence, reverse complement

8. Compare input with possibilities
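
The comparison could be sketched as a sliding window over the input: each window as long as the candidates is reported if it, or its reverse complement, is one of the candidate coding sequences. The DNA string below is made up for this illustration, and find_encoding_substrings is a hypothetical name.

```python
def reverse_complement(dna):
    """Complement every base (A<->T, C<->G), then reverse the string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_encoding_substrings(dna, dna_candidates):
    """Return substrings of dna that match a candidate on either strand."""
    candidates = set(dna_candidates)
    if not candidates:
        return []
    width = len(next(iter(candidates)))  # all candidates share one length
    hits = []
    for start in range(len(dna) - width + 1):
        window = dna[start:start + width]
        if window in candidates or reverse_complement(window) in candidates:
            hits.append(window)  # reported as it appears in the input
    return hits

# Made-up input and the four DNA candidates for a two-residue peptide (M, A).
dna = "TTATGGCCAAGGCCATTT"
dna_candidates = ["ATGGCT", "ATGGCC", "ATGGCA", "ATGGCG"]
print(find_encoding_substrings(dna, dna_candidates))
# ['ATGGCC', 'GGCCAT']
```

Storing the candidates in a set keeps each window check cheap, but the set itself still grows with the product of the codon counts, so this approach suits short peptides best.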


About Lisa Johnson

PhD candidate at UC Davis in Molecular, Cellular, and Integrative Physiology
