Longest Common Substring Problem

To find a shared motif between a collection of nucleic acid sequences, skills for solving the longest common substring problem are needed.

Suffix Trees, Dynamic Programming:

http://www.cs.cmu.edu/~ckingsf/bioinfo-lectures/suffixtrees.pdf

http://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Longest_common_substring#Python

function LCSubstr(S[1..m], T[1..n])
    L := array(1..m, 1..n)
    z := 0
    ret := {}
    for i := 1..m
        for j := 1..n
            if S[i] == T[j]
                if i == 1 or j == 1
                    L[i,j] := 1
                else
                    L[i,j] := L[i-1,j-1] + 1
                if L[i,j] > z
                    z := L[i,j]
                    ret := {S[i-z+1..i]}
                elif L[i,j] == z
                    ret := ret ∪ {S[i-z+1..i]}
            else L[i,j]=0;
    return ret
Advertisements

About Lisa Cohen

PhD student at UC Davis.
This entry was posted in Bioinformatics, Python. Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s