To find a shared motif between a collection of nucleic acid sequences, skills for solving the longest common substring problem are needed.

Suffix Trees, Dynamic Programming:

http://www.cs.cmu.edu/~ckingsf/bioinfo-lectures/suffixtrees.pdf

http://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Longest_common_substring#Python

functionLCSubstr(S[1..m], T[1..n]) L :=array(1..m, 1..n) z := 0 ret := {}fori := 1..mforj := 1..nifS[i] == T[j]ifi == 1 or j == 1 L[i,j] := 1elseL[i,j] := L[i-1,j-1] + 1ifL[i,j] > z z := L[i,j] ret := {S[i-z+1..i]}elifL[i,j] == z ret := ret ∪ {S[i-z+1..i]}elseL[i,j]=0;returnret

Advertisements