i trying levenshtein edit distance algorithm working reason, number of edits coming out incorrect. can't see mistake , wondering if see's doing incorrectly.
input
5 atcgtt agttac acgaat ccgtaaat ttacgaccagt
expected output
strand a: atcgtt-- strand b: a--gttac edit distance: 4 strand a: atcg-tt strand b: a-cgaat edit distance: 3 strand a: atcgt---t strand b: -ccgtaaat edit distance: 5 strand a: at-cg----tt strand b: ttacgaccagt edit distance: 7 strand a: agttac strand b: acgaat edit distance: 4 strand a: -agt-tac strand b: ccgtaaat edit distance: 5 strand a: --a-g-tta-c strand b: ttacgaccagt edit distance: 8 strand a: acg--aat strand b: ccgtaaat edit distance: 3 strand a: --acga--a-t strand b: ttacgaccagt edit distance: 5 strand a: --ccg-taaat strand b: ttacgaccagt edit distance: 7
my output
strand a: atcgt- strand b: agttac edit distance: 5 strand a: atc-t- strand b: acgaat edit distance: 5 strand a: atc-t- strand b: ccgtaaat edit distance: 5 strand a: a-c-t- strand b: ttacgaccagt edit distance: 10 strand a: agttac strand b: acgaat edit distance: 5 strand a: ag-tac strand b: ccgtaaat edit distance: 6 strand a: a--t-c strand b: ttacgaccagt edit distance: 7 strand a: ac-aat strand b: ccgtaaat edit distance: 7 strand a: ac---t strand b: ttacgaccagt edit distance: 8 strand a: cc-taaat strand b: ttacgaccagt edit distance: 8
findeditdistance
void editdistance::findeditdistance() { int uppervalue, leftvalue, diagonalvalue; (int = 0; < mlengthx; ++i) { table[i][0].stringlength = i; } (int = 0; < mlengthy; ++i) { table[0][i].stringlength = i; } (int = 1; < mlengthx; ++i) { (int j = 1; j < mlengthy; ++j) { if (mstringx[i] == mstringy[j]) { table[i][j].direction = diagonal; table[i][j].stringlength = table[i - 1][j -1].stringlength; } else { uppervalue = table[i - 1][j].stringlength; leftvalue = table[i][j - 1].stringlength; diagonalvalue = table[i - 1][j - 1].stringlength; if (uppervalue < leftvalue) { if (uppervalue < diagonalvalue) { //upper lowest table[i][j].stringlength = table[i - 1][j].stringlength + 1; table[i][j].direction = up; } else { //diagonal lowest table[i][j].stringlength = table[i - 1][j -1].stringlength + 1; table[i][j].direction = diagonal; } } else if (leftvalue < diagonalvalue) { //left lowest table[i][j].stringlength = table[i][j - 1].stringlength + 1; table[i][j].direction = left; } else { //diagonal lowest table[i][j].stringlength = table[i - 1][j -1].stringlength + 1; table[i][j].direction = diagonal; } } } } }
getdistance
void editdistance::getdistance() { int = mstringx.length() - 1; int j = mstringy.length() - 1; numedits = 0; updatestrands (i, j); }
updatestrands
void editdistance::updatestrands (int i, int j) { if (i == 0 || j == 0) { return; } if (table[i][j].direction == diagonal) { ++numedits; updatestrands (i - 1, j - 1); } else if (table[i][j].direction == up) { mstringy[j] = '-'; ++numedits; updatestrands (i - 1, j); } else { mstringx[i] = '-'; ++numedits; updatestrands (i, j - 1); } }
the problem edit distance in updatestrands
. counts diagonal moves 1, when in fact diagonal move can have distance of 1 (substitution) or 0 (match). fix in updatestrands
, there's no need calculation there @ all, when number in lower-right corner of table
.
if want correct "strands" (e.g. "atcgtt--" , "a--gttac"), have make corrections in updatestrands
(you change elements of strings should insert), getdistance
(you start in wrong place) , findeditdistance
(you neglect assign values direction
along upper , left edges when set stringlength
i
).