I have a list of sublists, each of which consists of one or more strings. I am comparing each string in one sublist to every string in the other sublists, which means writing two nested loops. However, my data set has ~5000 sublists, so the program keeps running forever unless I run the code in increments of 500 sublists. How can I change the flow of the program so that I can still look at all the j values corresponding to each i, and yet be able to run it on all ~5000 sublists? (wn is the WordNet library.) Here's part of my code:
    for i in range(len(somelist)):
        if i == len(somelist) - 1:  # if last sublist, do not compare
            break
        title_former = somelist[i]
        for word in title_former:
            singular = wn.morphy(word)  # convert to singular
            if singular is None:
                pass
            elif singular is not None:
                newwordsyn = getnewwordsyn(word, singular)
                if not newwordsyn:
                    uncounted_words.append(word)
                else:
                    for j in range(i + 1, len(somelist)):
                        title_latter = somelist[j]
                        for word1 in title_latter:
                            singular1 = wn.morphy(word1)
                            if singular1 is None:
                                uncounted_words.append(word1)
                            elif singular1 is not None:
                                newwordsyn1 = getnewwordsyn(word1, singular1)
                                tempsimilarity = newwordsyn.wup_similarity(newwordsyn1)
Example:

    input = [['space', 'invaders'], ['draw']]
    output = {('space', 'draw'): 0.5, ('invaders', 'draw'): 0.2}
The output is a dictionary that maps each string pair (as a tuple) to its similarity value. The code snippet above is not complete.
You could try this, but I doubt it will be faster (and you will need to change the distance function):
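    import itertools

    def dist(s1, s2):
        # count mismatched positions, plus the difference in length
        return sum([i != j for i, j in zip(s1, s2)]) + abs(len(s1) - len(s2))

    # input1 and input2 are the two sublists of strings being compared
    dict([((k, v), dist(k, v)) for k, v in itertools.product(input1, input2)])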
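The same idea can be stretched over the whole list of sublists. What follows is only a rough sketch, not a drop-in fix: `prepare` is a hypothetical stand-in for the per-word step in the question (`wn.morphy` plus `getnewwordsyn`), and `dist` is the placeholder metric from above rather than `wup_similarity`. The point is that each word is prepared once and cached instead of being recomputed inside the inner loop. The number of pairs is still quadratic, so this removes repeated work but does not change how the comparison itself scales.

```python
from functools import lru_cache
from itertools import combinations, product


def dist(s1, s2):
    # Placeholder metric: mismatched positions plus length difference.
    return sum(a != b for a, b in zip(s1, s2)) + abs(len(s1) - len(s2))


@lru_cache(maxsize=None)
def prepare(word):
    # Hypothetical stand-in for the expensive per-word lookup.
    # Returns None for words that should be skipped.
    return word.lower() or None


def pairwise_scores(sublists, score=dist):
    results = {}
    # combinations() yields every pair of sublists once, earlier one first.
    for former, latter in combinations(sublists, 2):
        for w1, w2 in product(former, latter):
            p1, p2 = prepare(w1), prepare(w2)  # cached after the first call
            if p1 is None or p2 is None:
                continue
            results[(w1, w2)] = score(p1, p2)
    return results


scores = pairwise_scores([['space', 'invaders'], ['draw']])
```

With the real WordNet calls you would swap the body of `prepare` for your own lookup and pass your similarity function as `score`.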