# How can I implement the backtrack (some books call it backtrace) function in Python for an NLP project?

Questions:

Here's the code I got from a class on GitHub. I added some functions to it and got stuck on it a few days ago.

In this code I have to run maximal matching and then backtrack through the result.

```python
thai_vocab = ["ไ","ป","ห","า","ม","เ","ห","ส","ี","ไป","หา","หาม","เห","สี","มเหสี","!"]

from math import inf #infinity
def maximal_matching(c):
    #Initialize an empty 2D list
    d = [[None]*len(c) for _ in range(len(c))]

    for i in range(len(d)):
        for j in range(len(d)):

            if (i == 0) and (c[i:j+1] in thai_vocab):
                d[i][j] = 1

            elif (j > 0) and (c[i:j+1] in thai_vocab):
                # Column i-1: every segmentation that ends right before position i
                res = [k for k in zip(*d)][i-1]
                temp = []

                for val in res:
                    if val != None:
                        temp.append(val)

                d[i][j] = 1 + min(temp)

            elif c[i:j+1] != "":
                d[i][j] = inf

    return d

def backtrack(d):
    eow = len(d)-1 # End of Word position
    word_pos = [] # Word position

    row_pos = len(d)-1

    while eow >= 0:
        res = [k for k in zip(*d)][eow]
        temp = []
        for val in res:
            if val != None:
                temp.append(val)

        min_col = min(temp)

        while row_pos >= 0:

            if (d[row_pos][eow] == min_col) and (d[row_pos][eow-1] is None):
                word_pos.append((row_pos,eow))

            elif (d[row_pos][eow] == min_col) and (d[row_pos][eow-1] == inf):
                eow -= 1
            elif (d[row_pos][eow] == inf) and (d[row_pos-1][eow] == inf):
                eow -= 1
            elif ((d[row_pos][eow] == min_col) and (d[row_pos][eow-1] == min_col) or (d[row_pos][eow-1] == inf)):
                eow -= 1
            elif (d[row_pos][eow] == inf) and (d[row_pos][eow-1] is None) and (isinstance(d[row_pos-1][eow], int) == False):
                word_pos.append((row_pos,eow))
            else:
                row_pos -= 1

        eow -= 1

    word_pos.reverse()
    return word_pos
```

Now I run the code below to get the result from `maximal_matching()`:

```python
input_text = "ไปหามเหสี!"
out = maximal_matching(input_text)
for i in range(len(out)):
    print(out[i],input_text[i])
```

The result is:

```
[1, 1, inf, inf, inf, inf, inf, inf, inf, inf] ไ
[None, 2, inf, inf, inf, inf, inf, inf, inf, inf] ป
[None, None, 2, 2, 2, inf, inf, inf, inf, inf] ห
[None, None, None, 3, inf, inf, inf, inf, inf, inf] า
[None, None, None, None, 3, inf, inf, inf, 3, inf] ม
[None, None, None, None, None, 3, 3, inf, inf, inf] เ
[None, None, None, None, None, None, 4, inf, inf, inf] ห
[None, None, None, None, None, None, None, 4, 4, inf] ส
[None, None, None, None, None, None, None, None, 5, inf] ี
[None, None, None, None, None, None, None, None, None, 4] !
```

The final step is to recover the words that the algorithm tokenized (in this case it should split the text into 4 words using my dictionary).

```python
def print_tokenized_text(d, input_text):
    tokenized_text=[]
    for pos in backtrack(d):
        #print(pos)
        tokenized_text.append(input_text[pos:pos+1])

    print("|".join(tokenized_text))

print_tokenized_text(out,input_text)
```

The result should be:

```
ไป|หา|มเหสี|!
```

But in this case I get an error. My code doesn't solve the problem and I don't know how to fix it. Could you suggest which algorithm would be best for searching the table and printing the result?
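For reference, here is a minimal sketch of one possible backtracking step. It assumes that `d[i][j]` holds the fewest words needed to cover `c[0:j+1]` when the last word is `c[i:j+1]`, which is consistent with the table printed above. Starting from the last column, it takes the row with the smallest value as the start of the last word, then jumps to the column just before that start. `maximal_matching` is restated compactly only so the block runs on its own. `tokenize` is a hypothetical helper name, and raising `ValueError` for text that cannot be segmented is an added assumption, not part of the original code.

```python
from math import inf

# Vocabulary copied from the question.
thai_vocab = ["ไ", "ป", "ห", "า", "ม", "เ", "ห", "ส", "ี",
              "ไป", "หา", "หาม", "เห", "สี", "มเหสี", "!"]


def maximal_matching(c):
    """d[i][j] = fewest words covering c[0:j+1] if the last word is c[i:j+1]."""
    n = len(c)
    d = [[None] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            if c[i:j + 1] not in thai_vocab:
                d[i][j] = inf
            elif i == 0:
                d[i][j] = 1
            else:
                # Best segmentation that ends right before position i.
                d[i][j] = 1 + min(d[k][i - 1] for k in range(i))
    return d


def backtrack(d):
    """Return (start, end) index pairs, end inclusive, in reading order."""
    word_pos = []
    eow = len(d) - 1  # end-of-word column, moving right to left
    while eow >= 0:
        # The row with the smallest count in this column starts the last word.
        start = min(range(eow + 1), key=lambda i: d[i][eow])
        if d[start][eow] == inf:
            # Assumption: treat text with no valid segmentation as an error.
            raise ValueError("no segmentation ends at index %d" % eow)
        word_pos.append((start, eow))
        eow = start - 1  # continue with the text before this word
    word_pos.reverse()
    return word_pos


def tokenize(text):
    """Hypothetical helper: slice the text with the positions from backtrack."""
    d = maximal_matching(text)
    return [text[start:end + 1] for start, end in backtrack(d)]


print("|".join(tokenize("ไปหามเหสี!")))  # prints ไป|หา|มเหสี|!
```

Because `backtrack` returns `(start, end)` tuples, each word has to be sliced as `text[start:end + 1]`. Slicing with the tuple itself, as in `input_text[pos:pos+1]`, raises a `TypeError`.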