I have this main text: "How can I run java script from a local folder?"
Calling diff.diff_main(diff(), "How can I run java script from a local folder?","How can I run Javascript from a local folder?")
returns [(0, 'How can I run '), (-1, 'j'), (1, 'J'), (0, 'ava'), (-1, ' '), (0, 'script from a local folder?')]
This is not a big problem for a short string like this one, but it is for larger strings of around 40,000 characters, which are common in my application. I chose the short string for clarity and readability. What I'm looking for is a way to store text positions (start position to end position) instead of the actual text for the unchanged segments. The positions will later be matched against the original text.
For example, instead of [(0, 'How can I run '), (-1, 'j'), (1, 'J'), (0, 'ava'), (-1, ' '), (0, 'script from a local folder?')]
I would have [(0, '0,14'), (-1, 'j'), (1, 'J'), (0, '15,18'), (-1, ' '), (0, '19,44')]
The positions encoded in the tuples are decoded against the original text: for example, 0,14 means from position 0 to 14, which is 'How can I run ',
and 15,18 means from position 15 to 18 in the original text, which is 'ava',
and so on.
The text can then be retrieved later with a slice such as originaltext[0:14].
I have tried the following, which gets very close:
a=[(0, 'How can I run '), (-1, 'j'), (1, 'J'), (0, 'ava'), (-1, ' '), (0, 'script from a local folder?')]
b='How can I run java script from a local folder?'
result={}
positioncount = 0
for x, y in enumerate(a):
    if y[0] == 0:
        if positioncount == 0:
            result[x]={y[0]:len(y[1])}
            positioncount+=len(y[1])
        else:
            result[x]={y[0]:(len(y[1])+positioncount,len(y[1]))}
    else:
        result[x]={y[0]:y[1]}
        positioncount-=len(y[1])
But print result
gives me {0: {0: 14}, 1: {-1: 'j'}, 2: {1: 'J'}, 3: {0: (15, 3)}, 4: {-1: ' '}, 5: {0: (38, 27)}}
which is not correct, because it should give {0: {0: 14}, 1: {-1: 'j'}, 2: {1: 'J'}, 3: {0: (15, 18)}, 4: {-1: ' '}, 5: {0: (19, 44)}}
What am I doing wrong here? Is there a way to do this right? If you have an alternative approach, I'm glad to take it. Thanks!
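
For reference, here is a minimal sketch of one way the position encoding described above might be computed. It is not taken from the diff library: the function names encode_positions and decode_positions are hypothetical, and it assumes that equal (0) and deleted (-1) segments both advance the offset in the original text while inserted (1) segments do not. It also assumes half-open (start, end) pairs that can be used directly as slices, as in originaltext[0:14]. Under that assumption the last span comes out as (19, 46) rather than (19, 44), because the original string is 46 characters long.

```python
def encode_positions(diffs):
    """Replace the text of equal (0) segments with (start, end) offsets
    into the original text; keep deleted/inserted text as-is."""
    encoded = []
    pos = 0  # current offset in the ORIGINAL text (assumption: half-open spans)
    for op, text in diffs:
        if op == 0:
            # Unchanged text: store only where it lives in the original.
            encoded.append((op, (pos, pos + len(text))))
            pos += len(text)
        elif op == -1:
            # Deleted text is part of the original, so it advances the offset.
            encoded.append((op, text))
            pos += len(text)
        else:
            # Inserted text is not in the original; the offset does not move.
            encoded.append((op, text))
    return encoded


def decode_positions(encoded, original):
    """Rebuild the full diff list by slicing the original text."""
    decoded = []
    for op, value in encoded:
        if op == 0:
            start, end = value
            decoded.append((op, original[start:end]))
        else:
            decoded.append((op, value))
    return decoded


diffs = [(0, 'How can I run '), (-1, 'j'), (1, 'J'), (0, 'ava'),
         (-1, ' '), (0, 'script from a local folder?')]
original = 'How can I run java script from a local folder?'

encoded = encode_positions(diffs)
restored = decode_positions(encoded, original)
```

With the sample data, encoded holds (0, 14), (15, 18) and (19, 46) for the three unchanged segments, and restored equals the original diff list. The main difference from the attempt above is that the offset is advanced after every equal or deleted segment and is never decreased.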