
In my application, I have a list of Urdu words in a text file (currently a single word), and another text file containing a string of Urdu (currently a single word, exactly the same as the one in the word list).

Now I need to find out whether the string in the string file contains any word that exists in the words file. To do this, I'm reading both files into lists like this:

import codecs

# `mode` and `encoding` are set elsewhere in the application (values not shown)

# read the text file of strings
fileToRead = codecs.open('string.txt', mode, encoding=encoding)
fileData = fileToRead.read()
lstFileData = fileData.split('\n')

# read the text file of words
wordListToRead = codecs.open('words.txt', mode, encoding=encoding)
wordData = wordListToRead.read()
lstWords = wordData.split('\n')

I'm simply traversing the list like this:

for string in lstFileData:
    if string in lstWords:
        pass  # do further work

and it's not working, and I don't know why, even though string is 'فلسفے' and lstWords has this string in it. Do I need to specify some encoding? Any kind of help will be appreciated.

Deduplicator
Naila Akbar

2 Answers


I just tried this in Python 3 and it seems to work for me:

lstWords = ['a', 'فلسفے', 'b']
string = 'فلسفے'
if string in lstWords:
    print("yes")

Edit: I also tested your updated code with file I/O and it works fine (I did not specify an encoding). Here is a link to it working: https://trinket.io/python3/3890d8b261

golddove
  • Yes, it is exactly the same thing and it is supposed to work, but it does not. – Naila Akbar Oct 06 '18 at 14:49
  • I think there is something else going on. Look at the link in my updated answer to see the code working just fine in Python 3. – golddove Oct 07 '18 at 03:35
  • Yes, the issue was in the file. I opened it in Notepad and saved it, and that changed it from UTF-8 to UTF-8 with BOM. I guessed that was causing the problem. Once I created a new file in Notepad++ and saved it as UTF-8, the same code started working fine. – Naila Akbar Oct 07 '18 at 12:49

Maybe this will help someone like me.

It may sound funny, but the issue was the file's encoding. I opened the file in plain Notepad to make some changes and saved it, which changed the file from UTF-8 to UTF-8 with BOM, and my code did not work on it. Once I created a new file in Notepad++ and saved it as UTF-8, the same code started working fine (the issue was not in the code; it was in the file encoding).
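To illustrate why the comparison failed, here is a minimal sketch, assuming Python 3. The temporary file names are hypothetical stand-ins for words.txt and string.txt. It writes the same word once with a BOM and once without, decodes both with plain 'utf-8', and shows that the first entry of the word list carries an invisible '\ufeff' prefix, so the membership test fails:

```python
import codecs
import os
import tempfile

word = 'فلسفے'

# Hypothetical stand-ins for words.txt (saved with BOM) and string.txt (no BOM).
tmp_dir = tempfile.mkdtemp()
bom_path = os.path.join(tmp_dir, 'words_bom.txt')
plain_path = os.path.join(tmp_dir, 'string_plain.txt')

# 'utf-8-sig' writes a BOM at the start of the file; 'utf-8' does not.
with codecs.open(bom_path, 'w', encoding='utf-8-sig') as f:
    f.write(word)
with codecs.open(plain_path, 'w', encoding='utf-8') as f:
    f.write(word)

# Decode both files with plain 'utf-8'.
with codecs.open(bom_path, 'r', encoding='utf-8') as f:
    lst_words = f.read().split('\n')
with codecs.open(plain_path, 'r', encoding='utf-8') as f:
    lst_file_data = f.read().split('\n')

first_word = lst_words[0]
has_bom = first_word.startswith('\ufeff')  # the BOM survives as a character
matches = lst_file_data[0] in lst_words    # comparison against the BOM-prefixed entry

print(ascii(first_word))  # escaped form makes the invisible BOM visible
print(has_bom, matches)
```

Printing with ascii() (or repr()) is a quick way to spot this kind of problem, since the two strings look identical on screen.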

Naila Akbar
  • It might well help future readers, but it's quite unlikely to be found with the current, very specific title; consider changing it to something more general like "strings looking the same compare unequal" or similar. Btw: the correct encoding for opening files with a UTF-8 BOM is called "utf-8-sig" in Python. Otherwise (if you decode with "utf-8") the BOM character will stick to the beginning of the content. – lenz Oct 07 '18 at 13:12
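
As a rough sketch of the "utf-8-sig" suggestion, assuming Python 3: the helper read_lines and the temporary file paths below are hypothetical and not part of the original code. Decoding with 'utf-8-sig' skips a leading BOM when one is present and otherwise behaves like plain UTF-8, so the same reader handles files saved either way:

```python
import codecs
import os
import tempfile


def read_lines(path):
    """Hypothetical helper: read a UTF-8 file (with or without BOM) into a list of lines."""
    with open(path, 'r', encoding='utf-8-sig') as f:
        return [line.rstrip('\r\n') for line in f]


word = 'فلسفے'
tmp_dir = tempfile.mkdtemp()
words_path = os.path.join(tmp_dir, 'words.txt')
string_path = os.path.join(tmp_dir, 'string.txt')

# Simulate a word list saved with a BOM and a string file saved without one.
with open(words_path, 'wb') as f:
    f.write(codecs.BOM_UTF8 + word.encode('utf-8') + b'\n')
with open(string_path, 'wb') as f:
    f.write(word.encode('utf-8') + b'\n')

lst_words = read_lines(words_path)
lst_file_data = read_lines(string_path)

# Same membership test as in the question, now on BOM-free strings.
found = [s for s in lst_file_data if s in lst_words]
print(found == [word])
```

Iterating over the file and stripping the line ending also avoids the empty trailing entry that split('\n') produces when the file ends with a newline.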