Here is a list of Unicode words used in my Python script:
texts = [u"abc", u"pqr", u"mnp"]
The script works as expected with the three-word example above. The problem is that the real input is a text file containing thousands of words. How do I read the words from that file?
Update: I have two issues. First, the order of the words in the text file is not preserved in the output. Second, the text file contains Unicode characters, which is why my original example uses the "u" prefix.
# cat testfile.txt
Testing this file with Python
# cat test.py
#!/usr/bin/python
# -*- coding: utf-8 -*-
f = open('testfile.txt', 'r')
texts = set(f.read().split())
print (texts)
# python test.py
set(['this', 'Python', 'Testing', 'with', 'file'])
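The lost ordering comes from set(), which does not keep insertion order. A possible direction, offered only as a rough sketch: keep the words in a list and let io.open decode the file. This assumes the file is UTF-8 encoded (the post does not say which encoding it uses). The helper names read_words and unique_in_order are made up for illustration, and the second one only matters if duplicates have to be dropped.

```python
# -*- coding: utf-8 -*-
import io


def read_words(path, encoding="utf-8"):
    """Return the whitespace-separated words of a text file, in file order.

    The UTF-8 default is an assumption; pass the file's real encoding.
    """
    # io.open decodes bytes to Unicode text on both Python 2 and Python 3.
    with io.open(path, "r", encoding=encoding) as f:
        # split() returns a list, which keeps the original word order.
        return f.read().split()


def unique_in_order(words):
    """Drop repeated words, keeping the first occurrence of each."""
    seen = set()
    result = []
    for word in words:
        if word not in seen:
            seen.add(word)
            result.append(word)
    return result
```

With the testfile.txt shown above, texts = read_words('testfile.txt') should give the five words in the order they appear in the file (Testing, this, file, with, Python). On Python 2 the items are unicode objects, so they match the u"..." literals from the original example.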