I have crawled PDF, HTML, and DOC files using Apache Tika and stored the extracted text in text files. These text files contain some unusual special characters, and because of them I am unable to read the files. I use the code snippet below to read them:
import codecs
fo = codecs.open('/var/www/testfiles/sample.txt', 'r', 'utf-8').read()
But I am getting the following error:
UnicodeDecodeError: 'utf8' codec can't decode byte 0xb7 in position 1291: invalid start byte
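For context, byte 0xb7 can never start a UTF-8 sequence, so the file is probably not (entirely) UTF-8. In Latin-1 and Windows-1252, 0xb7 is the middle dot "·", which often appears as a bullet in extracted text. The sketch below is only one possible approach, assuming the file is in a single-byte encoding such as Windows-1252; the sample bytes, the helper name read_text, and the list of encodings are made up for illustration, and io.open is used because it accepts the same encoding argument as codecs.open.

```python
import io
import os
import tempfile

# Hypothetical sample data: 0xb7 is the middle dot in Latin-1/Windows-1252,
# but it is not a valid start byte in UTF-8.
raw = b"Apache Tika \xb7 sample"

def read_text(path, encodings=("utf-8", "cp1252")):
    """Try each candidate encoding in order and return the decoded text.

    The candidate list is an assumption; adjust it to the encodings
    the crawled documents are likely to use.
    """
    for enc in encodings:
        try:
            with io.open(path, "r", encoding=enc) as f:
                return f.read()
        except UnicodeDecodeError:
            continue
    # Last resort: decode as UTF-8 and replace undecodable bytes with U+FFFD.
    with io.open(path, "r", encoding="utf-8", errors="replace") as f:
        return f.read()

# Write the sample bytes to a temporary file and read them back.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(raw)
text = read_text(tmp.name)
os.remove(tmp.name)
```

Trying UTF-8 first keeps correctly encoded files intact, while the fallback only applies to files that fail strict decoding. Guessing the wrong single-byte encoding will not raise an error but can yield wrong characters, so the result is worth checking by eye.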
Please suggest how I can read my text files. Thanks.