UTF-8 Encode error but it's already encoded

Asked Jun 02 '17 at 19:54

I've taken a PDF and converted it to text, which I'm trying to break up into different sections at each "FIGURE". When I run my code on a subset of the text file it works, but when I run it on the whole text file it fails. Any ideas? This is the error I get, followed by my code:

UnicodeDecodeError: 'ascii' codec can't decode byte 0x92 in position 851: ordinal not in range(128)
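The byte named in the traceback can be checked directly. 0x92 is greater than 127, which is why the ASCII codec reports "ordinal not in range(128)". It is also not a valid first byte of a UTF-8 sequence. In Windows-1252 (cp1252), however, 0x92 is the right single quotation mark (’), a character that PDF-to-text conversion often produces. This would also explain why a subset of the file works: that subset presumably contains no bytes above 127. A minimal sketch; the sample bytes are hypothetical, not taken from the actual file:

```python
# Decode the byte reported in the traceback (0x92) with several codecs.
# Hypothetical sample: the text "it’s" stored as Windows-1252 bytes.
raw = b"it\x92s"

for codec in ("ascii", "utf-8", "cp1252"):
    try:
        print(codec, "->", repr(raw.decode(codec)))
    except UnicodeDecodeError as exc:
        # exc.reason holds the short explanation, e.g. the ordinal message
        print(codec, "-> UnicodeDecodeError:", exc.reason)
```

On Python 3, the `ascii` and `utf-8` attempts raise (the UTF-8 reason is "invalid start byte"), while `cp1252` returns the curly apostrophe.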

import re
import pandas as pd
from pandas import ExcelWriter

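# Read the whole file, strip newlines, then decode the byte string as UTF-8
with open(r'\Desktop\Python\Python 2.7\InFile\dataIn.txt', 'r') as myFile:
    data = myFile.read().replace('\n', '').decode('utf-8')
    # Split the text into sections at each occurrence of "FIGURE"
    file = re.split('FIGURE', data)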


df = pd.DataFrame(file, columns=None)

writer = ExcelWriter('PythonExport.xlsx')
df.to_excel(writer,'Sheet1')
writer.save()

user3473269

it probably got corrupted bro – Dr Upvote Jun 02 '17 at 19:56

Have you tried specifying a different `encoding` in the `open` function? – Jarad Jun 02 '17 at 23:01
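Following the suggestion in the comment above, one possible approach is to decode the file while reading it, by passing an explicit `encoding`, instead of calling `.decode('utf-8')` on the raw bytes afterwards. This is a minimal sketch, assuming the file is Windows-1252 encoded: the 0x92 byte points that way, but the actual encoding is not confirmed. `io.open` accepts an `encoding` argument on both Python 2.7 and Python 3. The function name `split_figures` is hypothetical.

```python
import io
import re


def split_figures(path, encoding="cp1252"):
    """Read a text file with an explicit encoding and split it on "FIGURE".

    The cp1252 (Windows-1252) default is an assumption based on the 0x92
    byte in the traceback; pass another codec if the file uses a different one.
    """
    # io.open decodes while reading, so `data` is already a unicode string
    # and no separate .decode() call is needed.
    with io.open(path, "r", encoding=encoding) as my_file:
        data = my_file.read().replace(u"\n", u"")
    # Like the original code, re.split drops the "FIGURE" delimiter itself.
    return re.split(u"FIGURE", data)
```

The returned list can be passed to `pd.DataFrame` exactly as in the question. If the encoding is unknown, `io.open(..., errors='replace')` substitutes U+FFFD for undecodable bytes instead of raising, at the cost of losing those characters.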