
I've taken a PDF and converted it to text, which I'm trying to break up into sections by splitting on "FIGURE". When I run my code on a subset of the text file it works, but when I run it on the whole file it fails. Any ideas? Here are the error I get and my code.

UnicodeDecodeError: 'ascii' codec can't decode byte 0x92 in position 851: ordinal not in range(128)
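
For context on the byte in that message: 0x92 is not valid ASCII, and in the Windows-1252 (cp1252) code page it is the right single quotation mark (U+2019), which often shows up in text extracted from PDFs. Below is a minimal sketch that reproduces the same kind of failure on a made-up byte string. The sample bytes are hypothetical, and whether my actual file is cp1252 is only an assumption.

```python
# Hypothetical sample: a curly apostrophe stored as the single byte 0x92.
raw = b"the author\x92s FIGURE 1"

# Decoding as ASCII fails because 0x92 is outside the 0-127 range.
try:
    raw.decode("ascii")
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = str(exc)

# Decoding as cp1252 (an assumed encoding) maps 0x92 to U+2019.
text = raw.decode("cp1252")
```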

import re
import pandas as pd
from pandas import ExcelWriter

with open(r'\Desktop\Python\Python 2.7\InFile\dataIn.txt', 'r') as myFile:
    data = myFile.read().replace('\n', '').decode('utf-8')
    file = re.split('FIGURE',data)


df = pd.DataFrame(file, columns=None)

writer = ExcelWriter('PythonExport.xlsx')
df.to_excel(writer,'Sheet1')
writer.save()
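
One thing that might be worth trying is to decode the file while reading it instead of calling .decode() on the byte string afterwards. This is only a sketch, not a confirmed fix: the function name split_on_figure and the default encoding "cp1252" are my own assumptions, and the encoding would have to match whatever the PDF-to-text conversion actually produced. io.open accepts an encoding argument on both Python 2.7 and Python 3.

```python
import io
import re


def split_on_figure(path, encoding="cp1252"):
    """Read `path` as text and split it on the literal word 'FIGURE'.

    `encoding` is an assumption; change it to match the real file.
    """
    # io.open decodes while reading, so `data` is already unicode text.
    with io.open(path, "r", encoding=encoding) as my_file:
        data = my_file.read().replace("\n", "")
    return re.split("FIGURE", data)
```

The returned list could then be passed to pd.DataFrame in place of the `file` variable above. On Python 2.7 its elements are unicode objects rather than byte strings.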
