
I am trying to read an Excel file from SharePoint into a pandas DataFrame. However, I am getting this error: XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'\r\n<!DOCT'

The question is similar to "using Pandas to read in excel file from URL - XLRDError"; however, my error differs in the bytes it reports: b'\r\n<!DOCT'

I am not sure what this error even means. Any help would be appreciated.

Code

from office365.runtime.auth.authentication_context import AuthenticationContext
from office365.sharepoint.client_context import ClientContext
from office365.sharepoint.files.file import File
import pandas as pd
import io


url = 'url'
username = 'user'
password = 'pass'

ctx_auth = AuthenticationContext(url)
if ctx_auth.acquire_token_for_user(username, password):
    ctx = ClientContext(url, ctx_auth)
    web = ctx.web
    ctx.load(web)
    ctx.execute_query()
    print("Authentication successful")

response = File.open_binary(ctx, url)

# Save the downloaded data to a BytesIO stream
bytes_file_obj = io.BytesIO()
bytes_file_obj.write(response.content)
bytes_file_obj.seek(0)  # rewind the stream to the start

# Read the Excel file into a pandas DataFrame (first sheet by default)
df = pd.read_excel(bytes_file_obj)
print(df)

What I've tried

df = pd.read_csv(bytes_file_obj, error_bad_lines=False)

However, this just puts all of my data into one column, and the content looks like HTML/JSON.

I added error_bad_lines=False because read_csv was otherwise raising ParserError: Error tokenizing data. C error: Expected 1 fields in line 6, saw 5
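
For context, the bytes b'\r\n<!DOCT' in the error message look like the start of an HTML document (<!DOCTYPE ...), which suggests the download may have returned a web page (for example a login or error page) rather than the workbook itself. A minimal sketch for checking what was actually received, assuming the raw bytes are available as in response.content above; sniff_payload is a hypothetical helper written for illustration, not part of pandas or office365:

```python
def sniff_payload(data: bytes) -> str:
    """Guess the payload type from its first bytes (a rough heuristic only)."""
    # Legacy .xls files are OLE2 compound documents with this 8-byte signature.
    if data.startswith(b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"):
        return "xls"
    # .xlsx files are ZIP containers; any ZIP archive would also match here.
    if data.startswith(b"PK\x03\x04"):
        return "xlsx"
    # Tolerate leading whitespace such as b'\r\n' before text payloads.
    head = data[:64].lstrip().lower()
    if head.startswith((b"<!doctype", b"<html")):
        return "html"
    if head[:1] in (b"{", b"["):
        return "json"
    return "unknown"


# Example with the bytes reported in the error message:
print(sniff_payload(b"\r\n<!DOCTYPE html>"))  # html
```

Calling sniff_payload(response.content) before pd.read_excel would show whether the bytes are a spreadsheet at all. If it reports "html", the problem is more likely in the download step (the URL passed to File.open_binary or the authentication) than in pandas.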

Jonnyboi