I am trying to read a SharePoint Excel file into a pandas DataFrame.
However, I am getting the error XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'\r\n<!DOCT'
The question is similar to "using Pandas to read in excel file from URL - XLRDError"; however, my error is different, specifically this part: b'\r\n<!DOCT'
I am not sure what this error even means. Any help would be appreciated.
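For context, the bytes quoted in the message are the start of the downloaded payload, and b'\r\n<!DOCT' looks like the beginning of an HTML document (<!DOCTYPE html ...>) rather than workbook data. xlrd expected an Excel binary record and found web-page markup instead. A minimal sketch for checking this before calling pd.read_excel, using the well-known file signatures (.xlsx files are ZIP archives starting with PK\x03\x04; legacy .xls files are OLE2 compound files). The helper name sniff_payload is hypothetical, not part of pandas or the SharePoint client:

```python
# Well-known magic numbers for the formats involved.
XLSX_MAGIC = b"PK\x03\x04"  # .xlsx is a ZIP archive
XLS_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # legacy .xls is an OLE2 compound file


def sniff_payload(data: bytes) -> str:
    """Guess what a downloaded payload actually is from its first bytes."""
    if data.startswith(XLSX_MAGIC):
        return "xlsx"
    if data.startswith(XLS_MAGIC):
        return "xls"
    # Skip leading whitespace such as the b'\r\n' seen in the error message.
    head = data.lstrip()[:64].lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "html"
    return "unknown"


print(sniff_payload(b"\r\n<!DOCTYPE html><html></html>"))  # html
```

Calling sniff_payload(response.content) in the code below would show whether the request returned a workbook at all; if it reports "html", no reader option will fix it and the download itself needs attention.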
Code
from office365.runtime.auth.authentication_context import AuthenticationContext
from office365.sharepoint.client_context import ClientContext
from office365.sharepoint.files.file import File
import pandas as pd
import io

url = 'url'
username = 'user'
password = 'pass'

ctx_auth = AuthenticationContext(url)
if ctx_auth.acquire_token_for_user(username, password):
    ctx = ClientContext(url, ctx_auth)
    web = ctx.web
    ctx.load(web)
    ctx.execute_query()
    print("Authentication successful")

response = File.open_binary(ctx, url)

# save data to BytesIO stream
bytes_file_obj = io.BytesIO()
bytes_file_obj.write(response.content)
bytes_file_obj.seek(0)  # set file object to start

# read the (first sheet of the) excel file into a pandas dataframe
df = pd.read_excel(bytes_file_obj)
print(df)
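One likely cause, offered as an assumption since the real URL is elided as 'url': the same url is passed both to AuthenticationContext/ClientContext and to File.open_binary. In the Office365-REST-Python-Client library, the second argument of File.open_binary is expected to be the file's server-relative URL (for example /sites/<site>/Shared Documents/<file>.xlsx), not the site URL or a browser link, and a URL that does not resolve to the file can come back as an HTML page. A minimal stdlib sketch of turning an absolute file link into a server-relative path; the helper to_server_relative_url and the contoso.sharepoint.com URL are hypothetical:

```python
from urllib.parse import unquote, urlsplit


def to_server_relative_url(file_url: str) -> str:
    """Turn an absolute SharePoint file URL into a server-relative path.

    Hypothetical helper: drops the scheme, host, query string (e.g. '?web=1')
    and fragment, then percent-decodes the path ('%20' -> ' ').
    """
    parts = urlsplit(file_url)
    return unquote(parts.path)


print(to_server_relative_url(
    "https://contoso.sharepoint.com/sites/team/Shared%20Documents/report.xlsx?web=1"
))
# /sites/team/Shared Documents/report.xlsx
```

The result would then go into File.open_binary(ctx, ...) while ctx stays built from the site URL. Whether your library version wants the path percent-decoded (as here) or left encoded is worth checking against its own examples.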
What I've tried
df = pd.read_csv(bytes_file_obj, error_bad_lines=False)
However, it just puts all my data into a single column, and the content looks like HTML/JSON.
I added error_bad_lines=False because without it read_csv was giving me ParserError: Error tokenizing data. C error: Expected 1 fields in line 6, saw 5
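The single column and the "Expected 1 fields ... saw 5" error fit the same picture: read_csv is tokenizing HTML markup, not spreadsheet rows, so error_bad_lines only hides the symptom (that flag was also deprecated in pandas 1.3 and removed in 2.0 in favour of on_bad_lines). To see which page SharePoint actually returned, for example a sign-in or error page, a minimal standard-library sketch that extracts the HTML <title>; html_title is a hypothetical helper, and a title is not guaranteed to be informative:

```python
from html.parser import HTMLParser


class _TitleParser(HTMLParser):
    """Collect the text inside the first <title> element."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self._done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names lowercased, so <TITLE> also matches.
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def html_title(data: bytes, encoding: str = "utf-8") -> str:
    """Return the stripped <title> text of an HTML payload, or '' if none."""
    parser = _TitleParser()
    parser.feed(data.decode(encoding, errors="replace"))
    parser.close()
    return parser.title.strip()


print(html_title(b"\r\n<!DOCTYPE html><html><head><title> Sign in </title></head></html>"))
# Sign in
```

Running html_title(response.content) on the downloaded bytes would give a quick hint about why the request returned a page instead of the file.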