3

This is the code:

xls = open_workbook('data.xls')

In return:

File "/home/woles/P2/fin/fin/apps/data_container/importer.py", line 16, in import_data
  xls = open_workbook('data.xlsx')
File "/home/woles/P2/fin/local/lib/python2.7/site-packages/xlrd/__init__.py", line 435,     in open_workbook
ragged_rows=ragged_rows,
File "/home/woles/P2/fin/local/lib/python2.7/site-packages/xlrd/book.py", line 91, in open_workbook_xls
biff_version = bk.getbof(XL_WORKBOOK_GLOBALS)
File "/home/woles/P2/fin/local/lib/python2.7/site-packages/xlrd/book.py", line 1230, in getbof
bof_error('Expected BOF record; found %r' % self.mem[savpos:savpos+8])
File "/home/woles/P2/fin/local/lib/python2.7/site-packages/xlrd/book.py", line 1224, in bof_error
raise XLRDError('Unsupported format, or corrupt file: ' + msg)
XLRDError: Unsupported format, or corrupt file: Expected BOF record; found '\r\n<html>'

The file is not damaged, I can open it with Excel, LibreOffice.

Gonçalo Peres
  • 11,752
  • 3
  • 54
  • 83
Michał
  • 286
  • 1
  • 3
  • 9
  • 1
    See [python xlrd unsupported format, or corrupt file.](http://stackoverflow.com/a/15437095/222914) – Janne Karila Jun 02 '14 at 12:30
  • The file isn't necessarily *damaged* but it's not *supported*. Excel can open lots of things that are not supported by `xlrd`, such as CSV, SYLK, DIF, and many others. The fact that the error message tells you what it *did* find is a clue to what kind of file it **really** is. It's definitely not a real XLSX file, just named like one. – John Y May 06 '16 at 18:24

3 Answers3

1

Try to open it with pandas:

import pandas as pd
data=pd.read_html(filename.xls)

Or try any other html python parser.

That's not a proper excel file, but an html readable with excel.

foebu
  • 1,365
  • 2
  • 18
  • 35
0

I had same error resolved just now, first step id check changing the file to text and notice html content , than did some modification to ensure that once I save it as HTML and open in browser than table should be visible. Than Beautiful Soup and pandas helped me to get excel output....

check below lines if may help..

import pandas as pd
import os
import shutil
import html5lib
import requests
from bs4 import BeautifulSoup
import re
import time

shutil.copy('donloaded.xls','changed.html')
shutil.copy('changed.html','txt_output.txt')
time.sleep(2)

txt = open('txt_output.txt','r').read()

# Modify the text to ensure the data display in html page

txt = str(txt).replace('<style> .text { mso-number-format:\@; } </script>','')

# Add head and body if it is not there in HTML text

txt_with_head = '<html><head></head><body>'+txt+'</body></html>'

# Save the file as HTML

html_file = open('output.html','w')
html_file.write(txt_with_head)

# Use beautiful soup to read

url = r"C:\Users\hitesh kumar\PycharmProjects\OEM ML\output.html"
page = open(url)
soup = BeautifulSoup(page.read(), features="lxml")
my_table = soup.find("table",attrs={'border': '1'})

frame = pd.read_html(str(my_table))[0]
print(frame.head())
frame.to_excel('testoutput.xlsx',sheet_name='sheet1', index=False)
Hietsh Kumar
  • 1,197
  • 9
  • 17
0

For .xls files you can use read_excel():

import pandas as pd
df1= pd.read_excel("filename.xls")

The parameters header and sep may help you get rid of some errors (here you can find more info on the parameters). An example of its usage

df2= pd.read_excel("filename.xls",  header = None, sep='delimiter')

Note that if the files are a .csv you will get the error

XLRDError: Unsupported format, or corrupt file: Expected BOF record;

To read .csv one needs to use read_csv(), like this

df3= pd.read_csv("filename.csv")
Gonçalo Peres
  • 11,752
  • 3
  • 54
  • 83