Read From Microsoft Excel (.xlsx) file and Insert into a DB using Python?

Question

I'm working on an application to read an .xlsx, (Excel File) with headers and insert the rows, as-is, into a DB Table in a Postgres DB.

I presume i would be using the XLRD library.

Any ideas on How to do this?

I'm using Python 2.6.6.

Instead of asking how to do it, you should rather try it and ask for specific help on a topic that doesn't work for you. — lorenz, Dec 31 '13 at 07:52

Craig Ringer · Answer 1 · 2013-12-31T10:07:28.267

By far the easiest way will be to transform it to CSV, then use psql's \copy, or psycopg2's COPY support. Any command-line .xlsx to CSV converter would then work for you when invoked via the built-in subprocess module, or popen/system on older Pythons. You wouldn't need any special Python libs. This would also let you use a converter, like OpenOffice / LibreOffice's command-line mode, that can handle all sorts of other tabular data too.

If you want to do it all in Python you could use the xlrd lib you mentioned and feed the tuples one by one into psycopg2's copy_from, so you stream the rows to PostgreSQL using only Python code. You should find existing examples of psycopg2 based programs that use copy_from easy to adapt to using xlrd as a data source instead of Python lists / csv input / whatever.

score 0 · Answer 2 · answered Dec 31 '13 at 08:54

0

If you have Excel, format the XLS file so that there are no headers and no blank columns (basically, just make a table). This can also work with LibreOffice. Then, save the file as a CSV. Then, view your CSV file with a text editor and verify that things are in order.

Once you are happy with the content of the CSV, it's quite simple to write a python code that reads each row and then creates a list of each field. Here is a 30 second version of code:

with open('example.csv','r') as csv_file:
    full = csv_file.read().split('\n')
    for line in full:
    values = line.split(',')
        print values

In the 'for' loop, you can access each element of the list individually, i.e. 'values[0]'

answered Dec 31 '13 at 08:54

philshem

24,761
8
61
127

Please *never* advise people to manually parse CSV! Python [contains the `csv` module as a built-in module](http://docs.python.org/2/library/csv.html) for a reason. It's easier *and* safer than hand-hacking CSV-like text. Consider how your 30-second example handles `"24 SomeStreet, Sometown", "Somecountry"`. It doesn't - kaboom. [You can wrangle HTML using regular expressions, but it's a terrible idea](http://www.codinghorror.com/blog/2009/11/parsing-html-the-cthulhu-way.html). This is the same kind of thing. – Craig Ringer Dec 31 '13 at 10:01
I was referring to the use of the XLRD Python Library to parse XLS/XLSX files directly as a data stream...! - already know about parsing CSV's and thats much simpler too... Thanks anyway, but i got my problem solved by myself... – Arpan J Jan 22 '14 at 13:58

Read From Microsoft Excel (.xlsx) file and Insert into a DB using Python?

2 Answers2