0

I am trying to insert excel data into a table (postgres) using a python script. I am running into an issue though where some of the larger numbers get inserted as exponentials. I realized this is happening when I am converting the file from .xls to .csv (I never open the .xls files because I realize that excel does some funky stuff where it'll save larger numbers into exponential form)

Is there an easy way to ensure the numbers don't get displayed as exponentials?

ie 812492400097 is being displayed as 8.12E+11

Here is the convert to .csv script:

import xlrd    
import unicodecsv    
import sys    
import os    
import datetime


def csv_from_excel(xlsfile, csvfile):  
    wb = xlrd.open_workbook(xlsfile)  
    sh = wb.sheet_by_index(0)  
    outputfile = open(csvfile, 'wb')  
    wr = unicodecsv.writer(outputfile, quoting=unicodecsv.QUOTE_ALL)

    for rownum in xrange(sh.nrows):
        wr.writerow(sh.row_values(rownum))

    outputfile.close()

def log(s):
    print str(datetime.datetime.now()) + ": " + s


#main
if len(sys.argv) < 2:   
    print "Missing parameters: input xls file"   
    sys.exit()

sourcefile = sys.argv[1]

destfile = sourcefile.split('.')[0] + '.csv'

log("processing " + sourcefile)

csv_from_excel(sourcefile, destfile)

Also wondering if perhaps instead of ensuring the .csv doesn't turn numbers to exponentials, to turn exponentials to numbers when inserting into the postgres table?

Marisa
  • 59
  • 9
  • 1
    Please post your Python script - it is really hard to find a problem in a Python script without actually knowing the script. – phihag Jul 22 '14 at 21:30
  • I know that Excel may display numbers within a CSV file in scientific notation if they are large enough, but Excel doesn't actually modify the CSV file (it only interprets the large values and displays them in scientific notation). – Nate Jenson Jul 22 '14 at 21:41
  • The most obvious reason for this problem would be that you're converting the numbers to strings in some way that doesn't specify any non-default formatting, and the solution is to specify the formatting. For example, maybe you want `{:f}` instead of just `{}` in your `format` string, or `%f` instead of `%s` in your `%` string, or to use some form of formatting instead of just calling `str()`. – abarnert Jul 22 '14 at 21:48
  • Also, is your problem that your Python script is writing `8.12E+11` to the .csv file, or is it that you're writing `812492400097` but when Excel imports that .csv file it ends up as `8.12E+11`, or… something different? – abarnert Jul 22 '14 at 21:50
  • The python script is writing 8.12E+11 to the .csv file. – Marisa Jul 23 '14 at 13:47
  • Please see the script. Thanks. @njenson I tried just opening the .csv as a text file and it still shows up as exponential. Also when I import into the postgres table it does this too. – Marisa Jul 23 '14 at 13:53

1 Answers1

0

The xlrd module treats all numbers from Excel as floats, because Excel calculates all numbers as floats:

https://secure.simplistix.co.uk/svn/xlrd/trunk/xlrd/doc/xlrd.html?p=4966#sheet.Cell-class

  • Type symbol [Type number] Python value
  • XL_CELL_EMPTY [0] empty string u''
  • XL_CELL_TEXT [1] a Unicode string
  • XL_CELL_NUMBER [2] float
  • XL_CELL_DATE [3] float
  • XL_CELL_BOOLEAN [4] int; 1 means TRUE, 0 means FALSE
  • XL_CELL_ERROR [5] int representing internal Excel codes; for a text representation, refer to the supplied dictionary error_text_from_code
  • XL_CELL_BLANK [6] empty string u''. Note: this type will appear only when open_workbook(..., formatting_info=True) is used.

Your solution, presumably, needs to define the formatting used by unicodecsv to write floats.

A previous question (How can I prevent csv.DictWriter() or writerow() rounding my floats?) indicates that the csv module used to use float.__str__ rather than float.__repr__, which caused rounding. unicodecsv might still use float.__str__.

Community
  • 1
  • 1
Dane
  • 1,201
  • 8
  • 17