1

Still new to Python and I ran into an issue earlier this month where String '0' was being passed into my Integer Column (using a SQLite db). More information from my original thread:

SQL: Can WHERE Statement filter out specific groups for GROUP BY Statement

This caused my SQL Query statements to return invalid data.

I'm having this same problem pop up in other columns in my database when the CSV file does not contain any value for the specific cell.

The source of my data is from an external csv file that I download (unicode format). I use the following code to insert my code into the DB:

with sqlite3.connect(db_filename) as conn:

    dbcursor = conn.cursor()

    with codecs.open(csv_filename, "r", "utf-8-sig") as f:
        csv_reader = csv.DictReader(f, delimiter=',')

        #  This is a much smaller column example as the actual data has many columns.
        csv_dict = [( i['col1'], i['col2'] ) for i in csv_reader)

    dbcursor.executemany(sql_str, csv_dict)

From what I researched, by design, SQLite does not enforce column type when inserting values. My solution to my original problem was to do a manual check to see if it was an empty value and then make it an int 0 using this code:

def Check_Session_ID( sessionID ):

    if sessionID == '':
        sessionID = int(0)

    return sessionID

Each integer / float column will need to be checked when I insert the values into the Database. Since there will be many rows on each import (100K +) x (50+ columns) I would imagine the imports to take quite a bit of time.

What are better ways to handle this problem instead of checking each value for each Int / Float column per row?

Thank you so much for the advice and guidance.

Community
  • 1
  • 1
ETam
  • 43
  • 5
  • It is a somewhat frustrating thing that the `csv` module represents `None` as the empty string. There is more discussion of this issue and a coded workaround in the answers to this question: http://stackoverflow.com/q/11379300/42346 – mechanical_meat Oct 31 '14 at 19:23
  • Thank you Bernie! so the better method would be to implement my own CSV DictReader subclass then which would give better performance? – ETam Oct 31 '14 at 20:04
  • Well, as martineau notes, you cannot subclass the readers/writers because they're implemented in C. If you use his solution, written in Python, it is quite likely that it will be slower than using the C-based csv reader and testing each value in each integer/float column. – mechanical_meat Oct 31 '14 at 20:11
  • By the way, why are you changing the value from `''`to `0` instead of to `None`? – mechanical_meat Oct 31 '14 at 20:11
  • Sorry the solution to set it as Int(0) might not be the most eloquent. But in my original thread, I was trying to filter all the rows where SessionID = 0. To account for data in csv that had '', I read it in and then assigned a 0 for those specific cases. If I didn't specifically do that, then my query string would return the '0' instead of 0. – ETam Oct 31 '14 at 20:49
  • Also, thank you very much for the information bernie! – ETam Oct 31 '14 at 20:57

0 Answers0