I'm using Python 2.6 (32-bit) on Windows 8. The goal is to build a pysqlite database from a comma-separated CSV file whose first row contains the column names.
My code is as follows (some code skipped):
def BuildDatabaseFromCSV(self, file_name):
    with file(file_name, 'rb') as source:
        csvreader = csv.reader(source)
        first_entry = csvreader.next()
        self.BuildTable(first_entry)
        #self.AddEntries(csvreader)
        for entry in csvreader:
            self.AddEntry(entry)
def BuildTable(self, cols_names):
    cmd_line = str(cols_names)[1:-1]  # Remove '[]' from the list's repr
    cmd_line = "create table %s (%s)" % (self._t_name, cmd_line)
    self._cursor.execute(cmd_line)  # Creating table
    self._db.commit()
    self._col_num = len(cols_names)
def AddEntry(self, entry):
    length = len(entry)
    if self._col_num > length:  # Pad missing fields with ''
        entry += [''] * (self._col_num - length)
    elif self._col_num < length:  # Crop extra fields
        entry = entry[:self._col_num]
    cmd_line = "insert into %s values (%s)" \
        % (self._t_name, ("?," * self._col_num)[:-1])
    self._cursor.execute(cmd_line, entry)
    self._db.commit()
def AddEntries(self, entries):
    cmd_line = "insert into %s values (%s)" \
        % (self._t_name, ("?," * self._col_num)[:-1])
    self._cursor.executemany(cmd_line, entries)
    self._db.commit()
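For reference, the pad/crop logic used in AddEntry can be demonstrated standalone against an in-memory sqlite3 database (a minimal sketch in Python 3 syntax; the table and the sample rows here are invented):

```python
import sqlite3

def normalize(entry, col_num):
    # Pad short rows with '' and crop long rows to exactly col_num fields
    if len(entry) < col_num:
        return entry + [''] * (col_num - len(entry))
    return entry[:col_num]

db = sqlite3.connect(':memory:')
cur = db.cursor()
cur.execute("create table t (a, b, c)")
placeholders = ("?," * 3)[:-1]  # "?,?,?"
for row in (['1'], ['1', '2', '3', '4'], ['x', 'y', 'z']):
    cur.execute("insert into t values (%s)" % placeholders,
                normalize(row, 3))
db.commit()
print(cur.execute("select count(*) from t").fetchone()[0])  # 3
```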
Originally I used AddEntries() to insert all the CSV rows into the database at once. However, in CSV files generated by Excel 2003 the number of fields can vary from line to line: a line may contain more or fewer fields than the header line (extra or missing commas) when it has trailing empty columns or some extra garbage.
I cannot use "field_size_limit", since I would need to read the CSV first to determine it.
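To illustrate the problem: csv.reader yields each line's fields as-is, so rows of differing widths come back as lists of differing lengths (a small Python 3 sketch with inline data standing in for the Excel output):

```python
import csv
import io

# Inline stand-in for an Excel-generated CSV: one short row, one long row
data = "name,age,city\nalice,30,london\nbob,25\ncarol,41,paris,extra\n"
rows = list(csv.reader(io.StringIO(data)))
print([len(r) for r in rows])  # [3, 3, 2, 4] -- widths vary per line
```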
Is there a simpler way to achieve either of the following?
1. When reading the CSV file, specify a fixed number of columns so that commas are inserted or deleted automatically?
2. When inserting an entry into the pysqlite database, specify a fixed number of columns so it accepts a variable number of input fields?
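To frame what purpose 2 would look like: one possible workaround (not a built-in feature, just a sketch in Python 3 syntax with invented data and table name) is a small generator that normalizes every row to the header's width before handing the stream to executemany:

```python
import csv
import io
import sqlite3

def normalized(rows, col_num):
    # Yield each row padded with '' and cropped to exactly col_num fields
    for row in rows:
        yield (row + [''] * col_num)[:col_num]

data = "a,b,c\n1,2\n4,5,6,7\n"  # one short row, one long row
reader = csv.reader(io.StringIO(data))
header = next(reader)

db = sqlite3.connect(':memory:')
cur = db.cursor()
cur.execute("create table t (%s)" % ",".join(header))
cur.executemany("insert into t values (?,?,?)",
                normalized(reader, len(header)))
db.commit()
print(cur.execute("select * from t").fetchall())  # [('1', '2', ''), ('4', '5', '6')]
```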
Thanks in advance.