I am building a CSV chunk by chunk using the csv module from the standard library.
This means that I am adding rows one by one in a loop. Each row that I add contains a value for each column of the CSV.
So, I have this CSV:
A B C D
And I am adding rows one by one:
A B C D
aaaaa bbb ccccc ddddd
a1a1a b1b1 c1c1c1 d1d1d1
a2a2a b2b2 c2c2c2 d2d2d2
And so on.
My problem is that sometimes, the row that I am adding contains MORE information (that is, information that does not have a column). For example:
A B C D
aaaaa bbb ccccc ddddd
a1a1a b1b1 c1c1c1 d1d1d1
a2a2a b2b2 c2c2c2 d2d2d2
a3a3a b3b3 c3c3c3 d3d3d3 e3e3e3 #this row has extra information
My question is: is there any way to make the CSV grow at runtime when that happens? (By 'grow' I mean adding the "extra" columns.)
So basically I want this to happen:
A B C D E # this column was added because
aaaaa bbb ccccc ddddd # of the extra column found
a1a1a b1b1 c1c1c1 d1d1d1 # in the new row
a2a2a b2b2 c2c2c2 d2d2d2
a3a3a b3b3 c3c3c3 d3d3d3 e3e3e3
I am adding the rows using the csv module from the standard library, the with statement and a dictionary:
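import csv
addThis = {'A': 'a3a3a', 'B': 'b3b3', 'C': 'c3c3c3', 'D': 'd3d3d3', 'E': 'e3e3e3'}
with open('csvFile', 'a', newline='') as f:
    # The writer only knows about the original four columns
    writer = csv.DictWriter(f, fieldnames=['A', 'B', 'C', 'D'])
    writer.writerow(addThis)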
As you can see, in the dictionary that I'm adding, I specify the name of the new column. What happens when I try that is that I get this exception:
ValueError: dict contains fields not in fieldnames: 'E'
I have tried adding the "extra" fieldname to the csv before adding the row like this:
fields = writer.fieldnames
writer.fieldnames = list(fields) + ['E']
Note: It seems from this example that I already know that E will be added, but that is not the case. I showed it like this just for the example. I don't know what the "extra" data will be until I get the "extra" rows (which I get over a period of time from a web scrape).
That avoids the exception, but it does not add the new column name to the header row, so I end up with something like this:
A B C D
aaaaa bbb ccccc ddddd
a1a1a b1b1 c1c1c1 d1d1d1
a2a2a b2b2 c2c2c2 d2d2d2
a3a3a b3b3 c3c3c3 d3d3d3 e3e3e3 # value is added but the column
# name is not there
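To make the goal concrete, here is a minimal sketch of one possible approach, not something I have settled on. It assumes the file is small enough to rewrite whenever a new column shows up, since a header that has already been written cannot be changed by appending. The helper name `append_row` is hypothetical; everything else is the standard csv module.

```python
import csv
import os


def append_row(path, row):
    """Append `row` (a dict) to the CSV at `path`, widening the header if needed."""
    # Load the existing header and rows, if the file already has content.
    if os.path.exists(path) and os.path.getsize(path) > 0:
        with open(path, newline='') as f:
            reader = csv.DictReader(f)
            fieldnames = list(reader.fieldnames or [])
            old_rows = list(reader)
    else:
        fieldnames, old_rows = [], []

    # Keys in the new row that have no column yet, in the order they appear.
    new_fields = [key for key in row if key not in fieldnames]

    if fieldnames and not new_fields:
        # No new columns: a plain append is enough.
        with open(path, 'a', newline='') as f:
            csv.DictWriter(f, fieldnames=fieldnames).writerow(row)
        return

    # New columns: rewrite the whole file with the extended header.
    # Older rows get an empty string (restval) in the added columns.
    fieldnames += new_fields
    with open(path, 'w', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, restval='')
        writer.writeheader()
        writer.writerows(old_rows)
        writer.writerow(row)
```

Calling `append_row('csvFile', addThis)` with a dictionary that contains an unseen key such as 'E' would rewrite the file with header A, B, C, D, E. The obvious cost is that every new column triggers a full rewrite, which is why I am asking whether there is a better way.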
I am not using Pandas because I understand that Pandas is designed to load fully populated DataFrames, but I am open to using something besides the csv module if you suggest it. Any ideas regarding that?
Thanks for your help and sorry for the long question, I tried to be as clear as possible.