I've successfully combined all CSV files in a directory, but I'm struggling to skip the first row (the header) of each file. The error I currently get is "'list' object is not an iterator". I have tried multiple approaches, including not using [open(thefile).read()], but I still can't get it working. Here is my code:

 import glob
 files = glob.glob( '*.csv' )
 output="combined.csv"

 with open(output, 'w' ) as result:
     for thefile in files:
         f = [open(thefile).read()]
         next(f)   ## this line is causing the error 'list' object is not an iterator

         for line in f:
             result.write( line )
 message = 'file created'
 print (message)  
jKraut
  • You should close each file after reading it, either explicitly, or using 'with' as you did opening the file to which you are writing. – Fred Mitchell Mar 13 '15 at 02:11
  • You might find [this answer](http://stackoverflow.com/questions/11349333/when-processing-csv-data-how-do-i-ignore-the-first-line-of-data/11350095#11350095) helpful. – martineau Mar 13 '15 at 02:16

3 Answers

Use the readlines() method instead of read(), so that you can easily skip the first line by slicing the resulting list.

f = open(thefile)
m = f.readlines()
for line in m[1:]:
    result.write(line)  # keep the trailing newline so rows stay on separate lines
f.close()

OR

with open(thefile) as f:
    m = f.readlines()
    for line in m[1:]:
        result.write(line)  # keep the trailing newline so rows stay on separate lines

You don't need to explicitly close the file object if the file was opened with a with statement.
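
As a quick check of the slicing idea, here is a minimal sketch that uses io.StringIO as a stand-in for real files. The helper name skip_header and the sample data are made up for illustration. Each line returned by readlines() still ends with its newline, so it can be written out unchanged.

```python
import io

def skip_header(source):
    """Return every line of a text stream except the first, newlines intact."""
    return source.readlines()[1:]

# In-memory stand-ins for an input CSV and the combined output.
sample = io.StringIO("id,name\n1,ann\n2,bob\n")
combined = io.StringIO()

rows = skip_header(sample)
for line in rows:
    combined.write(line)  # no rstrip(): the newline separates the rows
```

Keep in mind that readlines() loads the whole file into memory, which is usually fine for small CSV files.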

Avinash Raj

Here's an alternative using the often-forgotten fileinput.input() function:

import fileinput
from glob import glob

FILE_PATTERN = '*.csv'
output = 'combined.csv'

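# Build the input list before opening the output, and leave the output file
# out of it; otherwise combined.csv matches the pattern and is read back in.
files = [name for name in glob(FILE_PATTERN) if name != output]

with open(output, 'w') as result:
    for line in fileinput.input(files):
        if not fileinput.isfirstline():
            result.write(line)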

It's quite a bit cleaner than many other solutions.

Note that the code in your question was not far off working. You just need to change

f = [open(thefile).read()]

to

f = open(thefile)

but I suggest that using with would be better still because it will automatically close the input files:

with open(output, 'w' ) as result:
    for thefile in files:
        with open(thefile) as f:
            next(f)
            for line in f:
                result.write( line )
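
For completeness, the same approach can be wrapped in a function. This is only a sketch under a few assumptions: the function name combine_without_headers and its parameters are made up, the inputs are sorted so the order is stable, and the output file is excluded from the inputs so that a second run does not read combined.csv back in. Passing None as the default to next() avoids a StopIteration on an empty file.

```python
import glob
import os

def combine_without_headers(directory, output_name="combined.csv"):
    """Concatenate every *.csv in `directory`, dropping each file's first line."""
    output_path = os.path.join(directory, output_name)
    # Sort for a stable order and leave out the output file itself.
    sources = sorted(
        path for path in glob.glob(os.path.join(directory, "*.csv"))
        if os.path.abspath(path) != os.path.abspath(output_path)
    )
    with open(output_path, "w") as result:
        for path in sources:
            with open(path) as f:
                next(f, None)  # skip the header; the default covers empty files
                for line in f:
                    result.write(line)
    return sources
```

If an input file does not end with a newline, its last row will run into the first row of the next file, so you may want to handle that case separately.
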
mhawke
>>> a = [1, 2, 3]
>>> next(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: list object is not an iterator

I am not sure why you chose to bracket the read, but you should recognize what is happening from the example above.
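
To spell that out, here is a small sketch with made-up sample data. A list is iterable but is not itself an iterator, so next() only works after wrapping it with iter(). Note also that bracketing a read() call gives a list with a single element (the whole file contents as one string), so even iter() would not help there; io.StringIO is used below as a stand-in for a real file.

```python
import io

lines = ["header\n", "row 1\n", "row 2\n"]

# Calling next() directly on a list fails.
try:
    next(lines)
    error = None
except TypeError as exc:
    error = str(exc)

# Wrapping the list in iter() gives an iterator that next() accepts.
it = iter(lines)
skipped = next(it)
remaining = list(it)

# [stream.read()] is a one-element list holding the entire contents.
wrapped = [io.StringIO("header\nrow 1\n").read()]
```

A file object, by contrast, is already an iterator over its lines, which is why next(f) works on an open file.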

There is already a good answer. This is just an example of how you might look at the problem. Also, I would recommend first getting the header skipping to work with just a single file. Once that works, import glob and use your mini-solution in the bigger problem.

Fred Mitchell