5

I'm trying to merge multiple csv files with the same format into one.

merge_list = glob.glob(gndlbsum+"*gndlbsum.csv")
filewriter_lbsum = target_dir+"gndlbsum_master.csv"

#get the list of csv files and set the output file
counter=0
for file in merge_list:
    with open(file,"rU") as csv_file:
        filereader = csv.reader(csv_file)
        with open(filewriter_lbsum,"a") as f:
            writer = csv.writer(f, delimiter = "|")
            #check to see if it's the first file, if it is, add header,
            #otherwise skip first row
            if counter<1:
                for row in filereader:
                    writer.writerow(row)
                    counter+=1
            else:
                header = next(filereader,None)
                for row in filereader:
                    writer.writerow(row)

When I do it like this, each row in the output csv is entirely enclosed with double quotes, I tried to uselist.append(row)instead, but it makes no difference, since the row is enclosed with double quotes. Is there a way to avoid this?

EDIT:

Here is a sample of the source file:

COL1|COL2|COL3
1|2|3
4|5|6

And the output:

"COL1|COL2|COL3"
"1|2|3"
"4|5|6"
martineau
  • 119,623
  • 25
  • 170
  • 301
quickshare
  • 155
  • 4
  • 13

2 Answers2

6

I think this will get rid of the quotes which are being caused by not telling the csv.readers being created that the delimiters in the input file are "|" characters rather than default which is "," characters.

merge_list = glob.glob(gndlbsum + "*gndlbsum.csv")
file_writer_lbsum = os.path.join(target_dir, "gndlbsum_master.csv")

# Append each csv file in the list to the output file
first_file = True
for file in merge_list:
    with open(file, "rU") as csv_file:
        reader = csv.reader(csv_file, delimiter="|")
        with open(file_writer_lbsum, "w" if first_file else "a") as f:
            writer = csv.writer(f, delimiter="|")
            # Unless it's the first file, skip its header row
            if not first_file:
                next(reader)
                first_file = False
            writer.writerows(reader)
martineau
  • 119,623
  • 25
  • 170
  • 301
  • Thank you, that's exactly it. I can't believe it was that simple :) – quickshare Dec 18 '14 at 17:43
  • Yes, since there weren't any default comma delimiters in the rows of the files, it was treating the whole thing as a single string, and the default for a csv writer is for it to quote strings. – martineau Dec 19 '14 at 10:01
1

The csv module adds the double quotes automatically at the end of each line. Without knowing exactly what your csv file looks like, try the following:

merge_list = glob.glob(gndlbsum+"*gndlbsum.csv")
filewriter_lbsum = target_dir+"gndlbsum_master.csv"
#get the list of csv files and set the output file
counter=0
for file in merge_list:
    with open(file,"rU") as csv_file:
        filereader = csv.reader(csv_file, skipinitialspace=True)
        with open(filewriter_lbsum,"a") as f:
            writer = csv.writer(f,delimiter = "|", quoting=csv.QUOTE_NONE)
            #check to see if it's the first file, if it is, add header, 
            #otherwise skip first row
            if counter<1:
                for row in filereader:
                    writer.writerow(row)
                    counter+=1
            else:
                header = next(filereader,None)
                for row in filereader:
                    writer.writerow(row)
ekrah
  • 584
  • 1
  • 5
  • 17
  • All of the source csv files have headers, they are | delimited, and there are no string values, only numbers, and even if they had, none of them are enclosed with quotes in the source. Here's an example: Source: COL1|COL2|COL3 1|2|3 4|5|6 output: "COL1|COL2|COL3" "1|2|3" "4|5|6" I tried your example but it returned an empty file – quickshare Dec 18 '14 at 17:01
  • The csv writer does not automatically add double quotes to the end of each line. – martineau Dec 18 '14 at 17:44