0

I have many CSV files, need to read all the files in loop and write file name and all the columns (header in row 1) in an output file.

Example

Input csv file 1 (test1.csv)

Id, Name, Age, Location
1, A, 25, India

Input csv file 2 (test2.csv)

Id, ProductName
1, ABC

Outputfile

test1.csv  Id
test1.csv  Name
test1.csv  Age
test1.csv  Location
test2.csv  Id
test2.csv  ProductName

Many thanks for your help.

Update: This code works fine for this purpose:

import os
import csv

ofile = open('D:\Anuj\Personal\OutputFile/AHS_File_Columns_Info.csv', 'w')

directory = os.path.join('D:\Anuj\Personal\Python')

for root, dirs, files in os.walk(directory):
    for file in files:
            fullfilepath = directory + "/" + file
            with open(fullfilepath,'r') as f:
                output = file +','+ f.readline()
                ofile.write(output)
Anuj Mishra
  • 35
  • 1
  • 6

3 Answers3

0

But I'm not sure I understand you correctly.

import csv
from typing import List
from typing import Tuple

TableType = List[List[str]]


def load_csv_table(file_name: str) -> Tuple[List[str], TableType]:
    with open(file_name) as csv_file:
        csv_reader = csv.reader(csv_file, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
        headers = next(csv_reader)
        data_table = list(csv_reader)
        return headers, data_table


def save_csv_table(file_name: str, headers: List[str], data_table: TableType):
    with open(file_name, 'w', newline='') as csv_file:
        writer = csv.writer(csv_file, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
        writer.writerow(headers)
        for row in data_table:
            writer.writerow(row)


input_files = ['file1.csv', 'file2.csv', 'file3.csv']
new_table = []
new_headers = []
for file_name in input_files:
    headers, data_table = load_csv_table(file_name)
    if not new_headers:
        new_headers = ['Source'] + headers
    new_table.extend(([file_name] + line for line in data_table))
save_csv_table('output.csv', new_headers, new_table)
ADR
  • 1,255
  • 9
  • 20
  • thanks a lot. This code works fine for me: import os import csv ofile = open('D:\Anuj\Personal\OutputFile/AHS_File_Columns_Info.csv', 'w') directory = os.path.join('D:\Anuj\Personal\Python') for root, dirs, files in os.walk(directory): for file in files: fullfilepath = directory + "/" + file with open(fullfilepath,'r') as f: output = file +','+ f.readline() ofile.write(output) – Anuj Mishra Jan 03 '17 at 14:10
0

A simple method is to use readline() on the file object:

files=["test1.csv","test2.csv"]
for my_file in files:
    with open(my_file,'r') as f:
        print my_file, f.readline()
Rolf of Saxony
  • 21,661
  • 5
  • 39
  • 60
0

clean solution using csv module for reading and writing

  • open output file and create a csv.writer instance on its handle
  • open each input file and create a csv.reader instance on their handle
  • get first row using next on the csv.reader iterator: gets titles as list (with a small post-processing to remove the spaces)
  • write titles alongside the current filename in a loop

code:

import csv

files=["test1.csv","test2.csv"]
with open("output.tsv","w",newline='') as fw:
    cw = csv.writer(fw,delimiter="\t")  # output is tab delimited
    for filename in files:
        with open(filename,'r') as f:
            cr = csv.reader(f)
            # get title
            for column_name in (x.strip() for x in next(cr)):
                cw.writerow([filename,column_name])

There are several advantages using csv module, the most important being that quoting & multi-line fields/titles are managed properly.

Jean-François Fabre
  • 137,073
  • 23
  • 153
  • 219