
I would like to combine columns from several CSV files into one CSV file, concatenated horizontally and with new headings. I want to select only certain columns, chosen by heading. Each of the files to be combined has different columns.

Example input:

freestream.csv:

static pressure,static temperature,relative Mach number
1.01e5,288,5.00e-02

fan.csv:

static pressure,static temperature,mass flow
0.9e5,301,72.9

exhaust.csv:

static pressure,static temperature,mass flow
1.7e5,432,73.1

Desired output:

combined.csv:

P_amb,M0,Ps_fan,W_fan,W_exh
1.01e5,5.00e-02,0.9e5,72.9,73.1

Possible call to the function:

reorder_multiple_CSVs(["freestream.csv","fan.csv","exhaust.csv"],
    "combined.csv",["static pressure,relative Mach number",
    "static pressure,mass flow","mass flow"],
    "P_amb,M0,Ps_fan,W_fan,W_exh")

Here is a previous version of the code, which allows only one input file. I wrote it with help from the question "Write CSV columns out in a different order in Python":

import csv
import operator

def reorder_CSV(infilename, outfilename, oldheadings, newheadings):
    with open(infilename) as infile:
        with open(outfilename, 'w') as outfile:
            reader = csv.reader(infile)
            writer = csv.writer(outfile)
            # The first row holds the column headings
            readnames = next(reader)
            name2index = dict((name, index) for index, name in enumerate(readnames))
            writeindices = [name2index[name] for name in oldheadings.split(",")]
            reorderfunc = operator.itemgetter(*writeindices)
            writer.writerow(newheadings.split(","))
            for row in reader:
                towrite = reorderfunc(row)
                # itemgetter returns a bare string when given a single index
                if isinstance(towrite, str):
                    writer.writerow([towrite])
                else:
                    writer.writerow(towrite)
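The isinstance check above exists because operator.itemgetter behaves differently for one index than for several. A minimal sketch of that behaviour, using the freestream row from the example; as_fields is a hypothetical helper name, not part of the original code:

```python
import operator

row = ["1.01e5", "288", "5.00e-02"]

# Two or more indices: itemgetter returns a tuple of fields.
pick_two = operator.itemgetter(0, 2)
# A single index: itemgetter returns the bare field (a str), not a 1-tuple.
pick_one = operator.itemgetter(2)

def as_fields(picked):
    """Hypothetical helper: always return the picked columns as a list."""
    return [picked] if isinstance(picked, str) else list(picked)

two = as_fields(pick_two(row))
one = as_fields(pick_one(row))
```

Normalising the result to a list this way means the row can always be passed to writer.writerow or appended to a longer row without special cases.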

So what I have figured out, in order to adapt this to multiple files, is:

- I need infilename, oldheadings, and newheadings to be lists now (all of the same length)

- I need to iterate over the list of input files to make a list of readers

- readnames can also be a list, built by iterating over the readers

- which means I can make name2index a list of dictionaries
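The steps listed above can be sketched as follows. This is only an illustration under assumptions: build_indices is a hypothetical helper name, and in-memory io.StringIO objects stand in for two of the example files so the snippet runs without touching the disk.

```python
import csv
import io

def build_indices(readers, oldheadings):
    """Hypothetical helper: read each reader's header row and return,
    per file, the column positions of the requested headings."""
    indices = []
    for reader, wanted in zip(readers, oldheadings):
        readnames = next(reader)  # consumes the header row
        name2index = {name: i for i, name in enumerate(readnames)}
        indices.append([name2index[name] for name in wanted.split(",")])
    return indices

# In-memory stand-ins for freestream.csv and fan.csv
freestream = io.StringIO(
    "static pressure,static temperature,relative Mach number\n"
    "1.01e5,288,5.00e-02\n")
fan = io.StringIO(
    "static pressure,static temperature,mass flow\n"
    "0.9e5,301,72.9\n")

readers = [csv.reader(freestream), csv.reader(fan)]
indices = build_indices(
    readers,
    ["static pressure,relative Mach number", "static pressure,mass flow"])
```

After the call, each reader is positioned at its first data row, so the same readers can be iterated to produce the output rows.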

One thing I don't know how to do is use the with keyword nested n levels deep, when n is known only at run time. I read How can I open multiple files using "with open" in Python?, but that seems to work only when you know in advance how many files you need to open.

Or is there a better way to do this?

I am quite new to Python, so I appreciate any tips you can give me.

moink

2 Answers

I am only replying to the part about opening multiple files with a with statement when the number of files is not known in advance. It shouldn't be too hard to write your own context manager, something like this (completely untested):

from contextlib import contextmanager

@contextmanager
def open_many_files(filenames):
    files = [open(filename) for filename in filenames]
    try:
        yield files
    finally:
        for f in files:
            f.close()

Which you would use like this:

innames = ['file1.csv', 'file2.csv', 'file3.csv']
outname = 'out.csv'
with open_many_files(innames) as infiles, open(outname, 'w') as outfile:
    for infile in infiles:
        do_stuff(infile)

There is also a function that does something similar, but it is deprecated.
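On Python 3.3 and later, the standard library's contextlib.ExitStack covers the same need without a hand-written context manager. A minimal sketch, assuming Python 3; header_rows is a hypothetical function name used only for illustration:

```python
import csv
from contextlib import ExitStack

def header_rows(filenames):
    """Hypothetical helper: open every file in `filenames` at once
    and return the header row of each."""
    with ExitStack() as stack:
        # enter_context registers each file so that all of them are closed
        # when the with block exits, even if a later open() fails.
        files = [stack.enter_context(open(name, newline=""))
                 for name in filenames]
        return [next(csv.reader(f)) for f in files]
```

Unlike a plain list comprehension of open() calls, this also closes the files that were already opened if one of the later files cannot be opened.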

Bas Swinckels
  • Thank you. I will test it tomorrow and try to implement the rest. – moink Jul 01 '14 at 20:21
  • It took me a while (got sidetracked by other projects) but I fixed & implemented your solution and it worked. Thank you very much. – moink Nov 12 '14 at 13:41

I am not sure whether this is the correct way to do this, but I wanted to expand on Bas Swinckels' answer. His very helpful answer had a couple of small inconsistencies, and I wanted to give the corrected code.

Here is what I did, and it worked.

from contextlib import contextmanager
import csv
try:
    from itertools import zip_longest  # Python 3
except ImportError:
    from itertools import izip_longest as zip_longest  # Python 2

@contextmanager
def open_many_files(filenames):
    files = [open(filename, 'r') for filename in filenames]
    try:
        yield files
    finally:
        for f in files:
            f.close()

def reorder_multiple_CSV(infilenames, outfilename, oldheadings, newheadings):
    # infilenames and newheadings are comma-separated strings;
    # oldheadings is a list with one comma-separated string per input file.
    with open_many_files(filter(None, infilenames.split(','))) as handles:
        with open(outfilename, 'w') as outfile:
            readers = [csv.reader(f) for f in handles]
            writer = csv.writer(outfile)
            # For each file, the positions of the columns to copy
            writeindices = []
            for i, reader in enumerate(readers):
                readnames = next(reader)
                name2index = dict((name, index) for index, name in enumerate(readnames))
                writeindices.append([name2index[name] for name in filter(None, oldheadings[i].split(","))])
            writer.writerow(list(filter(None, newheadings.split(","))))
            for rows in zip_longest(*readers):
                towrite = []
                for indices, row in zip(writeindices, rows):
                    if row is None:
                        # This file has run out of rows: pad with empty fields
                        towrite.extend([''] * len(indices))
                    else:
                        # Indexing directly avoids itemgetter returning a bare
                        # string when only one column is selected
                        towrite.extend(row[j] for j in indices)
                writer.writerow(towrite)
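When the input files have different numbers of data rows, the zip_longest family of functions pads the shorter inputs with its fill value. A minimal sketch of that behaviour with made-up rows, assuming Python 3 (the function is named izip_longest on Python 2):

```python
from itertools import zip_longest

rows_a = [["1", "2"], ["3", "4"]]  # two data rows
rows_b = [["x", "y"]]              # only one data row

combined = []
for a, b in zip_longest(rows_a, rows_b, fillvalue=["", ""]):
    # Once rows_b is exhausted, b is the fill value: two empty fields.
    combined.append(a + b)
```

The fill value has to be as wide as the widest column index that will be read from it, otherwise selecting columns from the padded row raises an IndexError.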
moink