Add new column in a csv file and manipulate on the on records

Question

I have 4 csv files named PV.csv, Dwel.csv, Sess.csv, and Elap.csv. I have 15 columns and arouind 2000 rows in each file. At first I would like to add a new column named Var in each file and fill up the cells of the new column with the same file name. Therefore, new column 'Var' in PV.csv file will filled up by PV. Same is for other 3 files. After that I would like to manipulate the all the files as follows.

Finally I would like to merge / join these 4 files based on the A_ID and B_ID and write records into a new csv file name finalFile.csv. Any suggestion and help is appreciated.

<p>PV.csv is as follows:</p>

   
A_ID      B_ID       LO       UP     LO      UP
103       321        0        402    
103       503        192      225    433     608   
106       264        104      258    334     408
107       197        6        32     113     258

Dwell.csv is as follows:

   
A_ID      B_ID       LO       UP     LO      UP  
103       321        40       250    517     780
103       503        80       125    435     585     
106       264        192      525    682  
107       197        324      492    542     614

Session.csv is as follows:

   
A_ID      B_ID       LO       UP     LO      UP 
103       321        75       350    370     850     
106       264        92       225    482     608  
107       197        24       92     142

Elapsed.csv is as follows:

   
A_ID      B_ID       LO       UP     LO      UP 
103       321        5        35     75
103       503        100      225    333     408      
106       264        102      325    582  
107       197        24       92     142     214

First output file of PV.csv will be as follows:

Same way all rest of three files will be filled up with new column with ehrer file name, Dwell, Session, and Elapsed:

   
A_ID    B_ID      Var   LO        UP     LO      UP
103     321       PV    0         402    
103     503       PV    192       225    433     608   
106     264       PV    104       258    334     408
107     197       PV    6         32     113     258

Final output file will be as follows:

finalFile.csv.

   
A_ID    B_ID      Var    LO        UP
103     321       PV     0         402
103     321       Dwel   40        250
103     321       Dwel   251       517
103     321       Dwel   518       780
103     321       Sess   75        350
103     321       Sess   351       370
103     321       Sess   371       850
103     321       Elap   5         35
103     321       Elap   36        75
103     503       PV     192       225
103     503       PV     226       433
103     503       PV     434       608
103     503       Dwel   80        125
103     503       Dwel   126       435
103     503       Dwel   436       585
103     503       Elap   100       225
103     503       Elap   226       333
103     503       Elap   334       408
106     264       PV     104       258
106     264       PV     259       334
106     264       PV     335       408
106     264       Dwel   192       525
106     264       Dwel   526       682
106     264       Sess   92        225
106     264       Sess   226       482
106     264       Sess   483       608
106     264       Elap   102       325
106     264       Elap   326       582
107     197       PV     6         32
107     192       PV     33        113
107     192       PV     114       258
107     192       Dwel   324       492
107     192       Dwel   493       542
107     192       Dwel   543       614
107     192       Sess   24        92
107     192       Sess   93        142
107     192       Elap   24        92
107     192       Elap   93        142
107     192       Elap   143       214

In text you say you have 15 colums, the input files have 6 and the result 4. I'm sorry but I cannot understand what you really want to do, and why reading with a DictReader, adding a key value per row and writing back would not be enough. — Serge Ballesta, Dec 12 '15 at 16:11
Thanks. I cannot include all the column here. The input file have more column same as LO, UP, LO, UP..... The result file must have only 4 columns as above. — Tofazzal, Dec 12 '15 at 17:13
You should show at least one input line to let us understand how you it to be splitted among many lines, or if only the to first LO UP fields should be kept. — Serge Ballesta, Dec 12 '15 at 17:25
Where is your effort? Read the docs of `csv` module and try to implement it. Ask, if you get stuck, and Google is not helping. — GingerPlusPlus, Dec 12 '15 at 17:48
@ Serge Ballesta, Thank you. Each csv file contain a record / rows contain one or many LO, UP values. But in the final file I want to use one LO, UP and rest of the LO UP value will add into the next row with its A_ID and B_ID. Like, 1st row of PV.csv is written as it is because it has only one LO, and UP. but 2nd row of PV.csv is broken into 3 rows. You an see it in row 10 to 13 in finalFile.csv. In addition, when use 2nd LO, the value shoud be previous UP+1. — Tofazzal, Dec 13 '15 at 02:51

score 1 · Answer 1 · answered Dec 12 '15 at 17:59

1

You should use python builtin csv module.

To create the final csv file you can do like this. Read through each file, add the new column value to every row and write it to the new file

import csv

with open('finalcsv.csv', 'w') as outcsv:
    writer = csv.writer(outcsv)
    writer.writerow(['a','b','c','etc','Var']) # write final headers

    for filename in ['PV.csv','Dwel.csv','Sess.csv','Elap.csv']:
        with open(filename) as incsv:
            val = filename.split('.csv')[0]
            reader = csv.reader(incsv) # create reader object
            reader.next() # skip the headers

            for row in reader:
                writer.writerow(row+[val])

answered Dec 12 '15 at 17:59

Termi

641
7
14

Hello Anil, Thank you. It working fine to join the data of 4 csv files into one finacsv file. Before join, i would like to add a new column "var" after column b in each csv file so that I can understand which file porduce the records. And then we can join / merge 4 files. Finally I would like to rearrange the LO, UP values as the result shows. – Tofazzal Dec 13 '15 at 02:33
@Tofazzal then you should use csv module's `DictReader` and `DictWriter` classes. – Termi Dec 13 '15 at 14:38

score 0 · Answer 2 · answered Dec 12 '15 at 17:34

There's a standard library module for these manipulations https://docs.python.org/2/library/csv.html#module-csv

Not a full answer by any means, but your full implementation will almost certainly start there. The python docs above include several working examples which will get you started.

Martin Evans · Accepted Answer · 2015-12-15T07:23:28.133

The following script should get you started:

from collections import defaultdict
from itertools import groupby
import csv

entries = defaultdict(list)
csv_files = [(0, 'PV.csv', 'PV'), (1, 'Dwell.csv', 'Dwel'), (2, 'Session.csv', 'Sess'), (3, 'Elapsed.csv', 'Elap')]

for index, filename, shortname in csv_files:
    f_input = open(filename, 'rb')
    csv_input = csv.reader(f_input)
    header = next(csv_input)

    for row in csv_input:
        row[:] = [col for col in row if col]    
        entries[(row[0], row[1])].append((index, shortname, row[2:]))

    f_input.close()

f_output = open('finalFile.csv', 'wb')
csv_output = csv.writer(f_output)
csv_output.writerow(header[:2] + ['Var'] + header[2:4])

for key in sorted(entries.keys()):
    for k, g in groupby(sorted(entries[key]), key=lambda x: x[1]):
        var_group = list(g)
        if len(var_group[0][2]):
            up = var_group[0][2][0]
            for entry in var_group:
                for pair in zip(*[iter(entry[2])]*2):
                    csv_output.writerow([key[0], key[1], entry[1], up, pair[1]])
                    up = int(pair[1]) + 1

f_output.close()

Using the data you have provided, this gives the following output:

A_ID,B_ID,Var,LO,UP
103,321,PV,0,402
103,321,Dwel,40,250
103,321,Dwel,251,780
103,321,Sess,75,350
103,321,Sess,351,850
103,321,Elap,5,35
103,503,PV,192,225
103,503,PV,226,608
103,503,Dwel,80,125
103,503,Dwel,126,585
103,503,Elap,100,225
103,503,Elap,226,408
106,264,PV,104,258
106,264,PV,259,408
106,264,Dwel,192,525
106,264,Sess,92,225
106,264,Sess,226,608
106,264,Elap,102,325
107,197,PV,6,32
107,197,PV,33,258
107,197,Dwel,324,492
107,197,Dwel,493,614
107,197,Sess,24,92
107,197,Elap,24,92
107,197,Elap,93,214

To work with all csv files in a folder, you could add the following to the top of the script:

import os
import glob

csv_files = [(index, file, os.path.splitext(file)[0]) for index, file in enumerate(glob.glob('*.csv'))]

You should also change the location of the output file otherwise it will be read in the next time the script is run.

Tested using Python 2.6.6 (which I believe is what the OP is using)

Evans, Excellent. It is working fine here also. A simple job but most complex work I find is to add +1 into the previous UP value and then use it as the next LO value in the next row for same A_ID, B_ID, Var, as shown fomr 2nd rows in the finalfile.csv. Is there any pythonic way for this task. — Tofazzal, Dec 14 '15 at 11:30
Evans, Thank you. I have added the original data in the following url’s. Could you please take a look on the files. Presently it shows a simple error as following Traceback (most recent call last): File "dataForTable1Final.py", line 26, in up = var_group[0][2][0] IndexError: list index out of range — Tofazzal, Dec 15 '15 at 02:22
Data of Dwell: http://0bin.net/paste/sr-0kAoxef16mfsI#jiA00zvw6OIWqgskG3WcSAmllcEo7Y0kHxiLAsjk0pX Data of Elapsed: http://0bin.net/paste/KjXM+tIZh+f7m3HG#3pJ3HG9C4lkrCI5bYGUnNs4LgC4iHmR3+zDMFRm7QTr Data of PV: http://0bin.net/paste/IZICqFOeVFIzcfFx#xOlmyNGpOah02yJfA48FNv9SnpDwgunQOoqqiM9hXzS — Tofazzal, Dec 15 '15 at 02:23
Yes, some records have no data in the 3rd column, i.e., in the first 'lower' column. It contains data only in 1st and 2nd columns. So, I need to delete the records thows are empty in the 3rd column (lower). Please take a look on this. And if possible want to use loop to read and manipulate all the files that contain "*.csv" extension. I will add * star marks on the file before executing the program. — Tofazzal, Dec 15 '15 at 06:54
Hello Evans, I would like to inform you that I have posted a new question. If you have a time please take a look in the following link. It is based on the previous question, but I think very complex level manipulation of data (creating combination / patterns between two variables). Here is the link: http://stackoverflow.com/questions/34309176/create-different-combination-patterns-between-the-data-of-two-columns-of-a-csv — Tofazzal, Dec 16 '15 at 10:16

Add new column in a csv file and manipulate on the on records

3 Answers3

Linked