Need to pick 'second column' from multiple csv files and save all 'second columns' in one csv file

Question

I have 366 CSV files and I want to copy the second column of each one into a single new CSV file. I need code for this. I tried some code I found here, but nothing works. Please help.

Please update this question to provide the work demonstrating your effort so users can assist you in ironing out the bugs leading to your failure. You'll have much better luck finding assistance since this community isn't for requesting people do your work for you. The work you provide should not only demonstrate the attempts you've made, but also clearly describe the failure you need help overcoming. — Julian, Oct 19 '19 at 00:05

score 1 · Answer 1 · answered Oct 18 '19 at 23:57

Assuming all the 2nd columns are the same length, you could simply loop through all the files. Read them, save the 2nd column to memory and construct a new df along the way.

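import pandas as pd

filenames = ['test.csv', ....]  # placeholder: list every csv file here

new_df = pd.DataFrame()

for filename in filenames:
    df = pd.read_csv(filename)
    second_column = df.iloc[:, 1]  # position 1 is the 2nd column
    new_df[f'SECOND_COLUMN_{filename.upper()}'] = second_column
    del df  # release the full frame before reading the next file

new_df.to_csv('new_csv.csv', index=False)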
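
If the second columns are not all the same length, assigning them one by one to `new_df` aligns each on the index of the first file, so longer columns get cut off. A minimal sketch of one way around that, assuming every file has a header row and at least two columns (`collect_second_columns` is a hypothetical helper name, not part of pandas):

```python
import pandas as pd


def collect_second_columns(paths):
    """Return a DataFrame holding the second column of every CSV in `paths`."""
    columns = []
    for path in paths:
        # usecols=[1] parses only the second column (position 1) of each file
        frame = pd.read_csv(path, usecols=[1])
        columns.append(frame.iloc[:, 0])
    # axis=1 places the Series side by side; shorter columns are padded with NaN.
    # keys= names every output column after the file it came from.
    return pd.concat(columns, axis=1, keys=list(paths))
```

Calling `collect_second_columns(filenames).to_csv('new_csv.csv', index=False)` would then write the result in one step. Building the frame with a single `pd.concat` also avoids inserting several hundred columns one at a time.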

Dominik Sajovic

Hey Dominik, Thanks for sharing the code. Tweaked it for all files in the folder; works fine. – Ali Ajaz Oct 20 '19 at 02:41
Hi Ali, I would greatly appreciate an upvote to the answer if I have been of help, thank you. :) – Dominik Sajovic Oct 20 '19 at 20:34
I will for sure once I get >15 reputation points. Won't forget. Thanks again. – Ali Ajaz Oct 20 '19 at 23:16

Ian-Fogelman · Answer 2 · 2019-10-25T13:53:37.627

1

This can be accomplished with glob and pandas:

import glob
import pandas as pd

# sorted() fixes the column order; output.csv is skipped in case of a rerun
mylist = sorted(f for f in glob.glob("*.csv") if f != "output.csv")
df = pd.read_csv(mylist[0])  # create the dataframe from the first csv
df = pd.DataFrame(df.iloc[:, 1])  # only keep 2nd column
for x in mylist[1:]:  # loop through the rest of the csv files doing the same
    t = pd.read_csv(x)
    colName = f"{t.columns[1]}_{x}"  # add the file name so identical headers do not overwrite each other
    df[colName] = t.iloc[:, 1]
df.to_csv('output.csv', index=False)  # write once, after the loop

edited Oct 25 '19 at 13:53

answered Oct 19 '19 at 00:08

Ian-Fogelman

Thanks for sharing the code. There was a mismatch between csvList and mylist. However, it only runs for the very first file. – Ali Ajaz Oct 19 '19 at 21:12
Please mark correct if this solution helped you with your issue – Ian-Fogelman Oct 21 '19 at 00:13

score 1 · Accepted Answer · answered Oct 20 '19 at 02:39

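    import glob
    import pandas as pd

    # collect every csv file in the folder
    filenames = glob.glob(r'D:/CSV_FOLDER' + "/*.csv")

    new_df = pd.DataFrame()

    for filename in filenames:
        df = pd.read_csv(filename)
        second_column = df.iloc[:, 1]  # position 1 is the 2nd column
        new_df[f'SECOND_COLUMN_{filename.upper()}'] = second_column
        del df  # release the full frame before reading the next file

    new_df.to_csv('new_csv.csv', index=False)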
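
Because `glob.glob` returns full paths here, `filename.upper()` puts the whole upper-cased path, folder included, into every column name. A small sketch of a shorter label, assuming the file names themselves are unique within the folder (`column_label` is a hypothetical helper, not something from the code above):

```python
from pathlib import Path


def column_label(filename):
    """Build a column label from the file name only, without folder or extension."""
    # Path(...).stem drops the directory part and the ".csv" suffix
    return f"SECOND_COLUMN_{Path(filename).stem.upper()}"
```

Inside the loop, `new_df[column_label(filename)] = second_column` would then produce headers such as `SECOND_COLUMN_TEST` for `test.csv`.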

Ali Ajaz

So I tried this code for multiple files in one folder and it works perfectly fine. – Ali Ajaz Oct 20 '19 at 02:40

score 0 · Answer 4 · answered Oct 19 '19 at 21:15

    import glob
    import pandas as pd

    # sorted() fixes the column order; output.csv is skipped in case of a rerun
    mylist = sorted(f for f in glob.glob("*.csv") if f != "output.csv")
    df = pd.read_csv(mylist[0])  # create the dataframe from the first csv
    df = pd.DataFrame(df.iloc[:, 1])  # only keep 2nd column (position 1, not 0)
    for x in mylist[1:]:  # loop through the rest of the csv files doing the same
        t = pd.read_csv(x)
        colName = f"{t.columns[1]}_{x}"  # add the file name so identical headers do not overwrite each other
        df[colName] = t.iloc[:, 1]
    df.to_csv('output.csv', index=False)  # write once, after the loop
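
For comparison, the same job can be sketched with only the standard library, which may be handy where pandas is not installed. This is a rough sketch under stated assumptions: every file is comma-separated, the header row is copied through as the first output row, and `second_columns_to_csv` is a hypothetical name chosen for illustration.

```python
import csv
from itertools import zip_longest


def second_columns_to_csv(paths, out_path):
    """Write the second column of each CSV in `paths` side by side into `out_path`."""
    columns = []
    for path in paths:
        with open(path, newline="") as handle:
            # keep the second field of every row, header included;
            # rows with fewer than two fields contribute an empty string
            columns.append([row[1] if len(row) > 1 else "" for row in csv.reader(handle)])
    with open(out_path, "w", newline="") as handle:
        # zip_longest turns the list of columns back into rows, padding short columns
        csv.writer(handle).writerows(zip_longest(*columns, fillvalue=""))
```

Unlike the pandas versions, all values stay strings and missing cells are written as empty fields rather than NaN.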
