Python Pandas add Filename Column CSV

Question

My Python code in the example below works correctly. It combines a directory of CSV files and matches the headers. However, I want to take it a step further: how do I add a column that holds the filename of the CSV each row came from?

import pandas as pd
import glob

globbed_files = glob.glob("*.csv")  # creates a list of all CSV files

data = []  # pd.concat takes a list of dataframes as an argument
for csv in globbed_files:
    frame = pd.read_csv(csv)
    data.append(frame)

bigframe = pd.concat(data, ignore_index=True)  # don't want pandas to try to align row indexes
bigframe.to_csv("Pandas_output2.csv")

score 43 · Accepted Answer · answered Jan 25 '17 at 17:19

This should work:

import os

for csv in globbed_files:
    frame = pd.read_csv(csv)
    frame['filename'] = os.path.basename(csv)
    data.append(frame)

frame['filename'] creates a new column named filename and os.path.basename() turns a path like /a/d/c.txt into the filename c.txt.
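
For anyone who wants the whole thing in one place, here is a minimal sketch that folds the loop into a function. The name combine_csvs and its pattern parameter are my own, not from the question, and it assumes the files share compatible headers.

```python
import glob
import os

import pandas as pd


def combine_csvs(pattern):
    """Read every CSV matching `pattern` and tag each row with its source file.

    `combine_csvs` and `pattern` are illustrative names, not part of pandas.
    """
    frames = []
    for path in sorted(glob.glob(pattern)):
        frame = pd.read_csv(path)
        # A scalar assigned to a column is broadcast to every row of the frame.
        frame["filename"] = os.path.basename(path)
        frames.append(frame)
    if not frames:
        raise FileNotFoundError(f"no files match {pattern!r}")
    # ignore_index=True renumbers the rows 0..n-1 in the combined frame.
    return pd.concat(frames, ignore_index=True)
```

Something like combine_csvs("*.csv").to_csv("Pandas_output2.csv") would then write the result. One thing to watch: if the output file lives in the same directory, a second run will pick it up as an input, so writing it elsewhere is probably safer.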

answered by Mike Müller

Awesome. I knew it was easy! Thank you – specmer Jan 25 '17 at 17:33

score 0 · Answer 2 · answered Apr 29 '19 at 17:14

Mike's answer above works perfectly. In case any googlers run into the following error:

>>> TypeError: cannot concatenate object of type "<type 'str'>"; 
    only pd.Series, pd.DataFrame, and pd.Panel (deprecated) objs are valid

It's possibly because the separator is not correct. I was using a custom CSV file whose separator was ^, so I needed to pass the separator in the pd.read_csv call.

import os

for csv in globbed_files:
    frame = pd.read_csv(csv, sep='^')
    frame['filename'] = os.path.basename(csv)
    data.append(frame)
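
To see what the separator changes, here is a small sketch on in-memory data. The sample content is made up, and it only shows how the columns are parsed; it does not try to reproduce the TypeError itself.

```python
import io

import pandas as pd

# Made-up sample using ^ as the field separator.
raw = "name^score\nann^1\nbob^2\n"

# With the default comma separator, each line lands in a single column.
wrong = pd.read_csv(io.StringIO(raw))

# Passing the real delimiter splits the fields as intended.
right = pd.read_csv(io.StringIO(raw), sep="^")

print(list(wrong.columns))  # ['name^score']
print(list(right.columns))  # ['name', 'score']
```

Checking frame.columns on one file like this is a quick way to confirm the separator before combining everything.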

score 0 · Answer 3 · edited Apr 27 '22 at 18:01

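The files variable holds a list of all CSV files in the current directory, such as ['FileName1.csv', 'FileName2.csv']. To drop the ".csv" extension from the value, you can use the .split() function. The simple logic below does that; it assumes the file names contain no other dots.

import glob
import pandas as pd

files = glob.glob("*.csv")
for i in files:
    df = pd.read_csv(i)
    name = i.split(".")[0]  # text before the first dot, e.g. 'FileName1'
    df['New Column name'] = name
    # Overwrites the original file; index=False avoids writing an extra index column.
    df.to_csv(name + ".csv", index=False)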
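
Splitting on the first dot truncates names that contain more than one dot. As a hedged alternative, pathlib's Path.stem drops only the final extension (and any directory part). The helper name name_without_extension and the sample names are my own, just for illustration.

```python
import os
from pathlib import Path


def name_without_extension(path):
    """Return the file name with its directory and final extension removed."""
    return Path(path).stem


# Sample names are hypothetical; the second one contains an extra dot.
samples = ["FileName1.csv", "report.2022.csv", os.path.join("data", "FileName2.csv")]
for path in samples:
    first_dot = os.path.basename(path).split(".")[0]
    print(f"{path}: split -> {first_dot!r}, stem -> {name_without_extension(path)!r}")
```

For plain names like FileName1.csv both approaches give the same result; they only differ when the name has extra dots.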

edited by Emi OB

answered Apr 22 '22 at 10:22 by Himalay Parmar
