Concatenate pandas DataFrames generated with a loop

Question

For each day, extracted from the day timestamp of an existing DataFrame df, I am creating a new DataFrame named data_day that contains new features.

This gives me 30 independent data_day DataFrames that I need to concatenate/append at the end into a single DataFrame (final_data_day).

The for loop over the days is defined as follows:

num_days=len(list_day)

#list_day= random.sample(list_day,num_days_to_simulate)
data_frame = pd.DataFrame()

for i, day in enumerate(list_day):

    print('*** ',day,' ***')

    data_day=df[df.day==day]
    .....................
    final_data_day = pd.concat()

I hope this is clear. My problem is basically how to append/concatenate DataFrames generated in a non-trivial for loop.

This is not clear. Since you don't know how to do it, how do you expect us to know what you are trying to do without an example? My advice is to read [mcve] and then edit your question accordingly. You will dramatically increase your odds of getting a quality answer. – piRSquared, Feb 15 '18 at 15:33
Sorry! I was still reading the Minimal, Complete... guide when the others had already solved it. I am new to asking questions on this platform and will take this into account in the future. – Annalix, Feb 15 '18 at 15:59

score 21 · Accepted Answer · edited Dec 26 '21 at 09:54


Pandas concat takes a list of dataframes. If you can generate a list of dataframes with your looping function, once you are finished you can concatenate the list together:

data_day_list = []
for i, day in enumerate(list_day):
    data_day = df[df.day==day]
    data_day_list.append(data_day)
final_data_day = pd.concat(data_day_list)
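For a version that runs end to end, here is a minimal sketch of the same pattern. The sample data, the `value` column and the `day_mean` feature are made up for illustration, and `ignore_index=True` is an optional extra that is not part of the snippet above; it simply renumbers the rows of the result.

```python
import pandas as pd

# Hypothetical sample data; column names and values are assumptions.
df = pd.DataFrame({
    "day": ["mon", "mon", "tue", "wed", "wed"],
    "value": [1, 2, 3, 4, 5],
})
list_day = ["mon", "tue", "wed"]

data_day_list = []
for day in list_day:
    # Copy the slice so that adding columns does not touch df itself.
    data_day = df[df["day"] == day].copy()
    # Stand-in for the "new features" computed for each day.
    data_day["day_mean"] = data_day["value"].mean()
    data_day_list.append(data_day)

# Concatenate once, after the loop, instead of growing a DataFrame inside it.
final_data_day = pd.concat(data_day_list, ignore_index=True)
print(final_data_day)
```

Calling pd.concat once at the end avoids copying the accumulated rows on every iteration.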

edited Dec 26 '21 at 09:54

ah bon


answered Feb 15 '18 at 15:38

David Rinck


Lovely! @drinck's solution works amazing. Thanks so much – Annalix Feb 15 '18 at 15:50
I used to do "data_day = df[df.day==day]" as well, but found this to be significantly faster: groups = df.groupby("day") and then data_day = groups.get_group(day) – uhoenig May 30 '21 at 13:52
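
The groupby idea from the last comment could look roughly like this. It is only a sketch with made-up sample data, and whether it is actually faster will depend on the size of df and the number of days.

```python
import pandas as pd

# Hypothetical sample data; column names and values are assumptions.
df = pd.DataFrame({
    "day": ["mon", "mon", "tue", "wed", "wed"],
    "value": [1, 2, 3, 4, 5],
})
list_day = ["mon", "wed"]

# Split the frame by day once, instead of filtering it again for every day.
groups = df.groupby("day")

# get_group takes the key value itself (the variable day, not the string "day").
data_day_list = [groups.get_group(day) for day in list_day]
final_data_day = pd.concat(data_day_list)
print(final_data_day)
```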

jpp · Answer 2 · 2020-07-15T07:54:22.040

8

Exhausting a generator is more elegant (if not more efficient) than appending to a list. For example:

def yielder(df, list_day):
    for i, day in enumerate(list_day):
        yield df[df['day'] == day]

final_data_day = pd.concat(list(yielder(df, list_day)))
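
As far as I can tell, pd.concat accepts any iterable of DataFrames, so the generator can also be passed in directly without wrapping it in list(). A minimal sketch with made-up sample data (the `value` column is an assumption for illustration):

```python
import pandas as pd

# Hypothetical sample data; column names and values are assumptions.
df = pd.DataFrame({
    "day": ["mon", "mon", "tue", "wed", "wed"],
    "value": [1, 2, 3, 4, 5],
})
list_day = ["mon", "tue", "wed"]


def yielder(df, list_day):
    """Yield one per-day slice of df at a time."""
    for day in list_day:
        yield df[df["day"] == day]


# pd.concat consumes the generator itself; no intermediate list is needed.
final_data_day = pd.concat(yielder(df, list_day))
print(final_data_day)
```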

edited Jul 15 '20 at 07:54

answered Feb 16 '18 at 02:24

jpp


score 4 · Answer 3 · answered Feb 15 '18 at 15:42

Repeatedly appending to or concatenating pd.DataFrames inside a loop is slow, because each call copies the data. You can collect the rows in a list in the interim and then create the final pd.DataFrame at the end with pd.DataFrame.from_records(), e.g.:

interim_list = []
for i, (k, g) in enumerate(df.groupby('*name of your date column here*')):
    if i % 1000 == 0 and i != 0:
        print('iteration: {}'.format(i)) # just tells you where you are in iteration
    # add your "new features" here...
    for v in g.values:
        interim_list.append(v)

# here you want to specify the resulting df's column list...
df_final = pd.DataFrame.from_records(interim_list,columns=['a','list','of','columns'])
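
To make the placeholders concrete, here is a small runnable sketch of the same approach. The sample data, the `value` column and the `day_mean` feature are assumptions chosen only for illustration.

```python
import pandas as pd

# Hypothetical sample data; column names and values are assumptions.
df = pd.DataFrame({
    "day": ["mon", "mon", "tue", "wed", "wed"],
    "value": [1, 2, 3, 4, 5],
})

interim_list = []
for day, group in df.groupby("day"):
    # Stand-in for the "new features": the mean value of the day.
    day_mean = group["value"].mean()
    # Collect plain tuples; no DataFrame is built inside the loop.
    for row in group.itertuples(index=False):
        interim_list.append((row.day, row.value, day_mean))

# Build the final DataFrame once, from the collected records.
df_final = pd.DataFrame.from_records(
    interim_list, columns=["day", "value", "day_mean"]
)
print(df_final)
```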

You are absolutely right. Thanks! ...can't I give two votes on Stack Overflow?? – Annalix, Feb 15 '18 at 15:55
