15

I am creating a new DataFrame named data_day, containing new features, for each day; the days are extracted from the day timestamp of an existing DataFrame df.

This gives me 30 independent data_day DataFrames, which I need to concatenate/append at the end into a single DataFrame (final_data_day).

The for loop over the days is defined as follows:

num_days=len(list_day)

#list_day= random.sample(list_day,num_days_to_simulate)
data_frame = pd.DataFrame()

for i, day in enumerate(list_day):

    print('*** ',day,' ***')

    data_day=df[df.day==day]
    .....................
    final_data_day = pd.concat()

I hope this is clear. My problem is basically how to append/concatenate DataFrames that are generated in a non-trivial for loop.

jpp
Annalix
  • This is not clear. Since you don't know how to do it, how do you expect us to know what you are trying to do without an example? My advice is to read [mcve] and then edit your question accordingly. You will dramatically increase your odds of getting a quality answer. – piRSquared Feb 15 '18 at 15:33
  • Sorry! I was still looking at the "Minimal, Complete..." page when the others had already solved it. I am new to asking questions on this platform and will take this into account in the future. – Annalix Feb 15 '18 at 15:59

3 Answers

21

pd.concat takes a list of DataFrames. If your loop can build such a list, you can concatenate the whole list once the loop has finished:

data_day_list = []
for i, day in enumerate(list_day):
    data_day = df[df.day==day]
    data_day_list.append(data_day)
final_data_day = pd.concat(data_day_list)
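
As a self-contained illustration of the collect-then-concatenate pattern above, here is a minimal sketch. The sample data, the "value" column, and the "day_mean" feature are assumptions made up for the example; they stand in for whatever the real loop computes.

```python
import pandas as pd

# Hypothetical sample data; "day" and "value" are assumed column names.
df = pd.DataFrame({
    "day": ["2018-02-13", "2018-02-13", "2018-02-14", "2018-02-15"],
    "value": [1.0, 3.0, 5.0, 7.0],
})
list_day = df["day"].unique()  # days in order of first appearance

data_day_list = []
for day in list_day:
    # .copy() so that adding a column does not touch the original df
    data_day = df[df["day"] == day].copy()
    # Placeholder for the "new features" step: the per-day mean of "value"
    data_day["day_mean"] = data_day["value"].mean()
    data_day_list.append(data_day)

# A single concat at the end; ignore_index=True rebuilds a 0..n-1 index
final_data_day = pd.concat(data_day_list, ignore_index=True)
print(final_data_day)
```

Concatenating once after the loop avoids copying the growing result on every iteration.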
ah bon
David Rinck
  • Lovely! @drinck's solution works amazing. Thanks so much – Annalix Feb 15 '18 at 15:50
  • I used to do data_day = df[df.day==day] as well, but found this to be significantly faster: groups = df.groupby("day"), and then data_day = groups.get_group(day) – uhoenig May 30 '21 at 13:52
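
The groupby variant mentioned in the last comment could look roughly like the sketch below. The sample data and the "n_rows" feature are hypothetical; the point is only that the frame is split once, instead of being filtered with a boolean mask for every day.

```python
import pandas as pd

# Hypothetical sample data; "day" and "value" are assumed column names.
df = pd.DataFrame({"day": [1, 1, 2, 3], "value": [10, 20, 30, 40]})

groups = df.groupby("day")

# Look up a single day's rows by its key
day_two = groups.get_group(2)

# Or iterate over (key, sub-frame) pairs and collect the results
pieces = []
for day, data_day in groups:
    data_day = data_day.copy()
    # Placeholder feature: number of rows recorded for that day
    data_day["n_rows"] = len(data_day)
    pieces.append(data_day)

final_data_day = pd.concat(pieces)
```

Whether this is actually faster depends on the data, so it is worth timing both versions on the real DataFrame.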
8

Exhausting a generator is more elegant (if not more efficient) than appending to a list. For example:

def yielder(df, list_day):
    for i, day in enumerate(list_day):
        yield df[df['day'] == day]

final_data_day = pd.concat(list(yielder(df, list_day)))
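
The same idea can be written with a generator expression. The sketch below assumes a hypothetical helper, add_features, standing in for the per-day feature step, and made-up sample data. As far as I know, pd.concat accepts any iterable of DataFrames, so the generator is passed in directly here; wrapping it in list() as above works equally well.

```python
import pandas as pd

# Hypothetical sample data; "day" and "value" are assumed column names.
df = pd.DataFrame({"day": [1, 2, 2, 3], "value": [10, 20, 30, 40]})
list_day = [1, 2, 3]

def add_features(data_day):
    """Hypothetical per-day feature step: add the day's total of "value"."""
    # assign() returns a new DataFrame, so df itself is left unchanged
    return data_day.assign(day_total=data_day["value"].sum())

# pd.concat consumes the generator and concatenates all pieces in one go
final_data_day = pd.concat(
    add_features(df[df["day"] == day]) for day in list_day
)
```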
jpp
4

Repeatedly appending or concatenating pd.DataFrames inside a loop is slow. You can use a list in the interim and then create the final pd.DataFrame at the end with pd.DataFrame.from_records(), e.g.:

interim_list = []
for i,(k,g) in enumerate(df.groupby(['[*name of your date column here*]'])):
    if i % 1000 == 0 and i != 0:
        print('iteration: {}'.format(i)) # just tells you where you are in iteration
    # add your "new features" here...
    for v in g.values:
        interim_list.append(v)

# here you want to specify the resulting df's column list...
df_final = pd.DataFrame.from_records(interim_list,columns=['a','list','of','columns'])
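
A runnable version of this records-based approach might look like the sketch below. The sample data and the "day_total" feature are assumptions for illustration; each output row is stored as a plain tuple and the DataFrame is built only once at the end.

```python
import pandas as pd

# Hypothetical sample data; "day" and "value" are assumed column names.
df = pd.DataFrame({"day": [1, 1, 2], "value": [10, 20, 5]})

records = []
for day, group in df.groupby("day"):
    # Placeholder feature: the total of "value" for this day
    day_total = group["value"].sum()
    for row in group.itertuples(index=False):
        # One tuple per output row, original columns plus the new feature
        records.append((row.day, row.value, day_total))

# Build the final DataFrame once, naming the columns explicitly
df_final = pd.DataFrame.from_records(
    records, columns=["day", "value", "day_total"]
)
```

Looping over individual rows in Python is itself slow, so this is mainly useful when the per-row work cannot be vectorised.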
mechanical_meat