1

I encountered a problem of how to duplicate rows with loop function in Python. I have a dataset like this(this is a pandas data frame):

    userId  period  Date    
0   41851   4       1/4/2015    
1   13575   1       1/4/2015    

And I want to duplicate the first row 3 times, every time for the original row, the period column need to minus 1. until the period for the original is 1. and also every time when I duplicate it, I want to add 1 month to the date. So the result would be like this:

    userId  period  Date    
0   41851   1       1/4/2015
1   41851   1       2/4/2015
2   41851   1       3/4/2015
3   41851   1       4/4/2015    
4   13575   1       1/4/2015    

Does someone know how to do that? Thanks!

Errol
  • 600
  • 4
  • 16
Stella
  • 65
  • 4

1 Answers1

1

Idea is repeat rows by Index.repeat and DataFrame.loc, then add days by GroupBy.cumcount with this solution and last if necessary change format of datetimes by Series.dt.strftime:

def combine64(years, months=1, days=1, weeks=None, hours=None, minutes=None,
              seconds=None, milliseconds=None, microseconds=None, nanoseconds=None):
    years = np.asarray(years) - 1970
    months = np.asarray(months) - 1
    days = np.asarray(days) - 1
    types = ('<M8[Y]', '<m8[M]', '<m8[D]', '<m8[W]', '<m8[h]',
             '<m8[m]', '<m8[s]', '<m8[ms]', '<m8[us]', '<m8[ns]')
    vals = (years, months, days, weeks, hours, minutes, seconds,
            milliseconds, microseconds, nanoseconds)
    return sum(np.asarray(v, dtype=t) for t, v in zip(types, vals)
               if v is not None)

def year(dates):
    "Return an array of the years given an array of datetime64s"
    return dates.astype('M8[Y]').astype('i8') + 1970

def month(dates):
    "Return an array of the months given an array of datetime64s"
    return dates.astype('M8[M]').astype('i8') % 12 + 1

def day(dates):
    "Return an array of the days of the month given an array of datetime64s"
    return (dates - dates.astype('M8[M]')) / np.timedelta64(1, 'D') + 1

df['Date'] = pd.to_datetime(df['Date'])

df1 = df.loc[df.index.repeat(df['period'])]
g = df1.groupby(level=0).cumcount()

start = df1['Date'].values
df1['Date'] = combine64(year(start), months=month(start) + g, 
                           days=day(start))
df1['period'] = 1
df1 = df1.reset_index(drop=True)

df1['Date'] = df1['Date'].dt.strftime('%m/%d/%Y')
print (df1)
   userId  period        Date
0   41851       1  01/04/2015
1   41851       1  02/04/2015
2   41851       1  03/04/2015
3   41851       1  04/04/2015
4   13575       1  01/04/2015
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252