I have a dataframe which looks like this:

import pandas as pd

df = pd.DataFrame([
    [123, 'abc', '121'],
    [123, 'abc', '121'],
    [456, 'def', '121'],
    [123, 'abc', '122'],
    [123, 'abc', '122'],
    [456, 'def', '145'],
    [456, 'def', '145'],
    [456, 'def', '121'],
], columns=['userid', 'name', 'dt'])

From this question, I have managed to transpose it.

So, the desired df would be:

userid1_date1  name_1   name_2  ...   name_n
userid1_date2  name_1   name_2  ...   name_n
userid2        name_1   name_2  ...   name_n
userid3_date1  name_1   name_2  ...   name_n

But I want to separate the rows by date. For example, if user 123 has data on two days, there should be a separate row for each day's API events.

I won't need the userid after the transformation, so you can use it in any way you like.

My plan was:

  • Group the df by the dt column
  • Pivot all the groups such that each looks like this:
    userid1_date1 name_1 name_2 ... name_n
  • Now, concatenate the pivoted data

But, I have no clue how to do this in pandas!

Dawny33

1 Answer

Try:

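def tweak(df):
    # Move userid out of the index so each group gets a 0..n-1 position index, then keep only the names
    return df.reset_index().name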

df.set_index('userid').groupby(level=0).apply(tweak)

Demonstration

import pandas as pd

df = pd.DataFrame([[1, 'a'], [1, 'c'], [1, 'c'], [1, 'd'], [1, 'e'],
                   [1, 'a'], [1, 'c'], [1, 'c'], [1, 'd'], [1, 'e'],
                   [2, 'a'], [2, 'a'], [2, 'c'], [2, 'd'], [2, 'e'],
                   [2, 'a'], [2, 'a'], [2, 'c'], [2, 'd'], [2, 'e'],
    ], columns=['userid', 'name'])

def tweak(df):
    return df.reset_index().name

df.set_index('userid').groupby(level=0).apply(tweak)

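The snippet above groups by userid only. To get one row per user per date, as the question asks, one possible variation is to number the events within each (userid, dt) pair and unstack on that number. This is a minimal sketch using the question's sample data, not part of the original answer; the names `position` and `wide` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame([
    [123, 'abc', '121'],
    [123, 'abc', '121'],
    [456, 'def', '121'],
    [123, 'abc', '122'],
    [123, 'abc', '122'],
    [456, 'def', '145'],
    [456, 'def', '145'],
    [456, 'def', '121'],
], columns=['userid', 'name', 'dt'])

# Number the events inside each (userid, dt) pair: 0, 1, 2, ...
# (`position` is a hypothetical helper name.)
position = df.groupby(['userid', 'dt']).cumcount()

# Index by (userid, dt, position), then move the position level into columns,
# so the n-th event of each user/date pair lands in column n.
wide = df.set_index(['userid', 'dt', position])['name'].unstack()

print(wide)
```

If the userid and dt labels are not needed afterwards, `wide.reset_index(drop=True)` should discard them. Pairs with fewer events than the longest one would be padded with NaN.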

piRSquared
  • Wait. You forgot to `.unstack()` it. It works now. But how does this separate the user data by `dt`? The df looks similar to the one here: http://stackoverflow.com/a/38369722/4993513 – Dawny33 Aug 02 '16 at 09:08