0

I have created a dataframe that looks as follow:

df= 
        id      var0    var1    var2    var3    var4 ...  var137
        5171    10.0    2.8     0.0     5.0     1.0  ...  9.4  
        5171    40.9    2.5     3.4     4.5     1.3  ...  7.7  
        5171    60.7    3.1     5.2     6.6     3.4  ...  1.0
        ...
        5171    0.5     1.3     5.1     0.5     0.2  ...  0.4
        4567    1.5     2.0     1.0     4.5     0.1  ...  0.4  
        4567    4.4     2.0     1.3     6.4     0.1  ...  3.3  
        4567    6.3     3.0     1.5     7.6     1.6  ...  1.6
        ...
        4567    0.7     1.4     1.4     0.3     4.2  ...  1.7
       ... 
        9584    0.3     2.6     0.0     5.2     1.6  ...  9.7  
        9584    0.5     1.2     8.3     3.4     1.3  ...  1.7  
        9584    0.7     3.0     5.6     6.6     3.0  ...  1.0
        ...
        9584    0.7     1.3     0.1     0.0     2.0  ...  1.7

There are 58 items of id level. I need to add a new level, lets call id uniq_id to this dataframe so that the final result be as follow:

df= 
  uniq_id      id      var0    var1    var2    var3    var4 ...  var137
    0          5171    10.0    2.8     0.0     5.0     1.0  ...  9.4  
    1          5171    40.9    2.5     3.4     4.5     1.3  ...  7.7  
    2          5171    60.7    3.1     5.2     6.6     3.4  ...  1.0
   ...
   57          5171    0.5     1.3     5.1     0.5     0.2  ...  0.4
    0          4567    1.5     2.0     1.0     4.5     0.1  ...  0.4  
    1          4567    4.4     2.0     1.3     6.4     0.1  ...  3.3  
    2          4567    6.3     3.0     1.5     7.6     1.6  ...  1.6
    ...
   57          4567    0.7     1.4     1.4     0.3     4.2  ...  1.7
    ... 
    0          9584    0.3     2.6     0.0     5.2     1.6  ...  9.7  
    1          9584    0.5     1.2     8.3     3.4     1.3  ...  1.7  
    2          9584    0.7     3.0     5.6     6.6     3.0  ...  1.0
    ...
    57         9584    0.7     1.3     0.1     0.0     2.0  ...  1.7

I tried this:

n_t = range(0,58)
pd.concat([df], keys=n, names=['uniq_id'])

but this adds 0 to all values fot uniq_id. I also tried to create an empty multiindex based on this post and then get part of dataframe for each id and add it to this multiIndex one but I failed. How can I fix it?

Birish
  • 5,514
  • 5
  • 32
  • 51

1 Answers1

2

You describe a cumulative count:

df['uniq_id'] = df.groupby('id').cumcount() 

You can add it to the index by

df.set_index(['id', 'uniq_id']) # If id was a Series

or if id is already the index:

df.set_index('uniq_id', append=True) # If id was already an Index

which will give you a MultiIndex: Output:

    var0  var1  var2  var3  var4  var137
id   uniq_id                                      
5171 0        10.0   2.8   0.0   5.0   1.0     9.4
     1        40.9   2.5   3.4   4.5   1.3     7.7
     2        60.7   3.1   5.2   6.6   3.4     1.0
     3         0.5   1.3   5.1   0.5   0.2     0.4
4567 0         1.5   2.0   1.0   4.5   0.1     0.4
     1         4.4   2.0   1.3   6.4   0.1     3.3
     2         6.3   3.0   1.5   7.6   1.6     1.6
     3         0.7   1.4   1.4   0.3   4.2     1.7
9584 0         0.3   2.6   0.0   5.2   1.6     9.7
     1         0.5   1.2   8.3   3.4   1.3     1.7
     2         0.7   3.0   5.6   6.6   3.0     1.0
     3         0.7   1.3   0.1   0.0   2.0     1.7
Jondiedoop
  • 3,303
  • 9
  • 24