0

I have a dataframe that has column of repeating values/indexes and I want to group it by the 'Name' column but without performing any aggregation to it. I've looked at the Dataframe.groupby() function but from what I've searched, you are kind of forced to perform an aggregation. I've also tried Dataframe.set_index('Name') and then do Dataframe.sort_index(inplace = True) but it for some reason it returns a Nonetype I feel like it's something obvious but I can't quite figure it out.

This is my dataframe now:

Name Data1 Data2
A    .1    1.1
A    .2    1.2
A    .3    1.3
B    .6    1.6
B    .7    1.7
B    .8    1.8
C    1.0   2.0
C    1.1   2.1
C    1.2   2.2

This is what I want:

Name Data1 Data2
A    .1    1.1
     .2    1.2
     .3    1.3
B    .6    1.6
     .7    1.7
     .8    1.8
C    1.0   2.0
     1.1   2.1
     1.2   2.2
Nycbros
  • 95
  • 8

1 Answers1

3

If want replace duplicated values to empty strings use Series.duplicated with Series.mask:

df['Name'] = df['Name'].mask(df['Name'].duplicated(), '')
print (df)
  Name  Data1  Data2
0    A    0.1    1.1
1         0.2    1.2
2         0.3    1.3
3    B    0.6    1.6
4         0.7    1.7
5         0.8    1.8
6    C    1.0    2.0
7         1.1    2.1
8         1.2    2.2

In MultiIndex for first look it also working, but NOT, values are only in default view NOT displayed in first level(s) if duplicated:

df1 = df.set_index(['Name','Data1'])
print (df1)
            Data2
Name Data1       
A    0.1      1.1
     0.2      1.2
     0.3      1.3
B    0.6      1.6
     0.7      1.7
     0.8      1.8
C    1.0      2.0
     1.1      2.1
     1.2      2.2

Check docs, In [21]:, there is no omited data, only moved to MultiIndex:

with pd.option_context('display.multi_sparse', False):
    print (df1)

            Data2
Name Data1       
A    0.1      1.1
A    0.2      1.2
A    0.3      1.3
B    0.6      1.6
B    0.7      1.7
B    0.8      1.8
C    1.0      2.0
C    1.1      2.1
C    1.2      2.2

Or if convert values of MultiIndex to list:

print (df1.index.tolist())
[('A', 0.1), ('A', 0.2), ('A', 0.3), 
 ('B', 0.6), ('B', 0.7), ('B', 0.8), 
 ('C', 1.0), ('C', 1.1), ('C', 1.2)]
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • 1
    Thanks so much for all the options! I was looking for the first one where duplicates gets replaced with empty values. – Nycbros Feb 03 '20 at 07:42