
I have a dataframe like this:

  name data result 
0  x    100
1  x    100
2  x    100
3  x    100
4  x    100
5  y    100
6  y    90
7  y    90
8  y    100
9  y    85

I want to check whether each group in the name column has the same value throughout the data column.

For each group in name, if the corresponding data values are all equal, write full in the result column. If the values for a group are not all equal, write nearly in the result column.

I have tried grouping the dataframe:

grouped = df.groupby('name')
dfx = grouped.get_group('x')

but that only extracts a single group; it does not help me check whether all values in a group are equal and write the outcome to the result column.

I have also tried writing a function that checks for unique values:

def check_identicals(row):
    # note: this looks at the whole data column, not at the row's group
    if df['data'].nunique() == 1:
        print('Full')

The idea here is to then apply that function to each row and write the output in the result column.

Ideal output:

   name data result 
0  x    100   full
1  x    100   full
2  x    100   full
3  x    100   full
4  x    100   full
5  y    100   nearly
6  y    90    nearly
7  y    90    nearly
8  y    100   nearly
9  y    85    nearly
lczapski
Mazz
  • Possible duplicate of [Check if all elements in a group are equal using pandas GroupBy](https://stackoverflow.com/questions/53950883/check-if-all-elements-in-a-group-are-equal-using-pandas-groupby) – Yuca Nov 22 '19 at 12:38

1 Answer


Use numpy.where with GroupBy.transform and DataFrameGroupBy.nunique. transform returns a Series the same size as the original DataFrame, holding the number of unique data values of each row's group, so it can be compared against 1 row by row:

import numpy as np

df['result'] = np.where(df.groupby('name')['data'].transform('nunique') == 1, 'full', 'nearly')
print(df)
  name  data  result
0    x   100    full
1    x   100    full
2    x   100    full
3    x   100    full
4    x   100    full
5    y   100  nearly
6    y    90  nearly
7    y    90  nearly
8    y   100  nearly
9    y    85  nearly
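
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt from the sample in the question, and the intermediate name n_unique is only introduced here for illustration:

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question
df = pd.DataFrame({
    "name": list("xxxxxyyyyy"),
    "data": [100, 100, 100, 100, 100, 100, 90, 90, 100, 85],
})

# Number of distinct 'data' values in each group, repeated on every row of that group
n_unique = df.groupby("name")["data"].transform("nunique")

# Exactly one distinct value means every row in the group is equal
df["result"] = np.where(n_unique == 1, "full", "nearly")
print(df)
```

Keeping the transform result in its own variable makes it easy to inspect the per-row counts before they are turned into labels.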

EDIT:

To also test whether all values in a group are missing, use numpy.select with a second condition that checks for missing values with Series.isna, transform and GroupBy.all:

m1 = df.groupby('name')['data'].transform('nunique') == 1
m2 = df['data'].isna().groupby(df['name']).transform('all')

df['result'] = np.select([m1, m2], ['full', 'all_missing'], default='nearly')
print(df)
  name   data       result
0    x  100.0         full
1    x  100.0         full
2    x  100.0         full
3    x  100.0         full
4    x  100.0         full
5    y  100.0       nearly
6    y   90.0       nearly
7    y   90.0       nearly
8    z    NaN  all_missing
9    z    NaN  all_missing
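
A runnable sketch of the same idea, assuming a sample frame shaped like the output above (the data is made up to match it, with a z group that contains only NaN):

```python
import numpy as np
import pandas as pd

# Assumed sample data mirroring the printed output: group z is entirely missing
df = pd.DataFrame({
    "name": list("xxxxxyyyzz"),
    "data": [100, 100, 100, 100, 100, 100, 90, 90, np.nan, np.nan],
})

# m1: the group has exactly one distinct non-missing value
m1 = df.groupby("name")["data"].transform("nunique") == 1
# m2: every value in the group is missing
m2 = df["data"].isna().groupby(df["name"]).transform("all")

# Conditions are checked in order; rows matching neither get the default
df["result"] = np.select([m1, m2], ["full", "all_missing"], default="nearly")
print(df)
```

One thing to keep in mind: nunique ignores NaN by default, so a group that mixes a single repeated value with some missing values would still be labelled full with these masks.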
jezrael
  • Is there a way to include a isnull() condition? Say if each data value for a distinct group is Null, indicate that in the result column? – Mazz Nov 22 '19 at 12:55