-1

I need to fill null values in the column with not null value of the same group.

Example

Desired Outcome

I tried using transform with mode, but it didn't do the job.

test['col2']=test['col2'].transform(lambda x:x.fillna(x.mode())
Amistreem
  • 1
  • 1

2 Answers2

2

Use GroupBy.transform with mode and select first value if exist, else None, last pass to Series.fillna:

s = df.groupby('col1')['col2'].transform(lambda x: next(iter(x.mode()), None))
df['col2'] = df['col2'].fillna(s)
print (df)
  col1   col2
0  gr1  test1
1  gr2  test2
2  gr1  test1
3  gr1  test1
4  gr2  test2
5  gr3  test3
6  gr2  test2
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • 1
    I think this answer would be more performant on a large dataset because you are iterating through the groups instead of the entire dataframe. Pretty neat! – Zach Flanders Jun 24 '22 at 12:25
0

I would use .assign and .apply to go through each row and then find the mode:

import pandas
import numpy

df = pandas.DataFrame({
    'col1':['gr1', 'gr2', 'gr1', 'gr1', 'gr2', 'gr3', 'gr2', numpy.nan], 
    'col2':['test1', 'test2', 'test', numpy.nan, numpy.nan, 'test3', numpy.nan, numpy.nan],
})

def fill_value(x):
    if x['col2'] is numpy.nan:
        mode = df.loc[df['col1'] == x['col1'], 'col2'].mode()
        default = numpy.nan
        return mode.iloc[0] if not mode.empty else default
    else:
        return x['col2']
    
df = df.assign(col2=df.apply(fill_value, axis=1))

output:

  col1   col2
0  gr1  test1
1  gr2  test2
2  gr1   test
3  gr1   test
4  gr2  test2
5  gr3  test3
6  gr2  test2
7  NaN    NaN
Zach Flanders
  • 1,224
  • 1
  • 7
  • 10
  • Thanks for the reply. This works on this example, however when I tried using this method on real Data Frame, I receive an error "single positional indexer is out-of-bounds". – Amistreem Jun 24 '22 at 11:46
  • This most likely means that you have a nan value in col2 that is also nan in col1. I updated the code to check that there is a mode returned, else you can fill with a default value. – Zach Flanders Jun 24 '22 at 11:52
  • @Amistreem - Another solution working? – jezrael Jun 24 '22 at 12:03
  • It does. I also tried your approach and it works too... Thank you. – Amistreem Jun 24 '22 at 12:09
  • @Amistreem - only one answer should be accepted, not both. Problem is in some group is only only missing values in `col2`, so raise error `single positional indexer is out-of-bounds` if not use `next` + `iter` in my siolution for prevent this error. – jezrael Jun 24 '22 at 12:15