2

I have this dataset:

    menu    alternative id  varA    varB    varC
1   NaN A   NaN NaN NaN
1   NaN A   NaN NaN NaN
2   94  A   8.089481019 7.07639559  0.90627215
2   89  A   7.52310322  19.49894193 14.4562262
3   79  A   24.79634962 18.91163612 23.85341972
3   95  A   21.10990397 17.00630516 1.09875582
4   47  A   5.681766806 4.136047755 17.38880496
4   62  A   10.39459876 0.997853805 0.045331687
5   58  A   11.91790497 5.696799013 27.21424163
5   23  A   11.71107828 2.165751058 11.56534045
6   57  A   1.068603487 27.95362014 1.334049372
6   100 A   26.31848796 6.757305213 4.742282633
7   NaN A   NaN NaN NaN
7   NaN A   NaN NaN NaN
8   24  A   29.99608877 28.49057834 0.14073638
8   7   A   8.749041949 14.17745528 9.604565417
9   64  A   29.4316969  19.57593592 9.174503643
9   60  A   13.53995541 1.898164567 16.49089291
10  85  A   20.1394155  0.995839592 16.18638727
10  22  A   22.68625486 14.26052953 17.79707308
1   82  B   21.34428845 9.901326583 1.053134597
1   91  B   19.04689216 16.29217346 29.56962312
2   35  B   25.44168095 29.00407645 2.246459981
2   100 B   15.79687903 20.37920541 28.45071525
3   44  B   7.359501131 23.66924419 7.198215907
3   41  B   22.65272801 8.66227065  12.05186217
4   59  B   26.67565422 9.608511948 26.45016581
4   53  B   5.64870847  21.83063691 19.20105218
5   48  B   5.591317152 25.17616679 24.30522374
5   16  B   23.85069753 23.12154586 0.781450997
6   NaN B   NaN NaN NaN
6   NaN B   NaN NaN NaN
7   68  B   9.334935288 16.39114327 21.17696541
7   41  B   5.841577934 6.901223007 28.38116983
8   35  B   21.20288984 9.665414964 4.472546438
8   96  B   0.451299457 27.66880932 26.2120144
9   84  B   19.67310555 1.993071082 9.08442779
9   65  B   0.475983889 16.72261394 17.17122898
10  40  B   9.553130945 17.88616649 22.17570401
10  40  B   19.70487161 5.898428653 11.25844279
1   19  C   20.47792809 9.344376127 7.855311112
1   59  C   14.59141273 8.090534362 19.6972446
2   19  C   6.624345353 0.192145343 26.31356322
2   67  C   24.483236   6.718856437 25.75609679
3   67  C   27.6408808  24.91014602 25.90758755
3   30  C   26.52738124 10.78363589 4.873602089
4   14  C   3.776964641 21.16561036 24.03153234
4   46  C   16.53719818 23.86634958 25.61504006
5   NaN C   NaN NaN NaN
5   NaN C   NaN NaN NaN
6   58  C   28.1357636  15.89359176 0.567406646
6   28  C   0.708229201 12.20641988 0.309303591
7   17  C   4.319991082 21.29233667 3.516184987
7   8   C   24.09490443 9.507000131 14.93472971
8   85  C   19.99606403 21.61509867 0.161222766
8   5   C   6.056082264 25.35186187 5.375641692
9   24  C   19.83904205 24.54037422 11.08571464
9   13  C   4.388769239 7.928106767 4.279531285
10  78  C   13.67598922 5.3140143   15.2710129
10  13  C   12.27642791 16.04610858 1.815260029

As you can see here, I have some null values which I need to fill. However, I need to fill them in a somewhat custom manner. For every id and for every menu I need to fill the null values based on random selection among the same menus (same menu number) in different ids which have non-null values.

Example. The menu 1 in id A has null values. I want to randomly select menu 1 in different id which has non-null values and fill them there. Let it be, id B and menu 1. For menu 7 in id A let it be menu 7 in id C and etc.

It is somehow similar to this question but iin my case, the filling should happen within the same "subgroups" if we can say so.

The final output should be something like this:

menu    alternative id  varA    varB    varC
1   82  A   21.34428845 9.901326583 1.053134597
1   91  A   19.04689216 16.29217346 29.56962312
2   94  A   8.089481019 7.07639559  0.90627215
2   89  A   7.52310322  19.49894193 14.4562262
3   79  A   24.79634962 18.91163612 23.85341972
3   95  A   21.10990397 17.00630516 1.09875582
4   47  A   5.681766806 4.136047755 17.38880496
4   62  A   10.39459876 0.997853805 0.045331687
5   58  A   11.91790497 5.696799013 27.21424163
5   23  A   11.71107828 2.165751058 11.56534045
6   57  A   1.068603487 27.95362014 1.334049372
6   100 A   26.31848796 6.757305213 4.742282633
7   17  A   4.319991082 21.29233667 3.516184987
7   8   A   24.09490443 9.507000131 14.93472971
8   24  A   29.99608877 28.49057834 0.14073638
8   7   A   8.749041949 14.17745528 9.604565417
9   64  A   29.4316969  19.57593592 9.174503643
9   60  A   13.53995541 1.898164567 16.49089291
10  85  A   20.1394155  0.995839592 16.18638727
10  22  A   22.68625486 14.26052953 17.79707308
1   82  B   21.34428845 9.901326583 1.053134597
1   91  B   19.04689216 16.29217346 29.56962312
2   35  B   25.44168095 29.00407645 2.246459981
2   100 B   15.79687903 20.37920541 28.45071525
3   44  B   7.359501131 23.66924419 7.198215907
3   41  B   22.65272801 8.66227065  12.05186217
4   59  B   26.67565422 9.608511948 26.45016581
4   53  B   5.64870847  21.83063691 19.20105218
5   48  B   5.591317152 25.17616679 24.30522374
5   16  B   23.85069753 23.12154586 0.781450997
6   57  B   1.068603487 27.95362014 1.334049372
6   100 B   26.31848796 6.757305213 4.742282633
7   68  B   9.334935288 16.39114327 21.17696541
7   41  B   5.841577934 6.901223007 28.38116983
8   35  B   21.20288984 9.665414964 4.472546438
8   96  B   0.451299457 27.66880932 26.2120144
9   84  B   19.67310555 1.993071082 9.08442779
9   65  B   0.475983889 16.72261394 17.17122898
10  40  B   9.553130945 17.88616649 22.17570401
10  40  B   19.70487161 5.898428653 11.25844279
1   19  C   20.47792809 9.344376127 7.855311112
1   59  C   14.59141273 8.090534362 19.6972446
2   19  C   6.624345353 0.192145343 26.31356322
2   67  C   24.483236   6.718856437 25.75609679
3   67  C   27.6408808  24.91014602 25.90758755
3   30  C   26.52738124 10.78363589 4.873602089
4   14  C   3.776964641 21.16561036 24.03153234
4   46  C   16.53719818 23.86634958 25.61504006
5   48  C   5.591317152 25.17616679 24.30522374
5   16  C   23.85069753 23.12154586 0.781450997
6   58  C   28.1357636  15.89359176 0.567406646
6   28  C   0.708229201 12.20641988 0.309303591
7   17  C   4.319991082 21.29233667 3.516184987
7   8   C   24.09490443 9.507000131 14.93472971
8   85  C   19.99606403 21.61509867 0.161222766
8   5   C   6.056082264 25.35186187 5.375641692
9   24  C   19.83904205 24.54037422 11.08571464
9   13  C   4.388769239 7.928106767 4.279531285
10  78  C   13.67598922 5.3140143   15.2710129
10  13  C   12.27642791 16.04610858 1.815260029

Any guidance would be appreciated. Maybe even there is some groupby apply logic which could assist in this.

Emil Mirzayev
  • 242
  • 2
  • 13

1 Answers1

0

You can run fillna() row-wise in apply(), then fill with a random sample from the dataframe filtered by your conditions:

df.apply(lambda row: row.fillna(df[(df['menu'] == row['menu']) & (df['id'] != row['id'])].dropna().sample(n=1).iloc[0]), axis=1) 
RJ Adriaansen
  • 9,131
  • 2
  • 12
  • 26