Following on from this question, I have this dataframe:
ChildID MotherID preWeight
0 20 455 3500
1 20 455 4040
2 13 102 NaN
3 702 946 5000
4 82 571 2000
5 82 571 3500
6 82 571 3800
where I transformed feature 'preWeight' that has multiple observations per MotherID to feature 'preMacro' with a single observation per MotherID, based on the following rules:
- if preWeight>=4000 for a particular MotherID, I assigned preMacro a value of "Yes" regardless of the remaining observations
- Otherwise I assigned preMacro a value of "No"
Using this line of code:
df.groupby(['ChildID','MotherID']).agg(lambda x: 'Yes' if (x>4000).any() else 'No').reset_index().rename(columns={"preWeight": "preMacro"})
However, I realised that this way I am not preserving the NaN values in the dataset, which ideally should be imputed rather than just assigning them "No" values. So I tried changing the above line to:
df=df.groupby(['MotherID', 'ChildID'])['preWeight'].agg(
lambda x: 'Yes' if (x>4000).any() else (np.NaN if 'no_value' in x.values.all() else 'No')).reset_index().rename(
columns={"preWeight": "preMacro"})
I wanted this line to transform the above dataframe to this:
ChildID MotherID preMacro
0 20 455 Yes
1 13 102 NaN
2 702 946 Yes
3 82 571 No
However I got this error when running it:
TypeError: argument of type 'float' is not iterable
I understand that, in the case of non-missing values, the values of x.values.all() are float numbers, which are not iterable, but I am not sure how else to code this, any ideas?
Thanks.