
Issue

While replacing null values so that a column can be boolean, we find null values in the fireplace_count column.

If the fireplaceflag value is False, the null fireplace_count value should be replaced with 0.

Written for pandas:

df_train.loc[(df_train.fireplace_count.isnull()) & (df_train.fireplaceflag==False),'fireplace_count'] = 0
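To make the intended behavior concrete, here is a minimal pandas sketch on toy data. The values are invented for illustration and do not come from the original dataset; only rows where the count is null and the flag is False should change.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values, not from the original dataset).
df_train = pd.DataFrame({
    "fireplaceflag": [True, False, False, True],
    "fireplace_count": [2.0, np.nan, np.nan, np.nan],
})

# Rows where the count is missing AND the flag says "no fireplace".
# For a boolean column, ~flag is equivalent to flag == False.
needs_zero = df_train["fireplace_count"].isnull() & ~df_train["fireplaceflag"]
df_train.loc[needs_zero, "fireplace_count"] = 0

print(df_train)
```

The last row keeps its null because its flag is True, which is the part a plain fillna(0) would not preserve.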
gumdropsteve

3 Answers


I suggest calling fillna() on the column itself, so that only that column is affected, like:

df['<column_name>'] = df['<column_name>'].fillna(<new_value>)

Put the value that should replace the nulls inside the parentheses; in your case, that is 0. Let's also simplify the problem, since it appears that a None value occurs only when the flag is False.

I'm going to use the data that you sent me earlier, with one minor change.

import cudf
df = cudf.DataFrame({'basement_flag': [1, 1, 1, 0],
                     'basementsqft': [400,750,500,0],
                     'fireplace_count': [2, None, None, 1], #<-- added a None to illustrate the targeted nature of the solution
                     'fireplaceflag': [10, None, None, 8]})
print(df)
df['fireplace_count']=df.fireplace_count.fillna(0) #<-- This is the solution.  It changes only the values in the column of interest, which is what you explained that you needed
print(df)

Output would be:

   basement_flag  basementsqft  fireplace_count  fireplaceflag
0              1           400                2             10
1              1           750                                
2              1           500                                
3              0             0                1              8
   basement_flag  basementsqft  fireplace_count  fireplaceflag
0              1           400                2             10
1              1           750                0               
2              1           500                0               
3              0             0                1              8

There is also this approach:

df['fireplace_count'] = df['fireplace_count'].fillna(0)
df['fireplaceflag']= df['fireplaceflag'].fillna(-1)
df['fireplaceflag'] = df['fireplaceflag'].masked_assign(1, df['fireplace_count'] > 0)

That should work for any odd cases, based on what I think your question is. (Thanks, Roy F @ NVIDIA.)
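For readers without a GPU, the same fill-then-mask sequence can be sketched in pandas. This assumes that `masked_assign(value, mask)` sets `value` wherever `mask` is True, and uses pandas' `Series.mask` as a stand-in; it is an illustration of the idea, not the cuDF code path.

```python
import pandas as pd

df = pd.DataFrame({
    "fireplace_count": [2, None, None, 1],
    "fireplaceflag": [10, None, None, 8],
})

# Step 1: fill the independent column, then give the flag a sentinel value.
df["fireplace_count"] = df["fireplace_count"].fillna(0)
df["fireplaceflag"] = df["fireplaceflag"].fillna(-1)

# Step 2: wherever a fireplace was counted, set the flag to 1.
# Series.mask(cond, other) replaces values where cond is True
# (assumed here to mirror masked_assign's behavior).
df["fireplaceflag"] = df["fireplaceflag"].mask(df["fireplace_count"] > 0, 1)

print(df)
```

Rows with no counted fireplace keep the -1 sentinel, so they stay distinguishable from rows that were flagged.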

Let me know if this works for you, or if you need more help!

TaureanDyerNV
  • This works! Filling the independent column with `fillna`, then adjusting the dependent column with `masked_assign`, got it right. Thank you. (and big thanks to Roy!) – gumdropsteve Jul 23 '19 at 23:53
  • Additionally, if wanting to consider multiple columns --e.g. when `poolcnt=1` and `has_hottub_or_spa=1` and `just_hottub_or_spa` is null then `just_hottub_or_spa =0` -- the following seems to work; ```df_train['just_hottub_or_spa'] = df_train['just_hottub_or_spa'].masked_assign(0, (df_train['poolcnt'] == 1) & (df_train['has_hottub_or_spa'] == 1) & (df_train['just_hottub_or_spa'].isna() == True))``` – gumdropsteve Jul 25 '19 at 00:55
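The multi-column condition from the comment above can be sketched in pandas as follows. The toy values are invented, and `.loc` is used here in place of `masked_assign`, so treat it as an illustration of the boolean logic rather than the cuDF call itself.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values) covering each combination of interest.
df_train = pd.DataFrame({
    "poolcnt": [1, 1, 0, 1],
    "has_hottub_or_spa": [1, 0, 1, 1],
    "just_hottub_or_spa": [np.nan, np.nan, np.nan, 1.0],
})

# All three conditions must hold for a row to be updated.
cond = (
    (df_train["poolcnt"] == 1)
    & (df_train["has_hottub_or_spa"] == 1)
    & df_train["just_hottub_or_spa"].isna()
)
df_train.loc[cond, "just_hottub_or_spa"] = 0

print(df_train)
```

Only the first row satisfies every condition, so only that null is replaced.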

what we're trying to do

In rows where the value in the fireplaceflag column is False (i.e. there is no fireplace), change the null values in the fireplace_count column to 0

pandas code from initial question

df_train.loc[(df_train.fireplace_count.isnull()) & (df_train.fireplaceflag==False),'fireplace_count'] = 0

translated to cudf

df_train['fireplace_count'] = df_train['fireplace_count'].masked_assign(0, (df_train['fireplace_count'].isna() == True) & (df_train['fireplaceflag'] == False))
gumdropsteve

The accepted answer of using fillna works for this specific example, but the generalized version in the answer will not work for the question in the title as of cuDF 0.9.

cuDF now supports the __setitem__() method. The generalized scenario of "In rows where the value in column_a is X, set the value in column_b to Y" is best done with something like the following:

import cudf
df = cudf.DataFrame({'basement_flag': [1, 1, 1, 0],
                     'basementsqft': [400,750,500,0],
                     'fireplace_count': [2, None, None, 1], #<-- added a None to illustrate the targeted nature of the solution
                     'fireplaceflag': [10, None, None, 8]})
print(df)
mask = df.fireplaceflag.isnull()
df.loc[mask, 'fireplace_count'] = 0
print(df)

Output:

   basement_flag  basementsqft fireplace_count fireplaceflag
0              1           400               2            10
1              1           750            null          null
2              1           500            null          null
3              0             0               1             8
   basement_flag  basementsqft  fireplace_count fireplaceflag
0              1           400                2            10
1              1           750                0          null
2              1           500                0          null
3              0             0                1             8
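The generalized scenario can be wrapped in a small helper. The sketch below uses pandas so it runs without a GPU; `set_where` and the column names are hypothetical, and the same `.loc` pattern is assumed (not verified here) to carry over to cuDF versions that support `__setitem__`.

```python
import pandas as pd


def set_where(df, cond_col, cond_value, target_col, new_value):
    """Hypothetical helper: where cond_col == cond_value, set target_col to new_value."""
    mask = df[cond_col] == cond_value
    df.loc[mask, target_col] = new_value
    return df


# "In rows where the value in column_a is X, set the value in column_b to Y."
df = pd.DataFrame({"column_a": ["x", "y", "x"], "column_b": [1, 2, 3]})
set_where(df, "column_a", "x", "column_b", 0)

print(df)
```

An equality test never matches nulls, so when X is "null" the mask should be built with isnull(), as in the example above.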
Nick Becker