Set variable column values to nan based on row condition

Question

I want to set a variable number of columns to NaN in each row, based on the value in that row's first column.

Say I have a dataframe as follows:

col_ind   col_1   col_2   col_3
    3       a       b       c
    2       d       e       f
    1       g       h       i

I effectively want to do

df.loc[:, df.columns[-df['col_ind']:]] = np.nan

Which would result in:

col_ind   col_1   col_2   col_3
    3      nan     nan     nan
    2       d      nan     nan
    1       g       h      nan

score 5 · Accepted Answer · answered Mar 09 '23 at 05:30

Let's use broadcasting to build a boolean mask of the cells that should be set to NaN:

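c = df.columns[1:]  # data columns, excluding col_ind
# Column positions counted from the right (n, ..., 1), compared against each row's col_ind
m = range(len(c), 0, -1) <= df['col_ind'].values[:, None]

df[c] = df[c].mask(m)  # mask() replaces the True cells with NaN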

Result

   col_ind col_1 col_2 col_3
0        3   NaN   NaN   NaN
1        2     d   NaN   NaN
2        1     g     h   NaN
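
To see why the comparison works, it can help to print the intermediate mask. This is only a sketch: the frame construction is borrowed from another answer in this thread, and np.arange stands in for range purely for readability (my substitution, not part of the answer above).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col_ind': [3, 2, 1],
    'col_1': ['a', 'd', 'g'],
    'col_2': ['b', 'e', 'h'],
    'col_3': ['c', 'f', 'i'],
})

c = df.columns[1:]

# Position of each data column counted from the right: 3, 2, 1
pos_from_right = np.arange(len(c), 0, -1)

# Shape (3,) compared against shape (3, 1) broadcasts to a (3, 3) boolean mask;
# a cell is True when its position from the right is within the row's col_ind
m = pos_from_right <= df['col_ind'].to_numpy()[:, None]
print(m)

masked = df.copy()
masked[c] = masked[c].mask(m)  # True cells become NaN
print(masked)
```

Each row of m has col_ind trailing True values, which is exactly the set of cells the question wants blanked.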

score 1 · Answer 2 · answered Mar 09 '23 at 05:07

You can get the values of df["col_ind"], iterate through them and set the slice to np.nan:

vals = df["col_ind"].values
for i, v in enumerate(vals):
    df.iloc[i, -v:] = np.nan

answered Mar 09 '23 at 05:07

Marcelo Paco

Yeah this works but I was hoping for a more efficient solution. It's pretty time costly. Thank you though – thefrollickingnerd Mar 09 '23 at 05:15
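
For anyone who wants to measure the cost mentioned in this comment, here is a rough sketch that times the row loop against the broadcasting approach from the accepted answer. The helper names (make_frame, loop_version, broadcast_version), the frame size and the random float data are all my assumptions for illustration; timings will depend on your data and pandas version.

```python
import timeit

import numpy as np
import pandas as pd


def make_frame(n_rows, n_cols=3, seed=0):
    # Hypothetical test data: col_ind in 1..n_cols plus n_cols float columns
    rng = np.random.default_rng(seed)
    data = {'col_ind': rng.integers(1, n_cols + 1, size=n_rows)}
    for j in range(1, n_cols + 1):
        data[f'col_{j}'] = rng.random(n_rows)
    return pd.DataFrame(data)


def loop_version(df):
    # Row-by-row assignment, as in the answer above
    out = df.copy()
    for i, v in enumerate(out['col_ind'].to_numpy()):
        out.iloc[i, -v:] = np.nan
    return out


def broadcast_version(df):
    # Single vectorised mask, as in the accepted answer
    out = df.copy()
    c = out.columns[1:]
    m = np.arange(len(c), 0, -1) <= out['col_ind'].to_numpy()[:, None]
    out[c] = out[c].mask(m)
    return out


df = make_frame(1_000)

# Both approaches should blank the same cells
assert loop_version(df).equals(broadcast_version(df))

print('loop     :', timeit.timeit(lambda: loop_version(df), number=3))
print('broadcast:', timeit.timeit(lambda: broadcast_version(df), number=3))
```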

score 1 · Answer 3 · answered Mar 09 '23 at 05:10

You can use apply with result_type='broadcast'. (Edit: borrowing @marcelo-paco's code)

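import numpy as np
import pandas as pd

def make_nan(row):
    # row[0] is the col_ind value; blank that many entries from the end of the row.
    # (Newer pandas versions deprecate positional row[0] on a labelled Series; row.iloc[0] is the explicit form.)
    row[-row[0]:] = np.nan
    return row

df = pd.DataFrame({'col_ind': [3, 2, 1], 'col_1': ['a', 'd', 'g'], 'col_2': ['b', 'e', 'h'], 'col_3': ['c', 'f', 'i']})
df[:] = df.apply(make_nan, axis=1, result_type='broadcast')
df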

This will give:

col_ind   col_1   col_2   col_3
    3      NaN     NaN     NaN
    2       d      NaN     NaN
    1       g       h      NaN

answered Mar 09 '23 at 05:10

rajendra

Is `row[-row[0]:]` pseudocode? Do you not need to specify iloc? – thefrollickingnerd Mar 09 '23 at 05:22
It's not pseudocode. It uses the first element to index the rest of the elements. – rajendra Mar 09 '23 at 05:23
This method crashes the kernel of my notebook. Also, in my code I want to use a named column; I've replaced 0 with 'col name'. I assume that is fine? – thefrollickingnerd Mar 09 '23 at 05:25
This assumes the values in col_ind are between 1 and 3. Not sure what you mean by "replaced 0 with 'col name'". You can make an edit and update your question. – rajendra Mar 09 '23 at 05:28
I used a mock df for my example but my actual df is larger, so instead of using the column index for the condition column, I used the column name. Presumably the same syntax. – thefrollickingnerd Mar 09 '23 at 05:34
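
On the named-column question raised in these comments, a minimal sketch, assuming the indicator column is called col_ind (substitute your own name). The ind_col parameter, the row copy and the zero guard are my additions for illustration, not part of the answer above.

```python
import numpy as np
import pandas as pd


def make_nan(row, ind_col='col_ind'):
    row = row.copy()           # work on a copy rather than the object apply passes in
    n = int(row[ind_col])      # look the count up by label instead of by position
    if n > 0:                  # row.iloc[-0:] would select (and blank) the whole row
        row.iloc[-n:] = np.nan
    return row


df = pd.DataFrame({
    'col_ind': [3, 2, 1],
    'col_1': ['a', 'd', 'g'],
    'col_2': ['b', 'e', 'h'],
    'col_3': ['c', 'f', 'i'],
})

# Extra keyword arguments to apply are forwarded to the function
out = df.apply(make_nan, axis=1, ind_col='col_ind')
print(out)
```

The slice still counts from the end of the row, so the named column only changes where the count is read from, not which cells are blanked. It remains a Python-level call per row, so it is unlikely to be faster than the explicit loop.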

score 1 · Answer 4 · answered Mar 09 '23 at 05:15

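You could build each new column from leading NaNs followed by a slice of the current column, and then replace it. Note that this relies on the staircase pattern of the example (col_ind counting down to 1 on a default integer index) rather than reading col_ind itself:

# Skip col_ind itself; the i-th data column gets i leading NaNs
for i, cn in enumerate(df.columns[1:], 1):
    df[cn] = [*[np.nan]*i, *df[cn].loc[i:]]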
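
If the rows are not in that neat descending order, a column-wise loop that reads col_ind directly might look like the sketch below. This is a different technique from the slicing above (a boolean row mask per column), and the reordered col_ind values are my own test data, not from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col_ind': [1, 3, 2],   # deliberately not in staircase order
    'col_1': ['a', 'd', 'g'],
    'col_2': ['b', 'e', 'h'],
    'col_3': ['c', 'f', 'i'],
})

data_cols = df.columns[1:]

# j is the column's position counted from the right (1 = last column);
# a cell is blanked when the row's col_ind reaches at least that far
for j, cn in enumerate(data_cols[::-1], 1):
    df.loc[df['col_ind'] >= j, cn] = np.nan

print(df)
```

This loops once per data column rather than once per row, which may matter when the frame is long and narrow.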

answered Mar 09 '23 at 05:15

Driftr95
