Change values in a column to np.nan based upon row index

Question

I want to selectively change column values to np.nan.

I have a column with a lot of zero (0) values.

I am getting the row indices of a subset of the total.

I place the indices into a variable (s0).

I then use this to set the column value to np.nan for just the rows whose index is in s0.

It runs, but it is changing every single row (i.e., the entire column) to np.nan.

Here is my code:

print((df3['amount_tsh'] == 0).sum())  # 41639  <-- there are this many zeros to start
# print(df3['amount_tsh'].value_counts()[0])
s0 = df3['amount_tsh'][df3['amount_tsh'].eq(0)].sample(37322).index  #  grab 37322 row indexes
print(len(s0))  # 37322
df3['amount_tsh'] = df3.loc[df3.index.isin(s0), 'amount_tsh'] = np.nan  #  change the value in the column to np.nan if it's index is in s0
print(df3['amount_tsh'].isnull().sum())

wwnde · Accepted Answer · 2022-03-04T21:36:33.577

Lets try

s0 = df3.loc[df3['amount_tsh'].eq(0), ['amount_tsh']].sample(37322)
df3.loc[df3.index.isin(s0.index), 'amount_tsh'] = np.nan

For a quick fix I used this data I had in a notebook and it worked for me

import pandas as pd 
import numpy as np

data = pd.DataFrame({'Symbol': {0: 'ABNB', 1: 'DKNG', 2: 'EXPE', 3: 'MPNGF', 4: 'RDFN', 5: 'ROKU', 6: 'VIACA', 7: 'Z'},
'Number of Buys': {0: np.nan, 1: 2.0, 2: np.nan, 3: 1.0, 4: 2.0, 5: 1.0, 6: 1.0, 7: np.nan}, 
'Number of Sell      s': {0: 1.0, 1: np.nan, 2: 1.0, 3: np.nan, 4: np.nan, 5: np.nan, 6: np.nan, 7: 1.0}, 
'Gains/Losses': {0: 2106.0, 1: -1479.2, 2: 1863.18, 3: -1980.0, 4: -1687.7, 5: -1520.52, 6: -1282.4, 7: 1624.59}, 'Percentage change': {0: 0.0, 1: 2.0, 2: 0.0, 3: 0.0, 4: 1.5, 5: 0.0, 6: 0.0, 7: 0.0}})

rows = ['ABNB','DKNG','EXPE']
data


  Symbol  Number of Buys  Number of Sell      s  Gains/Losses  \
0   ABNB             NaN                    1.0       2106.00   
1   DKNG             2.0                    NaN      -1479.20   
2   EXPE             NaN                    1.0       1863.18   
3  MPNGF             1.0                    NaN      -1980.00   
4   RDFN             2.0                    NaN      -1687.70   
5   ROKU             1.0                    NaN      -1520.52   
6  VIACA             1.0                    NaN      -1282.40   
7      Z             NaN                    1.0       1624.59   

   Percentage change  
0                0.0  
1                2.0  
2                0.0  
3                0.0  
4                1.5  
5                0.0  
6                0.0  
7                0.0

By your approach

(data['Number of Buys']==1.0).sum()
s0= data.loc[(data['Number of Buys']==1.0),['Number of Buys']].sample(2)
data.loc[data.index.isin(s0.index),'Number of Buys'] =np.nan

Symbol  Number of Buys  Number of Sell      s  Gains/Losses  \
0   ABNB             NaN                    1.0       2106.00   
1   DKNG             2.0                    NaN      -1479.20   
2   EXPE             NaN                    1.0       1863.18   
3  MPNGF             1.0                    NaN      -1980.00   
4   RDFN             2.0                    NaN      -1687.70   
5   ROKU             NaN                    NaN      -1520.52   
6  VIACA             NaN                    NaN      -1282.40   
7      Z             NaN                    1.0       1624.59   

   Percentage change  
0                0.0  
1                2.0  
2                0.0  
3                0.0  
4                1.5  
5                0.0  
6                0.0  
7                0.0

It should work. It worked for me as you can see the edits. It is more readble too — wwnde, Mar 04 '22 at 21:37
Yours did work. I added the re-assignment. When I realized it was breaking mine I re-tried yours and it was good as well. — MarkS, Mar 04 '22 at 21:38

score 0 · Answer 2 · answered Mar 04 '22 at 21:35

Hmm...

I removed the re-assignment and it worked??

s0 = df3['amount_tsh'][df3['amount_tsh'].eq(0)].sample(37322).index df3.loc[df3.index.isin(s0), 'amount_tsh'] = np.nan

The second line was: df3['amount_tsh'] = df3.loc[df3.index.isin(s0), 'amount_tsh'] = np.nan

Change values in a column to np.nan based upon row index

2 Answers2