Consider a series x_1, x_2, x_3, x_4, ... I want to set x_i to NaN if x_i = x_{i-1}, i.e. if it repeats the value immediately before it. I don't care if x_2 equals, say, x_9. For a second or two I thought this was what pandas means by duplicate values, but I now see that duplicate detection would also flag x_9. I'm pretty sure this routine must already exist in pandas, but I can't find it.

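import numpy as np

def ff_repeated(xnp):
    # Flag each element that equals its immediate predecessor.
    nfnp = xnp.size
    ffnp = np.empty(nfnp, dtype=bool)
    ffnp[0] = False  # the first element has no predecessor
    for i in range(1, nfnp):
        ffnp[i] = xnp[i] == xnp[i-1]
    return ffnp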

Thoughts? I then use the above like this:

ffnp = ff_repeated(dm.loc["Pressure"].values)
dm.loc["Pressure", ffnp] = np.nan  # np.NaN was removed in NumPy 2.0
Tunneller

1 Answer

Your version should work just fine, but it uses a Python for loop and is therefore slow on large inputs. You can vectorize it by shifting the pd.Series by one position and comparing the result with the original:

import pandas as pd

xnp = pd.Series([1, 2, 3, 3, 4, 2, 5, 5, 6])
ffnp = xnp.shift(1) == xnp  # True where a value equals the one before it

ffnp

0    False
1    False
2    False
3     True
4    False
5    False
6    False
7     True
8    False
dtype: bool

You can then use ffnp to set the values to NaN, as you did.
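As a minimal sketch of that last step, the mask can be applied with `Series.mask`, which replaces the flagged entries with NaN. The names `cleaned`, `values`, and `repeated` are introduced here for illustration only. Since the question passes `.values` (a NumPy array) to its function, the second half shows a slicing-based NumPy equivalent that needs neither pandas nor a loop. Note that NaN never compares equal to NaN, so consecutive NaNs in the input are not flagged by either version.

```python
import numpy as np
import pandas as pd

xnp = pd.Series([1, 2, 3, 3, 4, 2, 5, 5, 6])
ffnp = xnp.shift(1) == xnp

# Series.mask returns a copy with NaN wherever the condition is True.
cleaned = xnp.mask(ffnp)

# NumPy-only equivalent of the mask: compare each element with its predecessor.
values = xnp.to_numpy()
repeated = np.empty(values.size, dtype=bool)
repeated[:1] = False                          # first element has no predecessor
repeated[1:] = values[1:] == values[:-1]
```

The `repeated` array can be used in place of the loop-based mask, for example as the column selector in `dm.loc["Pressure", repeated] = np.nan`.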

Lukas Thaler
  • Exactly what I needed. For an entertaining (?) twist: if you want to always force the last entry to be False, don't write `ffnp[-1] = False`, because on a Series with an integer index that adds a new entry with the label -1 instead of changing the last one. The correct way is `ffnp.iloc[-1] = False`. Gosh, that took me an embarrassingly long time to debug... – Tunneller Feb 22 '20 at 23:55
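The pitfall described in this comment can be reproduced with a short sketch, assuming a Series with the default integer index (as produced by `pd.Series([...])`). The names `by_label` and `by_position` are illustrative only.

```python
import pandas as pd

ffnp = pd.Series([False, False, True])

# Plain [] on an integer-indexed Series treats -1 as a label.
# No such label exists, so the assignment appends a new entry.
by_label = ffnp.copy()
by_label[-1] = False

# .iloc is purely positional, so -1 means the last element.
by_position = ffnp.copy()
by_position.iloc[-1] = False

print(by_label.index.tolist())   # the index now contains the extra label -1
print(by_position.tolist())      # last element overwritten, length unchanged
```

Using `.iloc` for positional access and `.loc` for label access avoids this ambiguity.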