Let's say I have the following dataframe:
import numpy as np
import pandas as pd

df1 = pd.DataFrame(data = [1,np.nan,np.nan,1,1,np.nan,1,1,1],
                   columns = ['X'],
                   index = ['a', 'a', 'a',
                            'b', 'b', 'b',
                            'c', 'c', 'c'])
print(df1)
     X
a  1.0
a  NaN
a  NaN
b  1.0
b  1.0
b  NaN
c  1.0
c  1.0
c  1.0
I want to keep only the index labels that have 2 or more non-NaN entries. In this case, the 'a' rows have only one non-NaN value, so I want to drop them and have my result be:
     X
b  1.0
b  1.0
b  NaN
c  1.0
c  1.0
c  1.0
What is the best way to do this? Ideally I want something that works with Dask too, although usually if it works with Pandas it also works in Dask.
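For reference, here is a minimal sketch of one approach that seems to produce this result in plain pandas: count the non-NaN values of X per index label, then keep only the rows whose label reaches the threshold. The names `counts`, `keep`, and `result` are just illustrative, and I have not verified this against Dask, so treat the Dask part as an open question.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    data=[1, np.nan, np.nan, 1, 1, np.nan, 1, 1, 1],
    columns=["X"],
    index=["a", "a", "a", "b", "b", "b", "c", "c", "c"],
)

# Count non-NaN values of X for each index label (count() skips NaN).
counts = df1["X"].groupby(df1.index).count()

# Labels with at least 2 non-NaN entries.
keep = counts[counts >= 2].index

# Keep every row (including NaN rows) whose label is in `keep`.
result = df1[df1.index.isin(keep)]
print(result)
```

The per-label counts are a small aggregated result, so the final filter is just a membership test on the index rather than a row-by-row group operation.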