I have a dataframe with several hundred rows and columns and want to drop all NaNs. Unfortunately there are NaNs in every row and every column, so
df = df.dropna(how="any")
would result in an empty dataframe.
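For reference, here is a minimal example of the problem (with made-up data): every row and every column contains at least one NaN, so how="any" wipes out the whole frame.
import numpy as np
import pandas as pd

# toy frame: every row and every column contains at least one NaN
df = pd.DataFrame({"a": [1, np.nan, 3],
                   "b": [np.nan, 5, 6],
                   "c": [7, 8, np.nan]})
print(df.dropna(how="any"))  # Empty DataFrame: every row contains a NaN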
As a workaround, I use a while loop that iteratively drops rows and columns with a rising threshold:
i = 0
while df.isna().sum().sum() != 0:  # repeat until no NaNs are left
    i += 0.01
    # drop rows with fewer than i * (number of columns) non-NaN values
    df = df.dropna(thresh=int(i * df.shape[1]), axis=0)
    # drop columns with fewer than i * (number of rows) non-NaN values
    df = df.dropna(thresh=int(i * df.shape[0]), axis=1)
This greedy algorithm is certainly a suboptimal solution in more than one way.
Aside from writing my own linear program to minimize the amount of deleted data, is there perhaps a built-in function that I do not know about?
My goal is to preserve as much data as possible.
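For context, the linear program I have in mind would look roughly like this (a rough sketch using the third-party pulp package; the formulation and all names are mine and it is untested at real scale): one binary keep/drop variable per row and per column, a constraint that every NaN cell must lose either its row or its column, and an objective that maximizes the number of surviving cells.
import numpy as np
import pandas as pd
import pulp  # third-party MILP modeler, solved here with its bundled CBC solver

# small stand-in for the real data
df = pd.DataFrame({"a": [1, np.nan, 3],
                   "b": [np.nan, 5, 6],
                   "c": [7, 8, np.nan]})

mask = df.isna().to_numpy()
n_rows, n_cols = mask.shape

prob = pulp.LpProblem("maximize_kept_cells", pulp.LpMaximize)
r = [pulp.LpVariable(f"r{i}", cat="Binary") for i in range(n_rows)]  # keep row i?
c = [pulp.LpVariable(f"c{j}", cat="Binary") for j in range(n_cols)]  # keep column j?
# one variable per non-NaN cell: 1 if that cell survives
x = {(i, j): pulp.LpVariable(f"x_{i}_{j}", cat="Binary")
     for i in range(n_rows) for j in range(n_cols) if not mask[i, j]}

prob += pulp.lpSum(x.values())  # objective: number of surviving cells
for (i, j), v in x.items():
    prob += v <= r[i]  # a cell survives only if its row survives
    prob += v <= c[j]  # ...and only if its column survives
for i, j in zip(*np.nonzero(mask)):
    prob += r[i] + c[j] <= 1  # every NaN cell must lose its row or its column

prob.solve(pulp.PULP_CBC_CMD(msg=False))
clean = df.iloc[[i for i in range(n_rows) if r[i].value() == 1],
                [j for j in range(n_cols) if c[j].value() == 1]]
print(clean)  # NaN-free subframe with a maximal number of cells
With several hundred rows and columns this creates one binary variable per non-NaN cell, so the model quickly grows into the tens of thousands of variables, which is exactly why I would prefer a built-in alternative.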