Can someone give me a Vaex alternative for this pandas-style code?

import vaex

df_train = vaex.open('../input/ms-malware-hdf5/train.csv.hdf5')
# pandas-style count of missing values per column, most missing first
total = df_train.isnull().sum().sort_values(ascending=False)

desertnaut

At the time of writing, Vaex does not support counting missing values at the DataFrame level, only at the expression (column) level, so you have to loop over the columns yourself.

Consider the following example:

import vaex
import vaex.ml
import pandas as pd

df = vaex.ml.datasets.load_titanic()

count_na = []  # number of missing (NA/NaN) values in each column
for col in df.column_names:
    count_na.append(df[col].isna().sum().item())

# Collect the counts in a pandas Series, largest first, as in the question
s = pd.Series(data=count_na, index=df.column_names).sort_values(ascending=False)
print(s)

If you need this often, it may be worth turning it into your own DataFrame method, using the mechanism for extending DataFrames described in the Vaex documentation (`vaex.register_dataframe_accessor`).

Joco