Use genfromtxt() to load the file and then check the number of missing values in each column using the numpy isnan() function with sum()

Question

I am trying to load this file using genfromtxt and count the missing values in each column.

Below is my code:

import numpy as np

data = np.genfromtxt(datafile, delimiter=",", names=["col1","col2","col3","col4","col5","col6"], dtype=None, encoding='ascii')

missing_values = np.isnan(data)

But it gives me the error below:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-120-a4d778701252> in <module>
----> 1 missing_values = np.isnan(data)

TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

Look at `data.shape` and `data.dtype`. It's important to understand what your `genfromtxt` has produced. With a `dtype=None`, the result is a `structured array`. That is documented, but it takes time to understand it. And as the error says, `isnan` cannot work on that kind of dtype. — hpaulj, Sep 26 '22 at 04:14

score 0 · Answer 1 · answered Sep 26 '22 at 00:49

The problem is that np.isnan only works on numeric (float) arrays. When you pass names to genfromtxt, the result gets a structured dtype, which np.isnan cannot handle. From the documentation:

If names is a sequence or a single-string of comma-separated names, the names will be used to define the field names in a structured dtype. If names is None, the names of the dtype fields will be used, if any.
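A minimal sketch of what that means in practice. The two-column sample text and the short `names` list are made up for illustration and stand in for the real file; the point is only to show the structured dtype and the resulting TypeError.

```python
import io
import numpy as np

# Hypothetical sample standing in for the real file (two numeric columns).
sample = "1.0,2.0\n3.0,\n"

data = np.genfromtxt(io.StringIO(sample), delimiter=",",
                     names=["col1", "col2"], dtype=None, encoding="ascii")

# With names, the result is a 1-D structured array: one record per row.
print(data.shape)        # one entry per row, not (rows, columns)
print(data.dtype.names)  # the field names passed in

# np.isnan does not accept the structured dtype as a whole...
try:
    np.isnan(data)
    raised = False
except TypeError:
    raised = True

# ...but it does accept a single float field.
col2_missing = int(np.isnan(data["col2"]).sum())
```

Indexing by field name, as in `data["col2"]`, returns an ordinary 1-D array that np.isnan accepts when the field is a float.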

The easiest way is to use genfromtxt without column names (as dtype float), and then count the nan values:

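import numpy as np
data = np.genfromtxt(datafile, delimiter=",", encoding='ascii')
np.count_nonzero(np.isnan(data))  # total number of nan values
np.isnan(data).sum(axis=0)        # number of nan values in each column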

Thanks. I tried the above, but it's treating all the values as null. — ashish_goy, Sep 26 '22 at 16:55
