I am trying to load this file using genfromtxt and count the missing values in each column.

Below is my code:

import numpy as np

data = np.genfromtxt(datafile, delimiter=",", names=["col1","col2","col3","col4","col5","col6"], dtype=None, encoding='ascii')

missing_values = np.isnan(data)

But it gives me the error below:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-120-a4d778701252> in <module>
----> 1 missing_values = np.isnan(data)

TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
hpaulj
ashish_goy
  • Look at `data.shape` and `data.dtype`. It's important to understand what your `genfromtxt` has produced. With a `dtype=None`, the result is a `structured array`. That is documented, but it takes time to understand it. And as the error says, `isnan` cannot work on that kind of dtype. – hpaulj Sep 26 '22 at 04:14
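As a rough illustration of the comment above, the sketch below inspects what `genfromtxt` returns when `names` and `dtype=None` are used. The in-memory sample and its three columns are hypothetical stand-ins for the asker's file, which is not shown.

```python
import io
import numpy as np

# Hypothetical sample standing in for the real file: mixed column types.
sample = io.StringIO("1,a,3.5\n2,b,\n")

data = np.genfromtxt(sample, delimiter=",",
                     names=["col1", "col2", "col3"],
                     dtype=None, encoding="ascii")

# A structured array is 1-D: one record per row, with named fields
# instead of a second axis.
print(data.shape)
print(data.dtype)

# np.isnan can be applied field by field, but only to floating-point fields.
for name in data.dtype.names:
    if np.issubdtype(data[name].dtype, np.floating):
        print(name, np.isnan(data[name]).sum())
```

Calling `np.isnan(data)` on the whole structured array is what raises the `TypeError` in the question; indexing a single field such as `data["col3"]` gives an ordinary 1-D array.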

1 Answer

The problem is that np.isnan only works on numeric (float) arrays. When you pass names to genfromtxt, the result is a structured array, and np.isnan does not support that dtype. From the genfromtxt documentation:

If names is a sequence or a single-string of comma-separated names, the names will be used to define the field names in a structured dtype. If names is None, the names of the dtype fields will be used, if any.

The easiest way is to call genfromtxt without column names, so that it returns a plain float array, and then count the nan values:

data = np.genfromtxt(datafile, delimiter=",", encoding='ascii')
np.count_nonzero(np.isnan(data))
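The call above returns a single total for the whole array. Since the question asks for a count per column, a minimal sketch of that variant follows. It assumes every column is numeric, and the in-memory sample is a hypothetical stand-in for the real file.

```python
import io
import numpy as np

# Hypothetical sample: 3 numeric columns with some empty fields.
sample = io.StringIO("1,2,3\n4,,6\n,,9\n")

# With the default float dtype, empty fields are read as nan.
data = np.genfromtxt(sample, delimiter=",")

# Sum the boolean mask down the rows to get one count per column.
missing_per_column = np.isnan(data).sum(axis=0)
print(missing_per_column)  # [1 2 0]
```

Note that with the default float dtype, any field that cannot be parsed as a number is also read as nan, so text columns would be counted as entirely missing.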
AndrzejO