For the code below I am using Haberman's Survival Data Set. Link to the dataset: https://www.kaggle.com/gilsousa/habermans-survival-data-set/version/1

df_1 = df.loc[df["survival_status"] == "1"]  # "1" comes from the dataset: the patient survived (survival_status is the dependent variable)
df_2 = df.loc[df["survival_status"] == "2"]  # "2" comes from the dataset: the patient did not survive
counts,bin_edges=np.histogram(df_1["age"],bins=10,density=None)
pdf=counts/(sum(counts))
print(pdf)
print(bin_edges)
cdf=np.cumsum(pdf)
plt.plot(bin_edges[1:],pdf)
plt.plot(bin_edges[1:],cdf)
plt.show()

I am getting the error below:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-147-3367cd025c24> in <module>
      1 df_1 = df.loc[df["survival_status"] == "1"];
      2 df_2 = df.loc[df["survival_status"] == "2"];
----> 3 counts,bin_edges=np.histogram(df_1["age"],bins=10,density=None)
      4 pdf=counts/(sum(counts))
      5 print(pdf)

<__array_function__ internals> in histogram(*args, **kwargs)

2 frames
/usr/local/lib/python3.7/dist-packages/numpy/lib/histograms.py in histogram(a, bins, range, normed, weights, density)
    791     a, weights = _ravel_and_check_weights(a, weights)
    792 
--> 793     bin_edges, uniform_bins = _get_bin_edges(a, bins, range, weights)
    794 
    795     # Histogram is an integer or a float array depending on the weights.

/usr/local/lib/python3.7/dist-packages/numpy/lib/histograms.py in _get_bin_edges(a, bins, range, weights)
    424             raise ValueError('`bins` must be positive, when an integer')
    425 
--> 426         first_edge, last_edge = _get_outer_edges(a, range)
    427 
    428     elif np.ndim(bins) == 1:

/usr/local/lib/python3.7/dist-packages/numpy/lib/histograms.py in _get_outer_edges(a, range)
    320     else:
    321         first_edge, last_edge = a.min(), a.max()
--> 322         if not (np.isfinite(first_edge) and np.isfinite(last_edge)):
    323             raise ValueError(
    324                 "autodetected range of [{}, {}] is not finite".format(first_edge, last_edge))

TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

I am trying to plot a PDF and a CDF, and I get the TypeError shown above.

1 Answer

Your 'age' column may have been imported with a non-numeric dtype. See the related question: Python Numpy TypeError: ufunc 'isfinite' not supported for the input types

I ran your code with the following line prepended

df = pd.read_csv("haberman.csv", names = ['age', 'Op_Year', 'axil_nodes', 'survival_status'])

and didn't get your error.
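If the column really is non-numeric on your side, something like the following may help. It is only a sketch under assumptions: the inline CSV text is made-up sample data standing in for haberman.csv, and I am guessing that your copy of the file has a header line, which pandas keeps as a data row when names= is passed without header=0. That would turn every column into strings (dtype object), which would also explain why your comparison with "1" works. The sketch checks the dtypes, converts the columns with pd.to_numeric, and then computes the PDF and CDF the same way you do.

```python
import io

import numpy as np
import pandas as pd

# Made-up sample rows standing in for haberman.csv (assumption: the real file
# has a header line like this one).
csv_text = (
    "age,Op_Year,axil_nodes,survival_status\n"
    "30,64,1,1\n"
    "34,60,0,1\n"
    "52,69,3,2\n"
    "45,66,0,1\n"
)
cols = ["age", "Op_Year", "axil_nodes", "survival_status"]

# With names= and no header=0, the header line is read as a data row,
# so every column holds strings and gets dtype object.
df = pd.read_csv(io.StringIO(csv_text), names=cols)
print(df.dtypes)

# Convert every column to numbers; values that cannot be parsed (the stray
# header row here) become NaN and are dropped.
df = df.apply(pd.to_numeric, errors="coerce").dropna().astype(int)
print(df.dtypes)

# survival_status is now an integer, so compare with 1, not "1".
survived = df.loc[df["survival_status"] == 1]

counts, bin_edges = np.histogram(survived["age"], bins=10)
pdf = counts / counts.sum()  # fraction of patients per bin
cdf = np.cumsum(pdf)         # running total of those fractions
print(pdf)
print(cdf)
```

Once the dtypes print as integers, your plt.plot(bin_edges[1:], pdf) and plt.plot(bin_edges[1:], cdf) calls should work unchanged. Alternatively, passing header=0 together with names= to read_csv avoids reading the header line as data in the first place.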

rkj