
I want to compute the covariance matrix of the Iris dataset, https://www.kaggle.com/jchen2186/machine-learning-with-iris-dataset/data

I am using NumPy's np.cov function:

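import csv

import numpy as np

with open("Iris.csv") as iris:
    reader = csv.reader(iris)
    data = []
    next(reader)  # skip the header row
    for row in reader:
        data.append(row)  # csv.reader yields each row as a list of strings

for i in data:
    i.pop(0)  # drop the first column (Id)
    i.pop(4)  # drop the Species column (index 4 once Id is gone)

iris = np.array(data)
np.cov(iris)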

And I get this error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-4-bfb836354075> in <module>
----> 1 np.cov(iris)

D:\Anaconda\lib\site-packages\numpy\lib\function_base.py in cov(m, y, rowvar, bias, ddof, fweights, aweights)
   2300             w *= aweights
   2301 
-> 2302     avg, w_sum = average(X, axis=1, weights=w, returned=True)
   2303     w_sum = w_sum[0]
   2304 

D:\Anaconda\lib\site-packages\numpy\lib\function_base.py in average(a, axis, weights, returned)
    354 
    355     if weights is None:
--> 356         avg = a.mean(axis)
    357         scl = avg.dtype.type(a.size/avg.size)
    358     else:

D:\Anaconda\lib\site-packages\numpy\core\_methods.py in _mean(a, axis, dtype, out, keepdims)
     73             is_float16_result = True
     74 
---> 75     ret = umr_sum(arr, axis, dtype, out, keepdims)
     76     if isinstance(ret, mu.ndarray):
     77         ret = um.true_divide(

TypeError: cannot perform reduce with flexible type

I don't understand what this error means.

Mar

1 Answer


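The error comes from the data type: csv.reader returns every field as a string, so np.array(data) is an array of strings (a "flexible" dtype), and np.cov cannot sum strings when it computes the means. If you want to keep your approach, you could read Iris.csv with the pandas.read_csv function, which parses numeric columns as numbers, and then select the columns you need.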
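A minimal sketch of that pandas route, assuming the Kaggle file's header is Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species (check it against your copy). To keep it self-contained, a handful of sample rows are inlined through io.StringIO; with the real file you would pass "Iris.csv" to pd.read_csv instead.

```python
import io

import numpy as np
import pandas as pd

# Sample rows in the assumed layout of Kaggle's Iris.csv; replace the
# StringIO object with "Iris.csv" to read the real file.
csv_text = """Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
1,5.1,3.5,1.4,0.2,Iris-setosa
2,4.9,3.0,1.4,0.2,Iris-setosa
3,4.7,3.2,1.3,0.2,Iris-setosa
4,4.6,3.1,1.5,0.2,Iris-setosa
5,5.0,3.6,1.4,0.2,Iris-setosa
"""

df = pd.read_csv(io.StringIO(csv_text))

# read_csv parses the measurement columns as float64, so we only need to
# select them; Id and Species are left out.
features = df[["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"]]

# DataFrame.cov treats each column as a variable -> 4 x 4 matrix.
cov_df = features.cov()

# The same matrix with NumPy: rows are samples, so pass rowvar=False.
cov_np = np.cov(features.to_numpy(), rowvar=False)

print(cov_df)
```

Both DataFrame.cov and np.cov use the unbiased (n - 1) normalization by default, which is why the two results agree.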

But here is a shorter route: scikit-learn ships the Iris dataset, so you can load it as NumPy arrays, X (the features) and y (the labels), and compute the covariance matrix directly:

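from sklearn.datasets import load_iris
import numpy as np

data = load_iris()
X = data['data']    # features: float array of shape (150, 4)
y = data['target']  # class labels 0, 1, 2

# Rows are samples and columns are features, so rowvar=False
# gives the 4 x 4 covariance matrix of the features.
np.cov(X, rowvar=False)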
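One detail worth checking: np.cov treats each row as a variable by default (rowvar=True), so np.cov(X) on its own would return a 150 x 150 matrix for X of shape (150, 4). For the feature covariance, pass rowvar=False (or use np.cov(X.T)). The sketch below uses a small hypothetical array instead of the full dataset and checks np.cov against the unbiased sample-covariance formula Xc.T @ Xc / (n - 1), where Xc is X with each column's mean subtracted.

```python
import numpy as np

# Hypothetical data: 5 samples (rows) x 3 features (columns).
X = np.array([
    [5.1, 3.5, 1.4],
    [4.9, 3.0, 1.4],
    [4.7, 3.2, 1.3],
    [4.6, 3.1, 1.5],
    [5.0, 3.6, 1.4],
])

# Default rowvar=True: each of the 5 rows is a variable -> 5 x 5 matrix.
print(np.cov(X).shape)            # (5, 5)

# rowvar=False: each column is a variable -> 3 x 3 feature covariance.
C = np.cov(X, rowvar=False)
print(C.shape)                    # (3, 3)

# Same result from the formula: center each column, then divide by n - 1.
n = X.shape[0]
Xc = X - X.mean(axis=0)
C_manual = Xc.T @ Xc / (n - 1)
print(np.allclose(C, C_manual))   # True
```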

Hope this has helped.

BCJuan
  • It worked! Thank you @BCJuan. Still, I don't understand: type(X) is numpy.ndarray and type(iris) is also numpy.ndarray, so why doesn't it work with the iris dataset? – Mar Apr 04 '19 at 07:31
  • Which Iris dataset? Your csv, or the `data` that I wrote? `data` is a dictionary-like object that includes X, y and some descriptive info. As for your `Iris.csv`, I do not know what is inside it. If it's tabular and has both X and y, you can use `pandas.read_csv`. But without knowing what is inside your `Iris.csv`, I cannot help further. – BCJuan Apr 04 '19 at 07:36
  • 1
    yes, I was refering to my `Iris.csv`, and yes it's tabular and it doesn't has y, but X .. it supposed I just have to use numpy. Anyway, thank you !! – Mar Apr 04 '19 at 07:45
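On why two numpy.ndarray objects behave differently: the type is the same, but the dtype is not. load_iris()['data'] is a float array, while an array built from csv.reader rows holds strings. Since only NumPy is expected here, a minimal NumPy-only sketch is to convert the strings to floats, or to let np.loadtxt parse the numeric columns directly. The inline rows and the column layout (Id first, Species last) are assumptions standing in for the real Iris.csv.

```python
import csv
import io

import numpy as np

# Stand-in for open("Iris.csv"); the header and column order are assumed.
csv_text = """Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
1,5.1,3.5,1.4,0.2,Iris-setosa
2,4.9,3.0,1.4,0.2,Iris-setosa
3,4.7,3.2,1.3,0.2,Iris-setosa
4,4.6,3.1,1.5,0.2,Iris-setosa
5,5.0,3.6,1.4,0.2,Iris-setosa
"""

reader = csv.reader(io.StringIO(csv_text))
next(reader)  # skip the header row
# Keep the four measurement columns (drop Id and Species).
rows = [row[1:5] for row in reader]

as_text = np.array(rows)
print(as_text.dtype.kind)         # prints U: Unicode strings, not numbers

# Option 1: convert the strings to float64 before computing statistics.
iris = as_text.astype(float)
cov = np.cov(iris, rowvar=False)  # 4 x 4 feature covariance

# Option 2: let NumPy parse only the numeric columns as floats.
iris2 = np.loadtxt(io.StringIO(csv_text), delimiter=",",
                   skiprows=1, usecols=(1, 2, 3, 4))
print(np.allclose(iris, iris2))   # True
```

With the real file, open("Iris.csv") (or just the filename for np.loadtxt) replaces the io.StringIO objects.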