I have a small problem. I have a .csv matrix that I want to convert into a NumPy array, so I found this: np.genfromtxt('/Users/username/Documents/fichieretudebis.csv', delimiter=';')

The problem is that my .csv matrix contains both numbers and strings, and I need both to appear in my array while keeping their types. I tried loading the matrix as a string matrix (with dtype=str), but then I can't convert the numbers back to float. Does anyone know how to do this? Thanks.

More explanation:

My .csv file looks like this: [image]

I need to use this file to build a tree (using sklearn and random forest algorithms).

This is what I have written so far: [image]

(The files called ResultatBis and Previsionbis have the same problem.)

I don't know how to create an array that sklearn will recognize without using the numpy library, but I need my matrix to stay exactly the same.

Tell me if that's enough explanation, and thanks in advance for your help!

  • NumPy is for homogeneous, aligned data. For more exotic schemes, have a look at pandas. – B. M. Mar 02 '16 at 19:06

2 Answers

Pass dtype=None so that genfromtxt infers a type for each column:

np.genfromtxt('/Users/username/Documents/fichieretudebis.csv', delimiter=';', dtype=None)

(after https://stackoverflow.com/a/15481761/1461850)
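
As a minimal sketch of what dtype=None does, the snippet below parses two semicolon-separated rows held in memory. The rows are copied from the output quoted in the comments, and the StringIO buffer is only a stand-in for the real file, so treat the layout as an assumption about the actual data.

```python
import io
import numpy as np

# Stand-in for the real file: two sample rows in the same ';'-separated layout.
sample = io.StringIO("44;75007;0;0;0;gmail\n31;75018;13;1;0;gmail\n")

# dtype=None asks genfromtxt to guess a type for each column separately.
arr = np.genfromtxt(sample, delimiter=';', dtype=None)

# The result is a 1-D structured array: one record per row,
# one named field per column (default names f0, f1, ...).
print(arr.dtype.names)
print(arr['f0'])         # an integer column
print(arr['f5'].dtype)   # a string column (bytes or str, depending on the
                         # NumPy version and the encoding argument)

# A numeric column can be converted to float on its own.
first_col = arr['f0'].astype(float)
print(first_col)
```

Each column keeps its own type, at the cost of getting a 1-D array of records rather than a 2-D matrix.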

Lee
  • Thanks! That's going to help me, but how do you get rid of the b in front of all the string elements? [(44, 75007, 0, 0, 0, b'gmail') (31, 75018, 13, 1, 0, b'gmail') (25, 75001, 11, 1, 1, b'gmail') (11, 75019, 4, 1, 0, b'gmail')] This is the type of output I get. – Lucas Fischer Mar 02 '16 at 19:58
  • The `b` is just Python3's way of indicating that it read byte (ASCII) strings from your file. Py3's default string type is `unicode`. Look at the `dtype`. For this field it probably is ` – hpaulj Mar 02 '16 at 20:44
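
To illustrate the point about byte strings, here is a hedged sketch of two ways to end up with ordinary str values. The 'yahoo' value and the in-memory sample are made up for the example, and the encoding argument assumes NumPy 1.14 or newer.

```python
import io
import numpy as np

# A byte-string column like the one in the output above (values are illustrative).
raw = np.array([b'gmail', b'yahoo'])

# Converting to str decodes the ASCII bytes, so the b prefix disappears.
text = raw.astype(str)
print(text)

# Alternatively, decode while reading by passing an explicit encoding.
sample = io.StringIO("44;75007;0;0;0;gmail\n31;75018;13;1;0;gmail\n")
arr = np.genfromtxt(sample, delimiter=';', dtype=None, encoding='utf-8')
print(arr['f5'])
```

Either way the data is the same; only the string type changes from bytes to str.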

You can also try using Pandas:

import pandas as pd
prediction = pd.read_csv('/Users/username/Documents/fichieretudebis.csv', delimiter=';')

Pandas is very popular for reading and manipulating data from .csv datasets. In my machine learning assignments I've always used it.
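
Since scikit-learn's tree-based estimators expect numeric feature arrays, one possible follow-up is sketched below. It is not necessarily what the asker needs: it assumes the file has no header row, the column names (c0 to c4, mail) are hypothetical, the sample rows are made up, and one-hot encoding via pd.get_dummies is just one option for the string column.

```python
import io
import pandas as pd

# Stand-in for the real file; the second mail value is invented for the example.
sample = io.StringIO("44;75007;0;0;0;gmail\n31;75018;13;1;0;yahoo\n")

# Hypothetical column names, since the real header (if any) is unknown.
cols = ['c0', 'c1', 'c2', 'c3', 'c4', 'mail']
df = pd.read_csv(sample, sep=';', header=None, names=cols)

# Each column keeps its own type: integers stay numeric, mail stays text.
print(df.dtypes)

# One-hot encode the string column so that every feature is numeric.
encoded = pd.get_dummies(df, columns=['mail'])

# A plain 2-D float array, which is the usual input shape for sklearn estimators.
X = encoded.to_numpy(dtype=float)
print(X.shape)
```

The DataFrame itself stays unchanged, so the original mixed-type table remains available alongside the numeric array.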

Fernando Wittmann