Read a CSV as a multidimensional data array for further processing with sklearn

Question

I have a CSV file with data like this:

jake 12 71 31 82 True
jake 44 54 44 80 True
jake 51 30 39 75 True
will 56 12 63 10 False
will 76 74 25 13 False
will 41 98 65 15 False
rich 77 11 93 25 False
rich 18 88 90 11 False
rich 22 12 99 20 False
chez 97 45 74 99 True
chez 91 31 71 15 True
chez 90 40 50 13 True

So there is a multi-row chunk of data for each person.

I would like to read it for further processing with scikit-learn.

For now my code looks like this:

import pandas as pd
import numpy as np

data = pd.read_csv('example_dataset.csv', sep=',')
data = data[['name', 'a', 'b', 'c', 'd', 'YesNo']]
X = np.array(data)

But I get an array in which each entry represents a single row. The data needs to be structured so that the related rows are grouped by name. How can I arrange that and prepare the data for machine learning, to predict the last column (whether it is most likely True or False)?

@anky_91 I'm fairly new to machine learning, and to be honest I don't clearly know the correct way to represent the data for processing. All I know for sure is that I have 3 rows for each person's name and that all the data for each person is related. But as far as I understand, I cannot put them into a single sequential data row, because each of the 4 numbers is taken from a different timeframe of the day. – Quanti Monati, Nov 02 '19 at 18:01
Since you have 3 rows per person, you could join each group of 3 rows into one row of 12 columns, if that is what you want. – Andy L., Nov 02 '19 at 19:04
@AndyL. Thanks, Andy, but what should I do in this case, where the first number in each row was recorded at the same time, but on a different day? So I have 3 data lines (rows) that are kind of related. What would you suggest? – Quanti Monati, Nov 02 '19 at 19:34

score 2 · Accepted Answer · answered Nov 02 '19 at 18:00

The following lines let me read the table into a proper DataFrame.

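import pandas as pd

# The file has no header row, so read it with header=None and name the columns manually.
data = pd.read_csv("example_dataset.csv", header=None, sep=",")
data.columns = ["name", "a", "b", "c", "d", "YesNo"]
print(data.head())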

Brandon


As you can see, each group of 3 rows contains related data. For a correct analysis I need a dataframe that reflects how those rows are related, I guess. – Quanti Monati, Nov 02 '19 at 18:03
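
One possible way to keep the 3 related rows of each person together, sketched here as an illustration rather than as part of the answer above: group by name, stack each person's rows into a (3, 4) block, and then flatten each block to 12 values, as suggested in the comments, because scikit-learn estimators expect 2D input. The sketch assumes the file is comma-separated with no header row, that every name has exactly 3 rows, and that the YesNo value is the same for all rows of a person. The names csv_text, X_3d, X and y are made up for this example, and the inline text stands in for example_dataset.csv.

```python
import io

import numpy as np
import pandas as pd

# Rows copied from the question, written comma-separated (assumption:
# the real file uses commas and has no header row).
csv_text = """\
jake,12,71,31,82,True
jake,44,54,44,80,True
jake,51,30,39,75,True
will,56,12,63,10,False
will,76,74,25,13,False
will,41,98,65,15,False
rich,77,11,93,25,False
rich,18,88,90,11,False
rich,22,12,99,20,False
chez,97,45,74,99,True
chez,91,31,71,15,True
chez,90,40,50,13,True
"""

data = pd.read_csv(io.StringIO(csv_text), header=None)
data.columns = ["name", "a", "b", "c", "d", "YesNo"]

features = ["a", "b", "c", "d"]
rows_per_person = 3  # assumption: every name has exactly 3 rows

# sort=False keeps the people in order of first appearance in the file.
groups = data.groupby("name", sort=False)
if not (groups.size() == rows_per_person).all():
    raise ValueError("every person must have exactly 3 rows")

# 3D view: one (3, 4) block per person.
X_3d = np.stack([g[features].to_numpy() for _, g in groups])

# 2D view for scikit-learn: each person becomes one row of 12 values,
# ordered row by row, so column position encodes (row index, feature).
X = X_3d.reshape(len(X_3d), -1)

# One label per person (assumes the label is constant within a person).
y = groups["YesNo"].first().to_numpy()

print(X_3d.shape)  # (4, 3, 4)
print(X.shape)     # (4, 12)
```

With X and y in this shape, each person is a single sample, so a classifier sees all 12 measurements of a person at once instead of treating the 3 rows as unrelated samples. Whether flattening is appropriate depends on what the 3 rows mean; if their order carries no meaning, per-person aggregates such as the mean of each column might be a better fit.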
