I have a CSV file with data like this:
jake 12 71 31 82 True
jake 44 54 44 80 True
jake 51 30 39 75 True
will 56 12 63 10 False
will 76 74 25 13 False
will 41 98 65 15 False
rich 77 11 93 25 False
rich 18 88 90 11 False
rich 22 12 99 20 False
chez 97 45 74 99 True
chez 91 31 71 15 True
chez 90 40 50 13 True
So the data comes in multi-row chunks, one chunk per person.
I would like to read it for further processing with scikit-learn.
For now my code looks like this:
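import pandas as pd
import numpy as np

# Load the file and keep the columns in a fixed order
data = pd.read_csv('example_dataset.csv', sep=',')
data = data[['name', 'a', 'b', 'c', 'd', 'YesNo']]

# One array row per CSV row (dtype is object because the column types are mixed)
X = np.array(data)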
But I get an array in which each entry represents a single row, whereas the data has to be structured so that related rows are grouped by name. How can I arrange that and prepare the data for machine learning, so that I can predict the last column (is it most likely True or False)?
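
To illustrate the kind of layout I mean, here is a minimal sketch that groups the rows by name and flattens each chunk into one feature vector per person. It rests on several assumptions: the file is comma-separated with a header row matching the column names in my code, every person has the same number of rows, and YesNo is constant within a person. The inline csv_text is only a stand-in for example_dataset.csv, and flattening is just one possible arrangement, not necessarily the right one.

```python
import io

import numpy as np
import pandas as pd

# Stand-in for example_dataset.csv (assumed: comma-separated, with a header row)
csv_text = """name,a,b,c,d,YesNo
jake,12,71,31,82,True
jake,44,54,44,80,True
jake,51,30,39,75,True
will,56,12,63,10,False
will,76,74,25,13,False
will,41,98,65,15,False
rich,77,11,93,25,False
rich,18,88,90,11,False
rich,22,12,99,20,False
chez,97,45,74,99,True
chez,91,31,71,15,True
chez,90,40,50,13,True
"""
data = pd.read_csv(io.StringIO(csv_text))

feature_cols = ['a', 'b', 'c', 'd']

names, X_rows, y = [], [], []
# sort=False keeps the people in the order they first appear in the file
for name, chunk in data.groupby('name', sort=False):
    names.append(name)
    # Concatenate the chunk's rows into one flat feature vector
    # (assumes every person has the same number of rows)
    X_rows.append(chunk[feature_cols].to_numpy().ravel())
    # Assumes the label is the same on every row of a person
    y.append(chunk['YesNo'].iloc[0])

X = np.vstack(X_rows)  # one row per person
y = np.array(y)        # one label per person

print(X.shape, y.shape)
```

With this layout X has one row per person and y one label per person, which is the shape scikit-learn estimators expect in fit(X, y).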