
I would like to use pandas to prepare an array from a CSV file and then use that array in Keras.

Here is an example using load_data(), which works:

import tensorflow as tf
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

Here is my attempt to load my own data using read_csv(). The file has 1530 rows and 42 columns; the first column is a string and the rest are numbers (some int, most float):

import pandas as pd
training_data = pd.read_csv('fName.csv',index_col='ID')

Both snippets run without errors, but with read_csv() I cannot access the content. With load_data() I can index into the actual array, whereas indexing training_data (read with read_csv()) gives the error below:

>>> x_train[1,10,10:20]
array([238, 252, 252, 179,  12,  75, 121,  21,   0,   0], dtype=uint8)
>>> training_data[1,10:20]
TypeError: '(1, slice(10, 20, None))' is an invalid key
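`pd.read_csv` returns a `DataFrame`, not a NumPy array. Plain `[]` on a DataFrame looks up column labels (or slices rows), so the tuple `(1, slice(10, 20, None))` is treated as a single column key and rejected; the exact exception type differs between pandas versions. NumPy-style positional indexing goes through `.iloc`, and label-based indexing through `.loc`. A minimal sketch, using a small hypothetical in-memory CSV instead of `fName.csv` (the column names `name`, `f0`, ... and the `ID` values are made up):

```python
import io

import pandas as pd

# Hypothetical stand-in for fName.csv: an 'ID' index column, one string
# column and a few numeric columns (the real file has 42 columns).
csv_text = """ID,name,f0,f1,f2,f3
a1,alpha,1,0.5,2.5,3.0
a2,beta,2,1.5,3.5,4.0
a3,gamma,3,2.5,4.5,5.0
"""
training_data = pd.read_csv(io.StringIO(csv_text), index_col="ID")

# Positional (NumPy-style) indexing: second row, column positions 1 to 3.
row_slice = training_data.iloc[1, 1:4]
print(row_slice.tolist())

# Label-based indexing: row label 'a2', column label 'f0'.
print(training_data.loc["a2", "f0"])

# Plain [] treats the tuple as one column label, which does not exist.
try:
    training_data[1, 1:4]
    failed = False
except Exception as exc:  # exception type varies across pandas versions
    failed = True
    print(f"{type(exc).__name__}: {exc}")
```

Applied to the question, the equivalent of `x_train[1,10,10:20]` on the frame would be `training_data.iloc[1, 10:20]`.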

Both seem to be read properly, so what am I doing wrong?

>>> np.ndim(x_train)
3
>>> np.ndim(training_data)
2
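`np.ndim` reports 2 for the frame because a DataFrame is a 2-D table (rows by columns), while MNIST's `x_train` is a 3-D stack of images. To get an object that indexes like `x_train`, the frame can be converted with `DataFrame.to_numpy()`. A sketch, again with a hypothetical in-memory CSV; it assumes the string column should not be fed to the model, since keeping it forces the whole array to `dtype=object`. The choice of `float32` is an assumption (a common Keras input dtype), not something the question specifies:

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for fName.csv (the real file has 1530 rows, 42 columns).
csv_text = """ID,name,f0,f1,f2
a1,alpha,1,0.5,2.5
a2,beta,2,1.5,3.5
"""
training_data = pd.read_csv(io.StringIO(csv_text), index_col="ID")

# Converting the whole frame keeps the string column, so NumPy uses dtype=object.
mixed = training_data.to_numpy()
print(mixed.dtype)

# Keep only the numeric columns and convert them to a float32 array.
x = training_data.select_dtypes(include="number").to_numpy(dtype=np.float32)
print(x.shape, x.dtype, np.ndim(x))

# NumPy-style indexing now works as it does on x_train.
print(x[1, 0:2])
```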

What can I do?

KingOtto
  • `pd.read_csv` returns a dataframe. Read the `pandas` docs to learn how to access its data. A dataframe is organized as a table, with rows and columns. Columns are accessed by name, and may contain differing types of data (`dtype`). `to_numpy` can be used to generate a numpy array from the frame. It might help if you displayed `data.info()` to get an overview of what the dataframe is like. – hpaulj Dec 06 '20 at 20:07
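The steps suggested in the comment (inspect with `info()`, access columns by name, convert with `to_numpy`) can be sketched as follows. This is a hypothetical example: the in-memory CSV replaces `fName.csv`, and the `label` column is an assumed target column, since the question does not say which column holds the labels. The split into `x_train`/`y_train` only mirrors the pair returned by `mnist.load_data()`:

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for fName.csv; 'label' is an assumed target column.
csv_text = """ID,name,f0,f1,label
a1,alpha,1,0.5,0
a2,beta,2,1.5,1
a3,gamma,3,2.5,0
"""
training_data = pd.read_csv(io.StringIO(csv_text), index_col="ID")

# Overview of columns, non-null counts and dtypes.
training_data.info()
print(training_data.dtypes)

# Columns are accessed by name and come back as a pandas Series.
print(training_data["f0"].tolist())

# Numeric feature columns (string column and assumed label column excluded).
feature_cols = training_data.select_dtypes(include="number").columns.drop("label")
x_train = training_data[feature_cols].to_numpy(dtype=np.float32)
y_train = training_data["label"].to_numpy()
print(x_train.shape, y_train.shape)
```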
