I have a dataset stored in a CSV file. Currently I read it into a NumPy array and then convert that array into a tf.data.Dataset with the code below:
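import numpy as np
import tensorflow as tf

def read_data(fname):
    with open(fname, "r") as f:
        lines = f.read().split("\n")
    header = lines[0].replace('"', "").split(',')
    # Skip the header and any empty lines (e.g. a trailing newline at the end of the file)
    lines = [line for line in lines[1:] if line.strip()]
    print(header)
    print(len(lines))
    # The first column (e.g. a timestamp) is dropped; the remaining columns are floats
    float_data = np.zeros((len(lines), len(header) - 1))
    for i, line in enumerate(lines):
        values = [float(x) for x in line.split(",")[1:]]
        float_data[i, :] = values
    return tf.data.Dataset.from_tensor_slices(float_data)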
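If the manual parsing above is only there to get a float array, a shorter sketch could use np.loadtxt. It assumes the same file layout as read_data (one quoted header row, comma-separated values, a non-numeric first column that is dropped); read_csv_floats is a hypothetical name, and it returns the NumPy array rather than a Dataset so the array can still be sliced.

```python
import numpy as np

def read_csv_floats(fname):
    """Load every column except the first as float32, skipping the header row."""
    with open(fname, "r") as f:
        header = f.readline().replace('"', "").strip().split(",")
    # usecols skips column 0 (e.g. a timestamp string that cannot be parsed as a float);
    # ndmin=2 keeps a 2-D shape even if the file has a single data row
    data = np.loadtxt(fname, delimiter=",", skiprows=1,
                      usecols=list(range(1, len(header))),
                      dtype=np.float32, ndmin=2)
    return header, data
```

The array can then be passed to tf.data.Dataset.from_tensor_slices, or to the generator below, as needed.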
I then want to define a generator function that draws training data from this dataset, but a Dataset is not subscriptable: with a NumPy array I can use [:2] to get part of the data, but with a Dataset I cannot.
How can I do this?
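A tf.data.Dataset is a sequential stream rather than an indexable array, so it has no [] operator; the closest equivalents to NumPy slicing are take and skip. A minimal sketch, assuming TensorFlow 2.x with eager execution; slice_dataset is a hypothetical helper, not part of TensorFlow:

```python
import numpy as np
import tensorflow as tf

def slice_dataset(ds, start, stop):
    """Rough equivalent of array[start:stop] for a Dataset (non-negative indices only)."""
    return ds.skip(start).take(stop - start)

float_data = np.arange(20, dtype=np.float32).reshape(10, 2)
ds = tf.data.Dataset.from_tensor_slices(float_data)

first_two = ds.take(2)            # like float_data[:2]
middle = slice_dataset(ds, 3, 6)  # like float_data[3:6]
```

These give contiguous, in-order slices only. For the random-row, windowed access the generator needs, it is usually simpler to hold the whole array as a single tensor and index it (for example with tf.gather) instead of slicing the Dataset.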
Below is the generator function I use when the input is a NumPy array (the first parameter, data, is a NumPy array):
import numpy as np

def generator(data, lookback, delay, min_index, max_index, shuffle=False, batch_size=128, step=6):
    if max_index is None:
        max_index = len(data) - delay - 1
    i = min_index + lookback
    while True:
        if shuffle:
            # Sample batch_size random "current" rows (with replacement)
            rows = np.random.randint(min_index + lookback, max_index, size=batch_size)
        else:
            # Walk through the rows in order, wrapping around at the end
            if i + batch_size >= max_index:
                i = min_index + lookback
            rows = np.arange(i, min(i + batch_size, max_index))
            i += len(rows)
        samples = np.zeros((len(rows),
                            lookback // step,
                            data.shape[-1]))
        targets = np.zeros((len(rows),))
        for j, row in enumerate(rows):
            # Every step-th row from the lookback rows preceding the current row
            indices = range(rows[j] - lookback, rows[j], step)
            samples[j] = data[indices]
            # Target: column 1, delay rows after the current row
            targets[j] = data[rows[j] + delay][1]
        yield samples, targets
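As a side note on CPU cost, the inner for j, row in enumerate(rows) loop can be replaced by a single fancy-indexing operation, which may reduce the per-batch NumPy work. A sketch, assuming data is a 2-D NumPy array and the target is column 1 as in the generator; make_batch is a hypothetical helper:

```python
import numpy as np

def make_batch(data, rows, lookback, delay, step):
    """Vectorized version of the per-row loop in generator()."""
    rows = np.asarray(rows)
    # Offsets relative to each row: same values as range(row - lookback, row, step) minus row
    offsets = np.arange(-lookback, 0, step)
    # One row of indices per sample; broadcasting gives shape (len(rows), len(offsets))
    indices = rows[:, None] + offsets[None, :]
    samples = data[indices]          # shape (len(rows), len(offsets), n_features)
    targets = data[rows + delay, 1]  # column 1, delay rows ahead, as in the original
    return samples, targets
```

Inside the generator, the np.zeros allocations and the loop could then be replaced by samples, targets = make_batch(data, rows, lookback, delay, step).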
I'm not sure whether a Dataset can do the same thing I do here with NumPy.
I could call tf.data.Dataset.from_tensor_slices at the end of this generator, but performance was poor even when I used from_generator(generator).prefetch(). I assume this is because the data is very large, so the CPU-side processing of the NumPy data becomes the bottleneck (I referred to this question: Tensorflow: How to prefetch data on the GPU from CPU tf.data.Dataset (from_generator)). So I want to load the data as tensors from the beginning to see whether that speeds up my code.
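One possible way to keep the whole array as a tensor and let tf.data build the windows is to create a dataset of row indices and map each index to a window with tf.gather. The sketch below mirrors the generator's lookback, delay, min_index, max_index, shuffle, batch_size and step parameters; it assumes TensorFlow 2.4+ (for tf.data.AUTOTUNE) and that the target is column 1, and make_window_dataset is a hypothetical name, not a TensorFlow API. Unlike the original generator, it yields a single pass (one epoch) instead of looping forever, and shuffling is a permutation rather than sampling with replacement.

```python
import numpy as np
import tensorflow as tf

def make_window_dataset(float_data, lookback, delay, min_index, max_index,
                        shuffle=False, batch_size=128, step=6):
    data = tf.constant(float_data, dtype=tf.float32)  # whole array held as one tensor
    if max_index is None:
        max_index = len(float_data) - delay - 1
    # One element per candidate "current" row, like `rows` in generator()
    ds = tf.data.Dataset.range(min_index + lookback, max_index)
    if shuffle:
        ds = ds.shuffle(buffer_size=max_index - min_index - lookback)

    def to_window(row):
        row = tf.cast(row, tf.int32)
        indices = tf.range(row - lookback, row, step)
        sample = tf.gather(data, indices)  # every step-th row before `row`
        target = data[row + delay, 1]      # column 1, delay rows ahead
        return sample, target

    # Window construction runs inside the tf.data pipeline, in parallel
    ds = ds.map(to_window, num_parallel_calls=tf.data.AUTOTUNE)
    return ds.batch(batch_size).prefetch(tf.data.AUTOTUNE)
```

If training code expects an endless stream (for example when using steps_per_epoch), calling .repeat() on the result would restore the generator's infinite behaviour.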
Thanks!