Questions tagged [tf.data.dataset]
145 questions
2 votes, 0 answers
Creating training data with generator and using tf.data from_generator
I'm new to the tf.data API and I'm trying to write code to do the following:
I have to train a NN with A LOT of training data (more than my RAM can handle anyway). I don't have a dataset, but I have to generate the data myself starting from…

r_song
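A minimal sketch of the `from_generator` approach, assuming each generated sample is a fixed-size float vector with a scalar target. `sample_generator`, `NUM_FEATURES`, and the toy target are hypothetical placeholders for the asker's own generation logic; only a few batches are held in memory at any time.

```python
import numpy as np
import tensorflow as tf

NUM_FEATURES = 8  # hypothetical size of one generated feature vector


def sample_generator(num_samples):
    """Yield (features, target) pairs one at a time instead of building them all up front."""
    rng = np.random.default_rng(0)
    for _ in range(num_samples):
        x = rng.normal(size=NUM_FEATURES).astype(np.float32)
        y = np.float32(x.sum())  # hypothetical target; replace with the real generation logic
        yield x, y


def make_dataset(num_samples, batch_size=32):
    ds = tf.data.Dataset.from_generator(
        # from_generator needs a callable; it is called again each time the dataset is iterated.
        lambda: sample_generator(num_samples),
        output_signature=(
            tf.TensorSpec(shape=(NUM_FEATURES,), dtype=tf.float32),
            tf.TensorSpec(shape=(), dtype=tf.float32),
        ),
    )
    return ds.batch(batch_size).prefetch(tf.data.AUTOTUNE)
```

The result can be passed straight to `model.fit(make_dataset(100_000), epochs=5)`; each epoch restarts the generator.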
2 votes, 1 answer
How to deal with images and masks using tf.dataset in a semantic segmentation task?
My data does not fit into memory, so I need to use the equivalent of the flow_from_directory method of the ImageDataGenerator class, but one that supports tensorflow_datasets. I found image_dataset_from_directory, a utility function of Keras that generates a…

Ahmed
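A minimal sketch of one common alternative to `image_dataset_from_directory` for segmentation: keep two aligned lists of file paths and decode each image/mask pair inside `Dataset.map`, so only the paths live in memory. The PNG format, `IMG_SIZE`, and the one-to-one ordering of `image_paths` and `mask_paths` are assumptions.

```python
import tensorflow as tf

IMG_SIZE = (128, 128)  # hypothetical training resolution


def load_pair(image_path, mask_path):
    """Decode one image/mask pair and resize both to IMG_SIZE."""
    image = tf.io.decode_png(tf.io.read_file(image_path), channels=3)
    image = tf.image.resize(image, IMG_SIZE) / 255.0  # float32 in [0, 1]
    mask = tf.io.decode_png(tf.io.read_file(mask_path), channels=1)
    # Nearest-neighbour resizing keeps class ids intact instead of blending them.
    mask = tf.image.resize(mask, IMG_SIZE, method="nearest")
    return image, tf.cast(mask, tf.int32)


def make_segmentation_dataset(image_paths, mask_paths, batch_size=8):
    """image_paths[i] must belong to mask_paths[i]; only the paths are held in memory."""
    ds = tf.data.Dataset.from_tensor_slices((image_paths, mask_paths))
    ds = ds.shuffle(len(image_paths))  # shuffles (image, mask) pairs together
    ds = ds.map(load_pair, num_parallel_calls=tf.data.AUTOTUNE)
    return ds.batch(batch_size).prefetch(tf.data.AUTOTUNE)
```

Because the pair is shuffled as one element, an image can never be separated from its mask.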
2 votes, 0 answers
Adding randomness to image augmentation when using tf.data.Dataset.from_tensor_slices with tf.cond
Good day to all!
I am trying to add a way to control the "randomness" of the augmentation applied to the image data. This means that I perform every augmentation operation with a certain probability (for example, 0.1). Since I use…

Denis D.
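A minimal sketch of per-operation probabilities inside a `tf.data` pipeline, assuming float images; the particular flips and brightness delta are hypothetical choices. The coin flip has to be a TensorFlow op such as `tf.random.uniform`: `Dataset.map` traces the function once, so a Python `random.random()` call would be evaluated a single time and frozen into the graph.

```python
import tensorflow as tf


def maybe_apply(image, fn, prob):
    """Return fn(image) with probability prob, otherwise the unchanged image."""
    # A fresh draw is made for every element on every pass over the dataset.
    apply = tf.random.uniform([]) < prob
    # Both branches must return tensors of the same dtype and shape.
    return tf.cond(apply, lambda: fn(image), lambda: image)


def augment(image, label, prob=0.1):
    image = maybe_apply(image, tf.image.flip_left_right, prob)
    image = maybe_apply(image, tf.image.flip_up_down, prob)
    image = maybe_apply(image, lambda x: tf.image.adjust_brightness(x, 0.1), prob)
    return image, label
```

Usage: `tf.data.Dataset.from_tensor_slices((images, labels)).map(augment, num_parallel_calls=tf.data.AUTOTUNE)`. Passing a different `prob` (for example via `lambda x, y: augment(x, y, prob=0.3)`) controls how often each operation fires.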
2 votes, 1 answer
Why is my model giving poor accuracy when the data is loaded using tf.data?
I am new to the tf.data API and trying to use it to load images from disk in the Dogs vs. Cats Redux: Kernels Edition Kaggle competition. To do this, I first created a pandas DataFrame named train_df with two columns - file_path containing the…

Gulshan Mishra
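A minimal sketch of building the pipeline from a DataFrame like `train_df`, assuming JPEG files, a `file_path` column, and an integer `label` column (the second column's real name is cut off in the excerpt, so `label` is hypothetical). It shuffles the lightweight (path, label) pairs before decoding and applies one fixed pixel scaling; it covers the loading side only.

```python
import pandas as pd
import tensorflow as tf

IMG_SIZE = (224, 224)  # hypothetical model input size


def load_image(path, label):
    """Read one JPEG, resize it, and scale pixels to [0, 1]."""
    image = tf.io.decode_jpeg(tf.io.read_file(path), channels=3)
    image = tf.image.resize(image, IMG_SIZE) / 255.0
    return image, label


def dataframe_to_dataset(df: pd.DataFrame, batch_size=32, training=True):
    ds = tf.data.Dataset.from_tensor_slices(
        (df["file_path"].to_numpy(), df["label"].to_numpy())  # "label" is a hypothetical column name
    )
    if training:
        # Shuffle the cheap (path, label) pairs over the whole DataFrame so that
        # batches mix classes even if train_df is sorted by label.
        ds = ds.shuffle(len(df))
    ds = ds.map(load_image, num_parallel_calls=tf.data.AUTOTUNE)
    return ds.batch(batch_size).prefetch(tf.data.AUTOTUNE)
```

Whatever scaling `load_image` uses has to match the preprocessing applied at evaluation and prediction time.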
2 votes, 0 answers
How to add a nested dictionary as an input to tf.data.Dataset.from_tensor_slices
I am trying to load a dataset using the tf.data.Dataset.from_tensor_slices command.
My input is a list of nested dictionaries in the following format:
a_dict = {'a': 'blablabla',
          'b': {
              'c': (tf.constant([[0.390,…

aDav
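`from_tensor_slices` accepts nested dicts and tuples, but it slices the first dimension of every leaf, and tf.data does not treat a Python list as a nested structure. A list of dicts therefore has to become one dict whose leaves are stacked along a new leading axis. A minimal sketch, assuming every record has the same keys and same-shaped leaves; `records` is hypothetical example data.

```python
import numpy as np
import tensorflow as tf


def records_to_dataset(records):
    """Build a dataset whose elements have the same nested structure as each record."""
    # Stack matching leaves of all records along a new axis 0, e.g.
    # [{'a': a0, 'b': {'c': c0}}, {'a': a1, 'b': {'c': c1}}] -> {'a': [a0, a1], 'b': {'c': [c0, c1]}}
    stacked = tf.nest.map_structure(lambda *leaves: np.stack(leaves), *records)
    # from_tensor_slices slices axis 0 of every leaf, giving one element per record.
    return tf.data.Dataset.from_tensor_slices(stacked)


# Hypothetical records shaped like the question's a_dict.
records = [
    {"a": "first", "b": {"c": tf.constant([[0.1, 0.2]]), "d": 1}},
    {"a": "second", "b": {"c": tf.constant([[0.3, 0.4]]), "d": 2}},
]
ds = records_to_dataset(records)
```

If leaves differ in shape between records, stacking fails; `from_generator` or ragged tensors are the usual fallbacks in that case.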
2 votes, 0 answers
Character level tokenization with special tokens
I am feeding my Discord server messages into an RNN so that I can create a chatbot based on those messages. I know TensorFlow's tf.keras.preprocessing.text.Tokenizer can tokenize on a character level, but I wanted to include special tokens, since I…

Elysium
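A minimal pure-Python sketch of character-level tokenization that keeps special tokens intact, assuming the tokens are written inline in the text such as `<start>` and `<end>` (the token strings are hypothetical). A regex alternation tries the special tokens first and falls back to any single character, whereas `Tokenizer(char_level=True)` would break `<start>` into individual characters.

```python
import re

SPECIAL_TOKENS = ["<start>", "<end>", "<user>"]  # hypothetical special tokens

# Longest special tokens first so a token that is a prefix of another cannot win,
# then "." as a fallback for any single character (DOTALL also matches newlines).
_TOKEN_RE = re.compile(
    "|".join(re.escape(t) for t in sorted(SPECIAL_TOKENS, key=len, reverse=True)) + "|.",
    re.DOTALL,
)


def tokenize(text):
    """Split text into characters, keeping each special token as one unit."""
    return _TOKEN_RE.findall(text)


def build_vocab(texts):
    """Padding gets id 0, special tokens the next ids, then every character seen."""
    vocab = {"<pad>": 0}
    for tok in SPECIAL_TOKENS:
        vocab.setdefault(tok, len(vocab))
    for text in texts:
        for tok in tokenize(text):
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode(text, vocab):
    # Raises KeyError for characters not seen when the vocabulary was built.
    return [vocab[tok] for tok in tokenize(text)]
```

The resulting id lists can be fed to `tf.data.Dataset.from_generator` or padded and passed to `from_tensor_slices`.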
2 votes, 1 answer
TensorFlow Dataset shuffle impacts validation accuracy during training, with ambiguous behavior
I am struggling to train a neural network that uses a tf.data.Dataset as input.
I find that if I call .shuffle() before splitting the entire dataset into train, val, and test sets, the accuracy on val (during training) and test (in evaluate) is 91%, but…

Nixiam
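One known pitfall may explain this: `shuffle()` reshuffles on every pass by default (`reshuffle_each_iteration=True`), so splitting the shuffled dataset with `take`/`skip` can hand different elements to train, val, and test on each epoch, leaking evaluation data into training. A minimal sketch of a stable split, assuming the total element count is known; the 80/10/10 fractions and the seed are hypothetical.

```python
import tensorflow as tf


def split_dataset(ds, n_total, train_frac=0.8, val_frac=0.1, seed=42):
    # reshuffle_each_iteration=False fixes the shuffled order once, so take/skip
    # select the same elements every time train, val, or test is iterated.
    ds = ds.shuffle(n_total, seed=seed, reshuffle_each_iteration=False)
    n_train = int(n_total * train_frac)
    n_val = int(n_total * val_frac)
    train = ds.take(n_train)
    val = ds.skip(n_train).take(n_val)
    test = ds.skip(n_train + n_val)
    return train, val, test
```

The training split can still be reshuffled every epoch on its own, e.g. `train.shuffle(buffer_size).batch(32)`, without affecting which elements belong to val and test.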
2 votes, 3 answers
Invalid argument: Dimension -972891 must be >= 0
I have created a tf.data pipeline for speech recognition with the following code snippets:
def get_waveform_and_label(file_path):
    label = tf.strings.split(file_path, os.path.sep)[-2]
    audio_binary = tf.io.read_file(file_path)
    …

Soroush
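The excerpt is cut off before the code that raises the error, so this is an assumption: a padding step such as `tf.zeros([16000] - tf.shape(waveform))`, as used in the TensorFlow simple-audio tutorial, asks for a negative dimension whenever a clip is longer than 16000 samples. A minimal sketch that trims before padding; `TARGET_LEN` and `pad_or_trim` are hypothetical names.

```python
import tensorflow as tf

TARGET_LEN = 16000  # hypothetical fixed length (1 s at 16 kHz)


def pad_or_trim(waveform, target_len=TARGET_LEN):
    """Return a 1-D waveform of exactly target_len samples."""
    # Trim first so that target_len - length can never be negative.
    waveform = waveform[:target_len]
    padding = tf.zeros([target_len] - tf.shape(waveform), dtype=waveform.dtype)
    return tf.concat([waveform, padding], axis=0)
```

Applied as `waveform_ds.map(pad_or_trim)` before computing spectrograms, every element ends up with the same length.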
2 votes, 1 answer
How to efficiently feed data into TensorFlow 2.x?
I am looking at a data preprocessing task on a large amount of text data and want to load the preprocessed data into TensorFlow 2.x. The preprocessed data contains arrays of integer values since the preprocessing step generates:
a one-hot encoded…

user8276908
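A minimal sketch of one standard option: write the preprocessed integer arrays to TFRecord files once, then stream them back with `tf.data.TFRecordDataset`. It assumes variable-length integer sequences plus one integer label per example; the feature names `ids` and `label` are hypothetical.

```python
import tensorflow as tf


def write_tfrecord(path, sequences, labels):
    """Serialize variable-length integer sequences and integer labels to one TFRecord file."""
    with tf.io.TFRecordWriter(path) as writer:
        for seq, label in zip(sequences, labels):
            example = tf.train.Example(features=tf.train.Features(feature={
                "ids": tf.train.Feature(int64_list=tf.train.Int64List(value=[int(v) for v in seq])),
                "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[int(label)])),
            }))
            writer.write(example.SerializeToString())


FEATURE_SPEC = {
    "ids": tf.io.VarLenFeature(tf.int64),          # variable length -> SparseTensor
    "label": tf.io.FixedLenFeature([], tf.int64),  # exactly one value per example
}


def parse(serialized):
    parsed = tf.io.parse_single_example(serialized, FEATURE_SPEC)
    return tf.sparse.to_dense(parsed["ids"]), parsed["label"]


def read_tfrecord(paths, batch_size=32):
    ds = tf.data.TFRecordDataset(paths)
    ds = ds.map(parse, num_parallel_calls=tf.data.AUTOTUNE)
    # padded_batch zero-pads each batch to the length of its longest sequence.
    return ds.padded_batch(batch_size).prefetch(tf.data.AUTOTUNE)
```

Splitting the output across several files (one `write_tfrecord` call per shard) lets `TFRecordDataset` read them in parallel later.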
2 votes, 2 answers
Difference between tf.data.Dataset.repeat(EPOCHS) and model.fit(epochs=EPOCHS)
While training, I set epochs to the number of times to iterate over the data. I was wondering what the use of tf.data.Dataset.repeat(EPOCHS) is when I can already do the same thing with model.fit(train_dataset, epochs=EPOCHS).

spb
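A small sketch of what each option does to the stream of batches, using a hypothetical 10-element toy dataset. Without `repeat()`, `model.fit(ds, epochs=E)` iterates the dataset once per epoch; `repeat(E)` instead chains E passes into one stream, and `repeat()` with no argument never ends, which is why Keras then needs `steps_per_epoch`.

```python
import tensorflow as tf

# Hypothetical toy pipeline: 10 elements in batches of 4 -> 3 batches per pass.
ds = tf.data.Dataset.range(10).batch(4)


def count_batches(dataset):
    return sum(1 for _ in dataset)


# One pass over the data. model.fit(ds, epochs=E) re-iterates ds once per
# epoch (re-running any shuffle()), then runs validation and callbacks.
one_pass = count_batches(ds)

# repeat(3) chains three passes into a single stream: model.fit(ds.repeat(3), epochs=1)
# trains on as many batches as epochs=3, but validation and callbacks run only once.
three_passes = count_batches(ds.repeat(3))

# repeat() without an argument never ends, so Keras needs steps_per_epoch to
# know where an epoch stops, e.g. model.fit(ds.repeat(), epochs=E, steps_per_epoch=one_pass).
first_seven = count_batches(ds.repeat().take(7))
```

For a finite dataset, relying on `epochs` alone is usually simpler; `repeat()` mainly matters when the pipeline should keep producing data across epoch boundaries.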
2 votes, 2 answers
One-hot encode labels of a tf.data.Dataset
I am trying to convert the labels of a tf.data.Dataset to one hot encoded labels. I am using this dataset. I've added titles (sentiment, text) to the columns, everything else is original.
Here is the code I use to encode the labels (positive,…

Johann Süß
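A minimal sketch of mapping string labels to one-hot vectors inside the pipeline with a `tf.lookup.StaticHashTable` and `tf.one_hot`. The label set is an assumption, since the excerpt cuts off after `positive`; labels missing from the table map to index -1, which `tf.one_hot` turns into an all-zero vector.

```python
import tensorflow as tf

CLASS_NAMES = ["negative", "neutral", "positive"]  # hypothetical label vocabulary

# Static string -> index table; unknown labels map to -1.
label_table = tf.lookup.StaticHashTable(
    tf.lookup.KeyValueTensorInitializer(
        keys=tf.constant(CLASS_NAMES),
        values=tf.range(len(CLASS_NAMES), dtype=tf.int64),
    ),
    default_value=-1,
)


def one_hot_label(text, label):
    """Replace a string label with a float one-hot vector of length len(CLASS_NAMES)."""
    index = label_table.lookup(label)
    # tf.one_hot returns all zeros for index -1, so unknown labels become a zero vector.
    return text, tf.one_hot(index, depth=len(CLASS_NAMES))


# Hypothetical (text, sentiment) pairs standing in for the real dataset.
texts = ["great movie", "it was fine", "terrible"]
labels = ["positive", "neutral", "negative"]
ds = tf.data.Dataset.from_tensor_slices((texts, labels)).map(one_hot_label)
```

The one-hot targets pair with a `softmax` output layer and `categorical_crossentropy` loss.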
2 votes, 0 answers
Using tensorflow.data to generate dataset of images and multiple labels
I am trying to train a neural network to draw a bounding box around an object. I have generated the data myself: 256x256 RGB images and five labels per image (two corners of the bounding box + a rotational component). In order to not run out of memory…

do_not_understand
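A minimal sketch of pairing lazily loaded images with their five targets. It assumes PNG files on disk, a float target array of shape `(N, 5)` ordered as two corner coordinates followed by the rotation, and a two-output Keras model whose outputs are named `corners` and `rotation` (all hypothetical). Only the file paths and targets are kept in memory.

```python
import numpy as np
import tensorflow as tf


def load_example(path, target):
    """Decode one 256x256 RGB PNG and split its 5-value target.

    Assumes target = [x1, y1, x2, y2, angle]; the ordering is hypothetical.
    """
    image = tf.io.decode_png(tf.io.read_file(path), channels=3)
    image = tf.image.convert_image_dtype(image, tf.float32)  # uint8 -> [0, 1]
    image = tf.ensure_shape(image, [256, 256, 3])  # fail loudly on unexpected sizes
    # Dict keys must match the model's output layer names (hypothetical here).
    return image, {"corners": target[:4], "rotation": target[4:]}


def make_bbox_dataset(paths, targets, batch_size=16):
    """paths: N file names; targets: array-like of shape (N, 5). Images are read lazily."""
    ds = tf.data.Dataset.from_tensor_slices((paths, np.asarray(targets, np.float32)))
    ds = ds.shuffle(len(paths)).map(load_example, num_parallel_calls=tf.data.AUTOTUNE)
    return ds.batch(batch_size).prefetch(tf.data.AUTOTUNE)
```

For a single 5-unit output layer, `load_example` can return `target` unchanged instead of the dict.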
2 votes, 1 answer
Cannot batch tensors with different shapes in component 0
InvalidArgumentError: Cannot batch tensors with different shapes in component 0. First element had shape [224,224,3] and element 25 had shape [224,224,1].
I have already reshaped the images, as you can see here.
def…

Sid Menon
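A mismatch like `[224,224,3]` vs `[224,224,1]` means at least one source file is single-channel (grayscale), and resizing to 224x224 does not change the channel count. A minimal sketch, assuming JPEG inputs: asking the decoder for `channels=3` converts grayscale files to RGB at decode time, so every element has the same shape.

```python
import tensorflow as tf

IMG_SIZE = (224, 224)


def load_rgb(path):
    """Decode a JPEG as 3-channel RGB and resize it to IMG_SIZE."""
    # channels=3 converts grayscale JPEGs to RGB during decoding. Resizing or
    # reshaping afterwards cannot fix a (224, 224, 1) image, because neither
    # changes the number of channels.
    image = tf.io.decode_jpeg(tf.io.read_file(path), channels=3)
    return tf.image.resize(image, IMG_SIZE)


def make_dataset(paths, batch_size=32):
    # Every element is now (224, 224, 3), so batch() no longer sees mixed shapes.
    return tf.data.Dataset.from_tensor_slices(paths).map(load_rgb).batch(batch_size)
```

`tf.io.decode_png` takes the same `channels` argument for PNG inputs.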
2 votes, 1 answer
tf.data WindowDataset flat_map gives 'dict' object has no attribute 'batch' error
I am trying to create batches of shape (batch_size, time_steps, my_data).
Why do I get AttributeError: 'dict' object has no attribute 'batch' at the flat_map step?
x_train = np.random.normal(size=(60000, 768))
token_type_ids = np.ones(shape=(len(x_train)))
…

Night Walker
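When the elements are dicts, `Dataset.window` produces a dict of per-key sub-datasets rather than a single dataset, which is why calling `.batch` on each window fails. A minimal sketch of the usual fix, assuming small stand-ins for the asker's `x_train` and `token_type_ids` arrays (the sizes and `TIME_STEPS` are hypothetical): batch each component and zip the dict back together inside `flat_map`.

```python
import numpy as np
import tensorflow as tf

TIME_STEPS = 4  # hypothetical window length

# Small stand-ins for the question's x_train and token_type_ids arrays.
features = {
    "x": np.random.normal(size=(20, 768)).astype(np.float32),
    "token_type_ids": np.ones(shape=(20,), dtype=np.int32),
}
ds = tf.data.Dataset.from_tensor_slices(features)

# For dict elements, each window is a dict {key: sub-dataset}, not a dataset,
# hence "'dict' object has no attribute 'batch'" when calling window.batch().
# Batch every component separately, then zip them back into one dict dataset.
windows = ds.window(TIME_STEPS, shift=1, drop_remainder=True).flat_map(
    lambda w: tf.data.Dataset.zip({k: v.batch(TIME_STEPS) for k, v in w.items()})
)

# Elements: {"x": (batch, TIME_STEPS, 768), "token_type_ids": (batch, TIME_STEPS)}
batches = windows.batch(8)
```

With 20 rows, a window of 4, and a shift of 1, this yields 17 windows, so the final batch holds only one.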
2 votes, 1 answer
Does kedro support tfrecord?
To train TensorFlow Keras models on AI Platform using Docker containers, we convert our raw images stored on GCS to a TFRecord dataset using tf.data.Dataset. This way, the data is never stored locally. Instead, the raw images are transformed directly…

evolved