Questions tagged [tf.data.dataset]

145 questions
2
votes
0 answers

Creating training data with generator and using tf.data from_generator

I'm new to the tf.data API and I'm trying to write code to do the following: I have to train a NN with a lot of training data (more than my RAM can handle). I don't have a dataset, but I have to generate the data myself starting from…
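A minimal sketch of one way to stream generated samples through `tf.data.Dataset.from_generator`, so that only a few batches are held in memory at a time. The generator `sample_generator`, the feature size of 8, and the sum-based label are hypothetical stand-ins, since the question's actual data source is elided.

```python
import numpy as np
import tensorflow as tf

def sample_generator(num_samples=100):
    """Hypothetical generator: yields one (features, label) pair at a time."""
    rng = np.random.default_rng(0)
    for _ in range(num_samples):
        x = rng.normal(size=(8,)).astype(np.float32)
        y = np.float32(x.sum())  # placeholder label, an assumption
        yield x, y

# output_signature tells tf.data the dtype and shape of each yielded element.
ds = tf.data.Dataset.from_generator(
    sample_generator,
    output_signature=(
        tf.TensorSpec(shape=(8,), dtype=tf.float32),
        tf.TensorSpec(shape=(), dtype=tf.float32),
    ),
)
ds = ds.batch(16).prefetch(tf.data.AUTOTUNE)

# Pull a single batch to check the shapes.
xb, yb = next(iter(ds))
```

Samples are produced lazily as the dataset is iterated, so the full training set never has to exist in RAM at once.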
2
votes
1 answer

How to deal with images and masks using tf.dataset in a semantic segmentation task?

My data does not fit into memory so I need to use the equivalent to flow_from_directory of the ImageDataGenerator class, but that supports tensorflow_datasets. I found image_dataset_from_directory, a utility function of Keras that generates a…
2
votes
0 answers

Adding randomness to the performing of image augmentation while using tf.data.dataset.from_tensor_slices with tf.cond

Good day to all! I am trying to add the possibility to control the "randomness" of the augmentation, which is applied to the image data. This means that I perform every augmentation operation with a certain probability (for example 0.1). Since I use…
Denis D.
  • 21
  • 1
2
votes
1 answer

Why is my model giving poor accuracy when the data is loaded using tf.data?

I am new to the tf.data API and trying to use it to load images from disk in the Dogs vs. Cats Redux: Kernels Edition Kaggle competition. To do this, I first created a pandas DataFrame named train_df with two columns - file_path containing the…
2
votes
0 answers

How to add a nested dictionary as an input to tf.data.Dataset.from_tensor_slices

I am trying to load a dataset using the tf.data.Dataset.from_tensor_slices command. My input is a list of nested dictionaries in the following format: a_dict = { 'a' : 'blablabla', 'b' : { 'c': (tf.constant([[0.390,…
aDav
  • 41
  • 8
2
votes
0 answers

Character level tokenization with special tokens

I am feeding my Discord server messages into an RNN, so that I can create a chatbot based on those messages. I know TensorFlow's tf.keras.preprocessing.text.Tokenizer can tokenize on a character level, but I wanted to include special tokens, since I…
2
votes
1 answer

Tensorflow DataSet Shuffle Impact the validation training accuracy and ambiguous behavior

I am struggling with training a neural network that uses tf.data.Dataset as input. I find that if I call .shuffle() before splitting the entire dataset into train, val, and test sets, the accuracy on val (during training) and test (in evaluate) is 91%, but…
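The rest of this question is elided, so the following is only a sketch of one thing worth checking when a dataset is shuffled before being split with `take` and `skip`. By default `shuffle` reshuffles on every iteration, which can move samples between the splits from one pass to the next. Passing `reshuffle_each_iteration=False` keeps the order fixed. The toy dataset of 10 integers and the 8/2 split are assumptions for illustration.

```python
import tensorflow as tf

# Toy dataset of 10 element ids (an assumption for illustration).
ds = tf.data.Dataset.range(10)

# Shuffle once with a fixed order so that take/skip always see the same sequence.
ds = ds.shuffle(buffer_size=10, seed=0, reshuffle_each_iteration=False)

train = ds.take(8)
val = ds.skip(8)

# Collect the ids in each split; iterate train twice to check stability.
train_ids = {int(x) for x in train.as_numpy_iterator()}
val_ids = {int(x) for x in val.as_numpy_iterator()}
train_ids_second_pass = {int(x) for x in train.as_numpy_iterator()}
```

With a fixed order the two splits stay disjoint across passes. A separate `shuffle` can still be applied to `train` alone after the split.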
2
votes
3 answers

Invalid argument: Dimension -972891 must be >= 0

I have created a data pipeline using tf.data for speech recognition using the following code snippets: def get_waveform_and_label(file_path): label = tf.strings.split(file_path, os.path.sep)[-2] audio_binary = tf.io.read_file(file_path) …
2
votes
1 answer

How to efficiently feed data into TensorFlow 2.x

I am looking at a data preprocessing task on a large amount of text data and want to load the preprocessed data into TensorFlow 2.x. The preprocessed data contains arrays of integer values since the preprocessing step generates: a one hot encoded…
user8276908
  • 1,051
  • 8
  • 20
2
votes
2 answers

Difference between tf.data.Dataset.repeat(EPOCHS) vs model.fit epochs=EPOCHS

While training, I set epochs to the number of times to iterate over the data. What is the use of tf.data.Dataset.repeat(EPOCHS) when I can already do the same thing with model.fit(train_dataset, epochs=EPOCHS)?
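A small sketch of what `repeat` does to the dataset itself, which may help frame the comparison. The toy dataset of 6 elements, the batch size of 2, and `EPOCHS = 4` are assumptions for illustration.

```python
import tensorflow as tf

EPOCHS = 4

# 6 elements in batches of 2 gives 3 batches per pass over the data.
ds = tf.data.Dataset.range(6).batch(2)

# repeat(EPOCHS) concatenates EPOCHS passes into one longer stream.
repeated = ds.repeat(EPOCHS)

# Count batches by iterating each dataset once.
batches_per_pass = sum(1 for _ in ds)
batches_repeated = sum(1 for _ in repeated)
```

Both approaches show the model the same number of batches, but Keras treats the repeated dataset as one long epoch, so per-epoch events such as validation and callbacks run once rather than `EPOCHS` times. `repeat()` with no argument is typically paired with `steps_per_epoch` in `model.fit`.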
2
votes
2 answers

one hot encode labels of tf.data.Dataset

I am trying to convert the labels of a tf.data.Dataset to one hot encoded labels. I am using this dataset. I've added titles (sentiment, text) to the columns, everything else is original. Here is the code I use to encode the labels (positive,…
Johann Süß
  • 97
  • 1
  • 12
2
votes
0 answers

Using tensorflow.data to generate dataset of images and multiple labels

I am trying to train a neural network to draw a bounding box around an object. I have generated the data myself, 256x256 rgb images and five labels per image (two corners of bounding box + a rotational component). In order to not run out of memory…
2
votes
1 answer

Cannot batch tensors with different shapes in component 0

InvalidArgumentError: Cannot batch tensors with different shapes in component 0. First element had shape [224,224,3] and element 25 had shape [224,224,1]. I have already reshaped images as you can see here. def…
Sid Menon
  • 33
  • 7
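The question's own code is elided, so this is only a sketch of one common cause: a mix of grayscale and RGB files decoded with their native channel count. Assuming the images are decoded from encoded bytes, forcing `channels=3` at decode time gives every element the same last dimension. The tiny 4x4 in-memory PNGs and the `decode` helper are hypothetical stand-ins for the real files and loading function.

```python
import tensorflow as tf

# Two in-memory PNGs standing in for files on disk: one grayscale, one RGB.
gray_png = tf.io.encode_png(tf.zeros([4, 4, 1], dtype=tf.uint8))
rgb_png = tf.io.encode_png(tf.zeros([4, 4, 3], dtype=tf.uint8))

def decode(encoded):
    """Hypothetical loader: always decode to 3 channels."""
    image = tf.io.decode_png(encoded, channels=3)
    return tf.image.convert_image_dtype(image, tf.float32)

ds = tf.data.Dataset.from_tensor_slices(tf.stack([gray_png, rgb_png]))
ds = ds.map(decode).batch(2)

# Batching succeeds because both elements now have shape [4, 4, 3].
batch = next(iter(ds))
```

Resizing changes only height and width, so a grayscale image stays single-channel unless the channel count is set explicitly.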
2
votes
1 answer

tf.data WindowDataset flat_map gives 'dict' object has no attribute 'batch' error

I am trying to create batches of shape (batch_size, time_steps, my_data). Why do I get AttributeError: 'dict' object has no attribute 'batch' at the flat_map step? x_train = np.random.normal(size=(60000, 768)) token_type_ids = np.ones(shape=(len(x_train))) …
Night Walker
  • 20,638
  • 52
  • 151
  • 228
2
votes
1 answer

Does kedro support tfrecord?

To train tensorflow keras models on AI Platform using Docker containers, we convert our raw images stored on GCS to a tfrecord dataset using tf.data.Dataset. Thereby the data is never stored locally. Instead the raw images are transformed directly…