Questions tagged [tf.data.dataset]

145 questions
2 votes, 1 answer

Extracting NumPy value from TensorFlow object during transformation

I am trying to get word embeddings using TensorFlow, and I have created adjacent word lists from my corpus. The number of unique words in my vocabulary is 8000 and the number of adjacent word lists is around 1.6 million. Since the data…
1 vote, 0 answers

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all(). While using dataset_from_directory

I want to convert images in a directory to tensors in tf.data.Dataset format, so I am using tf.keras.utils.image_dataset_from_directory, which generates a tf.data.Dataset from image files in a directory. labels: Either "inferred" (labels are generated from the…
user19676560
1 vote, 1 answer

tf.data.Dataset.from_generator takes only first 256 elements

I am using the from_generator function in tf.data.Dataset to load my data of 9000 samples, but it takes only the first 256 elements and repeats them to fill 9000 samples. def gen(): for idx in z: yield idx z = list(range(9000)) # 9000 is the…
calk231
1 vote, 0 answers

What is the most efficient way of creating a tf.dataset from multiple json.gz files with multiple text records?

I have thousands of json.gz files, each with a variety of information about scientific papers. For each file, I have to extract the relevant information - e.g. title and labels - to make a dataset, then transform it to a tf.dataset. However, it is…
Marlon Teixeira
1 vote, 0 answers

CNN performance worse when loading data with tf.Data

I have a trained EfficientNetB2 neural network that I'm using for image classification. When I'm loading the images with PIL like this: image = Image.open(item) image = image.convert('RGB').resize((120, 120)) image = np.array(image) if image.ndim…
1 vote, 1 answer

Convert list of tuples to tensorflow dataset (tf.data.Dataset)

Data from Kaggle's Natural Language Processing with Disaster Tweets: ds_train >>> [("Already expecting to be inundated w/ articles about trad authors' pay plummeting by early next year but if this is true it'll be far worse", 0) ('@blazerfan not…
user19676560
1 vote, 1 answer

Normalisation layer for tf.data.Dataset

I am trying to improve the TensorFlow tutorial on time series forecasting. The code is quite long, but my question concerns only a small part of it. In the tutorial the data is normalized in the usual way: it is demeaned and standardized using the mean…
NC520
1 vote, 0 answers

Iterate through tensorflow dataset without exceeding RAM

I am trying to train a model in Python using TensorFlow on Google Colab Pro+ (51 GB of available RAM). This model needs to be trained with a bunch of HD images (9000 1440x720 images). In order to train such a model, I prepare my data with a…
1 vote, 2 answers

tf.data.Dataset.zip: Is there an alternative to tf.data.Dataset.zip?

When tf.data.Dataset.zip is used to zip two datasets, it combines each element of the first dataset with the element at the corresponding index of the second dataset. a = tf.data.Dataset.range(1, 4) # ==> [ 1, 2, 3 ] b =…
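
The index-by-index pairing described in this question can be sketched as follows. This is only a minimal illustration: `a` is taken from the excerpt, while `b` is a hypothetical second dataset, because the excerpt is cut off before its definition.

```python
import tensorflow as tf

# `a` matches the excerpt; `b` is an assumed second dataset,
# since the excerpt is truncated before `b` is defined.
a = tf.data.Dataset.range(1, 4)  # elements 1, 2, 3
b = tf.data.Dataset.range(4, 7)  # elements 4, 5, 6 (assumption)

# zip pairs the elements position by position.
zipped = tf.data.Dataset.zip((a, b))

# Convert the NumPy scalars to plain Python ints for inspection.
pairs = [(int(x), int(y)) for x, y in zipped.as_numpy_iterator()]
print(pairs)
```

With these assumed inputs, each element of `zipped` is a tuple holding one value from `a` and the value at the same position in `b`.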
1 vote, 1 answer

How to change the values of a tf.Dataset object at a specific index

The structure of my tf.data.Dataset object is as follows: ((3, 400, 1), (3, 400, 1)). I would like to divide the elements in the 3rd row of each element by 10. My code is below, but it complains that NumPy arrays are immutable (I'd like to use map…
noone
1 vote, 1 answer

Unable to batch dataset using `.batch` and `.padded_batch`

I'm writing a variable-length string feature to a TFRecord file. If the feature has the same shape for all examples, it runs without problems. If the shape varies, the error below is raised whenever the created tfrecord is being…
user12690225
1 vote, 0 answers

How to do k-Fold Cross Validation with tf.data.Dataset API?

It is easy to do k-fold cross validation with the scikit-learn package, where data and labels are passed separately. Here, however, data and labels are combined before being fed into the model, like: tf.data.Dataset.from_tensor_slices((X, Y)) …
Ahmad
1 vote, 0 answers

TensorFlow Keras: Problems handling variable-length input using a generator?

We want to train our model on varying input dimensions. Every input in a given batch and across batches has different dimensions. We cannot resize our input (since we’ll lose our microscopic features). Now, since we cannot resize our input,…
Ahmad
1 vote, 1 answer

Python Tensorflow itertools groupby: using itertools.groupby() in tf.data.Dataset.filter()

I am trying to apply a filter to a tf.data.Dataset that removes any string in which one group makes up > 50% of the string. Here is my Dataset: import tensorflow as tf strings = [ ["ABCDEFGABCDEFG\tUseless\tLabel1"], …
1 vote, 1 answer

Python, TensorFlow, Keras: tf.data.Dataset apply tokenizer to only one axis, drop axis

I am trying to build a tf.data.Dataset pipeline that reads 16 tab separated .gzip files which include a sentence, a useless file indicator, and a label. I'd like to apply a tokenizer to the first axis of the dataset only. Additionally, I'd love to…