
Suppose I have 3 tfrecord files, namely neg.tfrecord, pos1.tfrecord, pos2.tfrecord.

I use

dataset = tf.data.TFRecordDataset(tfrecord_file)

Calling this once per file gives me 3 separate Dataset objects.

My batch size is 400, made up of 200 neg samples, 100 pos1 samples, and 100 pos2 samples. How can I build the desired dataset?

I will use this dataset object in keras.fit() (Eager Execution).

My TensorFlow version is 1.13.1.

Before, I tried getting an iterator for each dataset and manually concatenating the data after fetching it, but this was inefficient and GPU utilization was low.
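For concreteness, the target composition for a batch of 400 can be sketched in plain Python. The `make_batch` helper and the stand-in lists below are hypothetical; in the real pipeline the records would come from parsing the three tfrecord files:

```python
import random

# Stand-in record lists; in the real pipeline these would come from
# parsing neg.tfrecord, pos1.tfrecord and pos2.tfrecord.
neg = ["neg"] * 500
pos1 = ["pos1"] * 300
pos2 = ["pos2"] * 300

def make_batch(neg, pos1, pos2, seed=0):
    """Hypothetical helper: take exactly 200/100/100 and shuffle within the batch."""
    batch = neg[:200] + pos1[:100] + pos2[:100]
    random.Random(seed).shuffle(batch)
    return batch

batch = make_batch(neg, pos1, pos2)
```

Each batch then holds exactly 200 neg, 100 pos1, and 100 pos2 samples, in shuffled order.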

Gary

2 Answers


You can use interleave:

filenames = [tfrecord_file1, tfrecord_file2, tfrecord_file3]
dataset = (tf.data.Dataset.from_tensor_slices(filenames)
           .interleave(lambda x: tf.data.TFRecordDataset(x),
                       cycle_length=3))
dataset = dataset.map(parse_fn)
...

Or you can even try parallel interleave. See https://www.tensorflow.org/api_docs/python/tf/data/TFRecordDataset#interleave and https://www.tensorflow.org/api_docs/python/tf/data/experimental/parallel_interleave
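For intuition: with `block_length=1` and a `cycle_length` equal to the number of files, interleave visits the sources roughly round-robin. The effect can be sketched in plain Python, with small lists standing in for the record datasets:

```python
from itertools import chain, zip_longest

# Stand-in lists for the three record sources.
neg = ["n1", "n2", "n3"]
pos1 = ["p1", "p2"]
pos2 = ["q1", "q2"]

# Round-robin over the sources, dropping placeholders once a source runs out.
_SENTINEL = object()
interleaved = [x
               for x in chain.from_iterable(
                   zip_longest(neg, pos1, pos2, fillvalue=_SENTINEL))
               if x is not _SENTINEL]
# Order: n1, p1, q1, n2, p2, q2, n3
```

This is why a plain interleave reads the files evenly rather than in a weighted ratio.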

Sharky
  • yes, it works! But I found that interleave can only read the 3 tfrecord files evenly. If my batch size is 400, I need to take 200 samples from neg, 100 samples from pos1, and 100 samples from pos2. How can I do that? – Gary Mar 14 '19 at 06:46
  • You can do this in your parse function. Or maybe flat_map will suit you: https://www.tensorflow.org/api_docs/python/tf/data/Dataset#flat_map Or it may be better to create another question with more specifics and include the code you've tried – Sharky Mar 14 '19 at 10:29
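For the weighted-ratio follow-up, TF 1.13 also shipped `tf.data.experimental.sample_from_datasets`, which mixes datasets with given probabilities. A minimal sketch, using small in-memory datasets as stand-ins for the three parsed TFRecord pipelines:

```python
import tensorflow as tf

# Stand-ins for the three parsed datasets; with real data these would be
# tf.data.TFRecordDataset(...).map(parse_fn) pipelines.
neg = tf.data.Dataset.from_tensor_slices([0] * 1000)
pos1 = tf.data.Dataset.from_tensor_slices([1] * 1000)
pos2 = tf.data.Dataset.from_tensor_slices([2] * 1000)

# Mix with probabilities 0.5 / 0.25 / 0.25, so a batch of 400 holds
# roughly 200 neg, 100 pos1, 100 pos2 on average (not exactly per batch).
mixed = tf.data.experimental.sample_from_datasets(
    [neg, pos1, pos2], weights=[0.5, 0.25, 0.25])
batched = mixed.batch(400)
```

Note the composition is only correct in expectation; if every batch must contain exactly 200/100/100, a stricter scheme (e.g. batching each source separately and concatenating) is needed.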

This worked for a Kaggle project I am currently working on.
I read in 5 datasets for different years and used the following code to merge them.
Blessings – john-Eric

frames = [df, df1, df2, df3, df4]

data = pd.concat(frames)

data
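As a self-contained illustration (with small made-up yearly frames standing in for df…df4), `pd.concat` stacks the rows of the frames into one DataFrame:

```python
import pandas as pd

# Made-up frames standing in for the per-year DataFrames.
df = pd.DataFrame({"year": [2015, 2015], "value": [1.0, 2.0]})
df1 = pd.DataFrame({"year": [2016], "value": [3.0]})

frames = [df, df1]
# ignore_index=True renumbers the rows of the combined frame.
data = pd.concat(frames, ignore_index=True)
```

Note this merges pandas DataFrames row-wise; it does not apply to tf.data Dataset objects.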