How do I create a combined iterator in MXNet? For example, given a record (.rec) iterator, if I want to change the labels corresponding to each image, there are two options:
a) Create a new .rec iterator with the same data (images) and new labels.
b) Create a multi-iterator from the original .rec iterator and an NDArray iterator, such that the multi-iterator reads data (images) from the original .rec iterator and labels from the NDArray iterator.
Option (a) is tedious. Any suggestions on how to create such a multi-iterator?
1 Answer
4
import mxnet as mx

class MultiIter(mx.io.DataIter):
    """Combine two DataIters: each batch concatenates their data and label lists."""

    def __init__(self, iter_list):
        super(MultiIter, self).__init__()
        self.iters = iter_list
        self.batch_size = 1000  # should match the batch size of the wrapped iterators

    def next(self):
        # StopIteration propagates as soon as either wrapped iterator is exhausted.
        batches = [i.next() for i in self.iters]
        return mx.io.DataBatch(
            data=list(batches[0].data) + list(batches[1].data),
            label=list(batches[0].label) + list(batches[1].label),
            pad=0)

    def reset(self):
        for i in self.iters:
            i.reset()

    @property
    def provide_data(self):
        return list(self.iters[0].provide_data) + list(self.iters[1].provide_data)

    @property
    def provide_label(self):
        return list(self.iters[0].provide_label) + list(self.iters[1].provide_label)

train = MultiIter([train1, train2])
Here train1 and train2 can be any two DataIter objects. In particular, train1 can be a .rec iterator and train2 can be an NDArray iterator. The additional argument pad=0 is required for calling the predict method with the combined iterator if either train1 or train2 is an NDArray iterator.
This MultiIter returns a list of data and a list of labels combined from the two iterators. If you need only the data from the first iterator and the labels from the second, the code below will work.
import mxnet as mx

class MultiIter(mx.io.DataIter):
    """Take data from the first DataIter and labels from the second."""

    def __init__(self, iter_list):
        super(MultiIter, self).__init__()
        self.iters = iter_list
        self.batch_size = 1000  # should match the batch size of the wrapped iterators

    def next(self):
        # Advance both iterators together so data and labels stay aligned.
        batches = [i.next() for i in self.iters]
        return mx.io.DataBatch(
            data=list(batches[0].data),
            label=list(batches[1].label),
            pad=0)

    def reset(self):
        for i in self.iters:
            i.reset()

    @property
    def provide_data(self):
        return list(self.iters[0].provide_data)

    @property
    def provide_label(self):
        return list(self.iters[1].provide_label)

train = MultiIter([train1, train2])

Ashish Khetan
- It would be interesting if someone could write a multi-iterator in which the two iterators are shuffled in sync. As of now, shuffle must be False in both iterators to use this multi-iterator. – Ashish Khetan Aug 22 '17 at 01:56
- The way I do this is with scikit-learn's sklearn.utils.shuffle(*arrays) function. But if you don't want that dependency, and all your iterators are sliceable, you can build an indexer that is a random permutation of the index array and then slice all your sub-iterators with that same array. – Ben Allison Aug 22 '17 at 19:46
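
The shared-permutation idea from the last comment could look roughly like the sketch below. It uses NumPy only and assumes both sources are in-memory arrays of equal length (for example, the arrays one would hand to an NDArray iterator). It is not meant for a .rec iterator, which, as far as I know, cannot be sliced by index from Python. The helper name shuffle_in_sync and its seed parameter are made up for illustration and are not part of MXNet or scikit-learn.

```python
import numpy as np

def shuffle_in_sync(*arrays, seed=None):
    """Reorder every array with one shared random permutation.

    Hypothetical helper: all inputs must have the same length along
    axis 0, and each output is a reordered copy of its input.
    """
    if not arrays:
        return ()
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same length")
    # One permutation of the indices 0..n-1, reused for every array.
    perm = np.random.RandomState(seed).permutation(n)
    return tuple(np.asarray(a)[perm] for a in arrays)

# Toy data: row i of `images` is [2*i, 2*i + 1], and its label is i.
images = np.arange(10).reshape(5, 2)
labels = np.arange(5)
shuffled_images, shuffled_labels = shuffle_in_sync(images, labels, seed=0)
```

After the call, each row of shuffled_images still sits at the same position as its original label, so the shuffled arrays can be wrapped in iterators that themselves have shuffle set to False.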