
How do I create a combined iterator in MXNet? For example, given a record (.rec) iterator, if I want to change the label for each image, I see two options: (a) create a new .rec iterator with the same data (images) and the new labels, or (b) create a multi-iterator from the original .rec iterator and an NDArray iterator, so that it reads the data (images) from the original .rec iterator and the labels from the NDArray iterator. Option (a) is tedious. Any suggestions on how to create such a multi-iterator?

Ashish Khetan

1 Answer

import mxnet as mx

class MultiIter(mx.io.DataIter):
    """Combine two DataIter objects by concatenating their data and their labels."""
    def __init__(self, iter_list):
        super(MultiIter, self).__init__()
        self.iters = iter_list
        self.batch_size = 1000  # should match the batch size of the wrapped iterators
    def next(self):
        # Advance both iterators in lockstep; StopIteration propagates when either one ends.
        batches = [i.next() for i in self.iters]
        return mx.io.DataBatch(data=list(batches[0].data) + list(batches[1].data),
                               label=list(batches[0].label) + list(batches[1].label),
                               pad=0)
    def reset(self):
        for i in self.iters:
            i.reset()
    @property
    def provide_data(self):
        return list(self.iters[0].provide_data) + list(self.iters[1].provide_data)
    @property
    def provide_label(self):
        return list(self.iters[0].provide_label) + list(self.iters[1].provide_label)

train = MultiIter([train1, train2])

Here train1 and train2 can be any two DataIter objects. In particular, train1 can be a .rec iterator and train2 can be an NDArray iterator. The additional argument pad=0 is required to call the predict method on the combined iterator if either train1 or train2 is an NDArray iterator.

MultiIter returns a list of data and a list of labels combined from the two iterators. If you need only data from the first iterator and labels from the second iterator, the code below will work.

import mxnet as mx

class MultiIter(mx.io.DataIter):
    """Take the data from the first iterator and the labels from the second."""
    def __init__(self, iter_list):
        super(MultiIter, self).__init__()
        self.iters = iter_list
        self.batch_size = 1000  # should match the batch size of the wrapped iterators
    def next(self):
        # Advance both iterators in lockstep; StopIteration propagates when either one ends.
        batches = [i.next() for i in self.iters]
        return mx.io.DataBatch(data=list(batches[0].data),
                               label=list(batches[1].label),
                               pad=0)
    def reset(self):
        for i in self.iters:
            i.reset()
    @property
    def provide_data(self):
        return list(self.iters[0].provide_data)
    @property
    def provide_label(self):
        return list(self.iters[1].provide_label)

train = MultiIter([train1, train2])
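
The pairing that this second MultiIter performs can also be seen without MXNet. The following is a minimal, framework-free sketch, not the MXNet API: batches, PairedBatches and the toy data are hypothetical names made up for illustration. It shows that data and labels are matched purely by position, which is why both iterators need the same batch size and the same unshuffled order.

```python
# Framework-free stand-in for the MultiIter pattern; no MXNet required.
# batches(), PairedBatches and the toy data are hypothetical, for illustration only.

def batches(items, batch_size):
    """Yield consecutive, unshuffled batches of `items`."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


class PairedBatches:
    """Take data from one batch stream and labels from another, in lockstep."""

    def __init__(self, data_batches, label_batches):
        self.data_batches = iter(data_batches)
        self.label_batches = iter(label_batches)

    def __iter__(self):
        return self

    def __next__(self):
        # If either stream is exhausted, its StopIteration ends the combined iteration.
        data = next(self.data_batches)
        labels = next(self.label_batches)
        if len(data) != len(labels):
            raise ValueError("batch sizes differ: %d vs %d" % (len(data), len(labels)))
        return data, labels


images = ["img0", "img1", "img2", "img3"]  # stands in for the images in the .rec file
new_labels = [7, 3, 9, 1]                  # replacement labels, in the same order

paired = list(PairedBatches(batches(images, 2), batches(new_labels, 2)))
print(paired)
```

Each pair holds a batch of images together with the labels at the same positions, so any reordering of one stream without the other would silently mislabel the images.
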
Ashish Khetan
  • It would be interesting if someone could write a multi-iterator in which the two iterators are shuffled in sync. As of now, shuffle must be False in both iterators to use this multi-iterator. – Ashish Khetan Aug 22 '17 at 01:56
  • The way I do this is with scikit-learn's sklearn.utils.shuffle(*args) function. But if you don't want that dependency and all your iterators are sliceable, you can build an index array that is a random permutation and slice all your sub-iterators with that same array. – Ben Allison Aug 22 '17 at 19:46
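
The shared-permutation idea from the last comment can be sketched in plain Python. This is only an illustration under assumptions: shuffle_in_sync is a hypothetical helper name, and the lists stand in for in-memory data and labels. It does not apply to a .rec iterator, which cannot be sliced this way.

```python
import random


def shuffle_in_sync(*sequences, seed=None):
    """Apply one random permutation to several equally long sequences.

    Hypothetical helper for illustration; not part of MXNet or scikit-learn.
    """
    if not sequences:
        return []
    n = len(sequences[0])
    if any(len(seq) != n for seq in sequences):
        raise ValueError("all sequences must have the same length")
    # One shared permutation of the indices 0..n-1.
    perm = list(range(n))
    random.Random(seed).shuffle(perm)
    # Reorder every sequence with that same permutation.
    return [[seq[i] for i in perm] for seq in sequences]


images = ["img0", "img1", "img2", "img3", "img4"]
labels = [0, 1, 2, 3, 4]
shuffled_images, shuffled_labels = shuffle_in_sync(images, labels, seed=0)

# Pairs stay aligned: "imgK" still sits at the same position as label K.
print(list(zip(shuffled_images, shuffled_labels)))
```

With NumPy arrays the same effect comes from drawing a permutation once with numpy.random.permutation(n) and indexing every array with it. The reordered arrays could then be wrapped in NDArray iterators with shuffle left at False, so that the combined iterator still reads them in matching order.
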