I'm working on a Keras project that has a large amount of input data and a much smaller amount of output/label data (which are images). The mapping of input to output data is contiguous and consistent: the first 1,000 input samples correspond to the first image, the second 1,000 input samples correspond to the second image, and so forth.
Since the output data are images, holding thousands of unnecessary copies of the same image in a NumPy array is off the table, as it would require an enormous amount of memory. I was looking for a way of having "soft" links in the NumPy array, such that indexing simply maps into a smaller array; however, I could not find an acceptable way of doing this.
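For what it's worth, the closest thing NumPy offers to such a "soft" link is a zero-stride view via `numpy.lib.stride_tricks.as_strided`: the repeat axis gets a stride of 0, so every apparent copy points at the same underlying image. A minimal sketch, with made-up array sizes:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

# Hypothetical small dataset: 3 images of 4x4 pixels, each of which should
# appear 1000 times in the "expanded" label array.
images = np.arange(3 * 4 * 4, dtype=np.float32).reshape(3, 4, 4)

# Zero-copy "soft link": a view of shape (3, 1000, 4, 4) whose repeat axis
# has stride 0, so no pixel data is duplicated in memory.
repeated = as_strided(
    images,
    shape=(3, 1000, 4, 4),
    strides=(images.strides[0], 0, images.strides[1], images.strides[2]),
)

# repeated[i, j] is images[i] for every j, and the view shares memory
# with the original array rather than copying it.
```

The catch is that this layout can't be flattened to `(3000, 4, 4)` without NumPy materializing a real copy (a stride-0 axis can't survive that reshape), so anything that demands one contiguous block will still blow up the memory.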
EDIT: I should add a bit more info here as I probably didn't explain the situation properly above.
The project I'm working on takes a video, splits the audio from the video, uses the audio as input, and uses the individual frames from the video as output. In its rawest form, the net has a single input (one audio sample) and a set of convolution layers that form the outputs.
Of course, the number of input points available (48,000 samples per second for 48 kHz audio) vastly outnumbers the number of output points (~24 fps). The immediate, simple option (and the one I'd take if my output data were smaller) would be to just replicate the data in the array and pony up the extra RAM. Unfortunately that's not an option here, as it would mean growing the array by a factor of about 2,000, which for an already large dataset would trigger an OOM pretty fast.
Hopefully that's a better explanation of the situation I'm in. So far, one of the options I've considered/attempted is to override some methods of the NumPy array class, such as `__getitem__`, with the intention of just mapping indices into a smaller array. I abandoned this because I'm fairly sure the Keras backend just takes a contiguous block of memory from NumPy and uses that. Another option I've considered is to work with much smaller batches: replicate the images as far as memory allows, train, and move on to the next set of images. This is messy, though (and feels like quitting).
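One alternative to overriding `__getitem__` on the array itself is to do the index mapping at the batch level: Keras can train from an object exposing `__len__`/`__getitem__` (subclassing `keras.utils.Sequence`), so only one batch of labels is ever materialized. A sketch of the idea, assuming the names `AudioFrameSequence` and `SAMPLES_PER_FRAME` (both invented here) and leaving out the Keras base class so the logic stands alone:

```python
import numpy as np

SAMPLES_PER_FRAME = 2000  # assumed: 48,000 samples/s of audio at ~24 fps of video


class AudioFrameSequence:  # in practice: class AudioFrameSequence(keras.utils.Sequence)
    """Yields (audio, image) batches, indexing into the small image array
    instead of materializing thousands of copies of each image."""

    def __init__(self, audio, images, batch_size=32):
        self.audio = audio          # shape (n_samples, ...)
        self.images = images        # shape (n_frames, H, W, C)
        self.batch_size = batch_size

    def __len__(self):
        # Number of batches per epoch.
        return int(np.ceil(len(self.audio) / self.batch_size))

    def __getitem__(self, idx):
        lo = idx * self.batch_size
        hi = min(lo + self.batch_size, len(self.audio))
        x = self.audio[lo:hi]
        # Map each sample index to its frame: samples 0..1999 -> frame 0, etc.
        frame_idx = np.arange(lo, hi) // SAMPLES_PER_FRAME
        y = self.images[frame_idx]  # fancy indexing copies only one batch's worth
        return x, y
```

With the real `Sequence` base class this could be handed to `model.fit` (or `fit_generator` on older Keras) directly, and the full replicated label array never exists in memory.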
I think the best option, and the one I'll try next, is to use ldavid's suggestion of Keras' TimeDistributed wrapper. If I understand it correctly, I can use it to "batch" the input samples down into a set of samples the same size as the output data.
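If I've got the shapes right, the grouping step this would rely on is just a reshape of the audio so there is one row of samples per output image; whether TimeDistributed then does what I want is still to be tested. A sketch with the same assumed 2,000-samples-per-frame ratio as above:

```python
import numpy as np

SAMPLES_PER_FRAME = 2000  # assumed: 48 kHz audio at ~24 fps
N_FRAMES = 10             # a toy number of output images

# 10 frames' worth of mono audio, one float per sample.
audio = np.random.rand(N_FRAMES * SAMPLES_PER_FRAME, 1)

# Group the samples so the first axis matches the number of output images:
# shape (n_frames, samples_per_frame, channels).
grouped = audio.reshape(N_FRAMES, SAMPLES_PER_FRAME, 1)
```

With that layout, input and label arrays have the same length along the first axis (10 here), so each output image pairs with one block of audio instead of 2,000 duplicated labels.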