
Is there a standard way to input multiple images into a CNN and condense the information into a single image at the end? I'm working with an HDR dataset where you have multiple images of the same scene and combine them to form an image with less noise.

What I have tried is to treat the separate images as channels of a single input, but I'm unsure whether this is appropriate, as the outputs looked strange.
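Roughly, what I tried looks something like this (a minimal PyTorch sketch with placeholder layer sizes, not my actual model): the burst is stacked along the channel axis and fed to an ordinary CNN that maps it back to a single 3-channel image.

```python
import torch
import torch.nn as nn

# Sketch of "separate images as channels": stack N RGB frames into one
# 3*N-channel input and map it back to a single 3-channel image.
# Layer sizes are placeholders, not the actual model.
n_frames = 4
net = nn.Sequential(
    nn.Conv2d(3 * n_frames, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(32, 3, kernel_size=3, padding=1),
)

burst = torch.randn(1, 3 * n_frames, 64, 64)  # N noisy shots of the same scene
merged = net(burst)                           # (1, 3, 64, 64)
```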

simonjoh

1 Answer

  1. Is this implementable?

Yes! Any neural network, regardless of the number or type of layers, is nothing more than some function f: I → O, so as long as your different inputs belong to the same type you can freely pass as many of them through your network as you please, getting equally many outputs on the other side of the black box. These can in turn be combined in any way you desire through some other function g: O × O → O' (which could be addition, tensor multiplication, concatenation, or something fancier like another neural network).
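As a minimal sketch of this idea (PyTorch, with placeholder layer sizes and a burst of four RGB frames assumed purely for illustration): the same f is applied to every input, and g is a concatenation followed by a 1x1 convolution.

```python
import torch
import torch.nn as nn

class SharedEncoderMerge(nn.Module):
    def __init__(self, n_inputs=4):
        super().__init__()
        # f: I -> O, shared across all inputs (the same weights are reused)
        self.f = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(16, 16, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        # g: O x ... x O -> O', here concatenation + 1x1 conv back to 3 channels
        self.g = nn.Conv2d(16 * n_inputs, 3, kernel_size=1)

    def forward(self, images):          # images: list of (B, 3, H, W) tensors
        features = [self.f(x) for x in images]
        return self.g(torch.cat(features, dim=1))

# Usage: four noisy shots of the same scene -> one merged output
model = SharedEncoderMerge(n_inputs=4)
burst = [torch.randn(1, 3, 64, 64) for _ in range(4)]
out = model(burst)                      # (1, 3, 64, 64)
```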

  2. Does this make sense?

No definitive answer here; this largely depends on what you expect your function to do, what your data are, and what your end goal is. Are you assuming that all of your inputs obey the same statistical properties or share similar underlying patterns? In that case, you could argue that the same function can model them. Keep in mind that by applying it to the inputs in parallel (i.e. independently) you keep your function blind to potential cross-dependencies between them, which in some cases makes complete sense.

To make this a bit clearer: if your different inputs are the different channels of a single image, I think this is the wrong approach. Each channel may convey different information, and keeping your network unaware of how the channels affect one another, while forcing it to create meaningful abstractions using the same function over all of them, does not sound like a great idea. On the other hand, if your different images are e.g. photos of an object from different angles, and the sub-network you're applying them to is some sort of classifier over their features (already obtained through another CNN, for instance), then it could make sense to model them all with the same function.

KonstantinosKokos
  • In my case, the input images would be a burst of images depicting the same scene each with 3 channels, RGB. As you suggest, one of the worries I have is that simply applying them in parallel will ignore cross-dependencies (since they are in a sense the exact same image, only slightly shifted and with different noise). Is there possibly a standardized way to handle multiple similar copies of the same input to a network? – simonjoh Mar 20 '18 at 09:00
  • But is your goal to model dependencies between series of similar images, or to remove noise from a single image? – KonstantinosKokos Mar 20 '18 at 09:07
  • The end goal would be to remove noise from a single image, i.e. the end output would be a clearer image with less noise. – simonjoh Mar 20 '18 at 09:12
  • Then obviously it would make no sense for your network to learn representations over multiple images simultaneously; what I think you need here is a neural net that accepts a single image and outputs another single image, wrapped in a functional API that accepts multiple images. For each set of similar images, store the intermediate representations and apply some loss function penalizing the difference between them. This should force your network to learn similar representations for the images in question, modeling their actual features rather than the noise that differs amongst them (see the sketch below these comments). – KonstantinosKokos Mar 20 '18 at 09:21
  • What you're saying makes sense, I will try doing something like that. Thanks for the help! – simonjoh Mar 20 '18 at 10:04
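For reference, here is one possible reading of that last suggestion as a minimal PyTorch sketch. The encoder/decoder architecture, the clean target, the MSE consistency penalty, and the 0.1 weight are all placeholder assumptions, not anything prescribed in the comments: the same single-image denoiser is applied to each shot of the burst, and an auxiliary loss penalizes disagreement between the intermediate representations so the network models the scene rather than the noise.

```python
import torch
import torch.nn as nn

class Denoiser(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Conv2d(32, 3, 3, padding=1)

    def forward(self, x):
        z = self.encoder(x)        # intermediate representation
        return self.decoder(z), z

model = Denoiser()
mse = nn.MSELoss()

burst = [torch.randn(1, 3, 64, 64) for _ in range(4)]  # noisy shots, same scene
target = torch.randn(1, 3, 64, 64)                     # clean reference (placeholder)

outputs, reps = zip(*(model(x) for x in burst))
recon_loss = sum(mse(y, target) for y in outputs)
# Penalize differences between intermediate representations of the same scene
consistency_loss = sum(mse(reps[i], reps[0]) for i in range(1, len(reps)))
loss = recon_loss + 0.1 * consistency_loss
```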