As an example, I'd like to train a neural network to predict the location of a picture (longitude, latitude), with the image, temperature, humidity and time of year as inputs to the model.

My question is: what is the best way to add this additional information to a CNN? Should I merge the numeric inputs with the CNN at the last dense layer, or at the beginning? Should I encode the numeric values (temperature, humidity and time of year)?

Any information, resources or sources would be greatly appreciated. Thanks in advance.

user3029296

2 Answers

You can process numeric inputs separately and merge them afterwards before making the final prediction:

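# Imports (assuming the Keras API bundled with TensorFlow)
from tensorflow.keras.layers import Input, Dense, concatenate
from tensorflow.keras.models import Model

# Your usual CNN whatever it may be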
img_in = Input(shape=(width, height, channels))
img_features = SomeCNN(...)(img_in)

# Your usual MLP model
aux_in = Input(shape=(3,))
aux_features = Dense(24, activation='relu')(aux_in)

# Possibly add more hidden layers, then merge
merged = concatenate([img_features, aux_features])

# create last layer.
out = Dense(num_locations, activation='softmax')(merged)

# build model
model = Model([img_in, aux_in], out)
model.compile(loss='categorical_crossentropy', ...)

Essentially, you treat them as separate inputs and learn useful features that, combined, allow your model to make the prediction. How you encode the numeric inputs really depends on their type.

For continuous inputs like temperature, you can normalize to the range [-1, 1]; for discrete inputs, one-hot encoding is very common. Here is a quick guide.
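To make the encoding advice concrete, here is a minimal NumPy sketch of both steps. The helper names (`scale_to_unit_range`, `one_hot`), the sample readings, the value ranges, and the choice to treat time of year as a month index are all assumptions for illustration, not part of the answer above.

```python
import numpy as np

def scale_to_unit_range(values, lo, hi):
    """Linearly map values from [lo, hi] to [-1, 1]."""
    values = np.asarray(values, dtype=np.float32)
    return 2.0 * (values - lo) / (hi - lo) - 1.0

def one_hot(indices, num_classes):
    """Return an (n, num_classes) one-hot matrix for integer class indices."""
    indices = np.asarray(indices, dtype=np.int64)
    return np.eye(num_classes, dtype=np.float32)[indices]

# Hypothetical readings: temperature in Celsius, relative humidity in %, month 1..12
temperature = [-10.0, 15.0, 40.0]
humidity = [0.0, 50.0, 100.0]
month = [1, 6, 12]

# Assumed ranges; in practice lo/hi would come from the training set
temp_scaled = scale_to_unit_range(temperature, lo=-10.0, hi=40.0)
hum_scaled = scale_to_unit_range(humidity, lo=0.0, hi=100.0)

# Treat time of year as a discrete month index (0..11) and one-hot encode it
month_onehot = one_hot(np.array(month) - 1, num_classes=12)

# One row per sample: 1 temperature + 1 humidity + 12 month columns
aux = np.concatenate(
    [temp_scaled[:, None], hum_scaled[:, None], month_onehot], axis=1
)
```

With this particular encoding the auxiliary input has 14 columns, so the `aux_in` shape in the model sketch would be `(14,)` rather than `(3,)`.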

Innat
nuric
  • Thanks a lot, this will be very helpful. One more question: are simple Dense layers the best for numeric data, or does it make sense to use the more complex layers found in CNNs (pooling, batch, conv layer, etc.)? – user3029296 May 27 '18 at 22:49
  • @nuric, Thanks a lot. Explained pretty well. – sahaj patel Aug 29 '19 at 15:26
  • @nuric How do you keep the consistency between an image and its corresponding numeric inputs while training? – bit_scientist Feb 11 '20 at 01:53
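On the question in the last comment about keeping each image paired with its numeric inputs: one common approach is to store everything in arrays aligned by row index and apply the same permutation to all of them whenever you shuffle. The sketch below uses made-up toy data and hypothetical array names; it only illustrates the indexing idea.

```python
import numpy as np

rng = np.random.default_rng(0)
num_samples = 5

# Hypothetical toy data: row i of every array belongs to sample i.
# Each array is filled with its sample index so the pairing is easy to check.
images = np.zeros((num_samples, 4, 4, 3), dtype=np.float32)
for i in range(num_samples):
    images[i] = i
aux = np.stack([np.arange(num_samples, dtype=np.float32)] * 3, axis=1)
labels = np.arange(num_samples)

# Shuffle with ONE shared permutation so the rows stay aligned
perm = rng.permutation(num_samples)
images_s, aux_s, labels_s = images[perm], aux[perm], labels[perm]
```

As far as I know, when the two arrays are passed together as a list, for example `model.fit([images, aux], labels)`, Keras shuffles by sample index, so rows at the same index stay together without manual shuffling.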

If you want to predict based on those four features, I would suggest going with a CNN + RNN.

Feed the image to the CNN, take the logits, and flatten them:

logits=np.array(output).flatten()

Then build a sequence like

[[logits], [temperature], [humidity], [time_of_year]]

and feed it to the RNN, which will treat it as a sequence input.
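One practical detail the description above leaves open is that an RNN expects every timestep to have the same feature size, while the logits are a vector and the other three values are scalars. The sketch below shows one possible way to reconcile this, by zero-padding each scalar to the logits length. The padding scheme, the function name `build_sequence`, and the sample values are assumptions, not something this answer specifies.

```python
import numpy as np

def build_sequence(logits, temperature, humidity, time_of_year):
    """Stack logits and three scalars into a (4, d) sequence for an RNN.

    Each scalar is placed in position 0 of a zero vector of length d,
    where d is the number of logits (an assumed padding scheme).
    """
    logits = np.asarray(logits, dtype=np.float32).flatten()
    d = logits.shape[0]

    def pad(value):
        step = np.zeros(d, dtype=np.float32)
        step[0] = value
        return step

    return np.stack(
        [logits, pad(temperature), pad(humidity), pad(time_of_year)]
    )

# Hypothetical CNN output with 3 logits, plus already-normalized scalars
seq = build_sequence(np.array([[0.2, 1.5, -0.3]]), 0.1, -0.4, 0.75)
# seq has shape (timesteps=4, features=3); add a leading batch axis before
# passing it to an RNN layer that expects (batch, timesteps, features)
batch = seq[None, ...]
```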

Aaditya Ura