
I trained my network several times and already got some results. Then I found out about the Keras Tuner and wanted to use it to find the best hyperparameters, but the loss in the tuner always becomes nan (it doesn't become nan when I train the model regularly). I'm using MobileNetV3Small as the backbone and want to find the optimal number of layers and units. Here is my model-building function:

import keras
from keras import layers

# base is the MobileNetV3Small backbone defined earlier.
def build_model(hp):
    model = keras.Sequential()
    model.add(base)
    # Optionally add a global max pooling layer.
    if hp.Boolean('globalMax'):
        model.add(layers.GlobalMaxPool2D())
    model.add(layers.Flatten())
    # Tune the number of Dense layers.
    for i in range(hp.Int("num_layers", 1, 3)):
        model.add(
            layers.Dense(
                # Tune the number of units of each layer separately.
                units=hp.Int(f"units_{i}", min_value=3, max_value=12, step=1),
            )
        )
    if hp.Boolean("dropout"):
        model.add(layers.Dropout(rate=0.1))
    model.add(layers.Dense(3))
    model.compile(loss='mae', optimizer='sgd', metrics=['mae'])
    return model
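
For completeness, `base` above is the MobileNetV3Small backbone; it is created roughly like this (the input shape, weights, and trainability here are assumptions, not copied from my original code):

from keras.applications import MobileNetV3Small

# Assumed backbone setup; the exact arguments are not shown in the post.
base = MobileNetV3Small(input_shape=(224, 224, 3), include_top=False, weights='imagenet')
base.trainable = False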

and I'm using

import keras_tuner as kt

tuner = kt.RandomSearch(
    hypermodel=build_model,
    objective="val_loss",
    executions_per_trial=2,
    overwrite=True,
)
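
The search itself is launched roughly like this (the data variables and the epoch count are placeholders, not my exact values):

# Sketch of the search call; x_train, y_train, x_val, y_val and epochs are assumed.
tuner.search(
    x_train, y_train,
    validation_data=(x_val, y_val),
    epochs=10,
)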

and this is the output:

Best val_loss So Far: nan
Total elapsed time: 00h 02m 28s
INFO:tensorflow:Oracle triggered exit

What is the problem? I have already tried other optimizers (the model trains perfectly with .fit), tried removing dropout, and even normalization.

parsa
  • Have you tried to change all layers to have dtype float64? Try this if not: `tf.keras.backend.set_floatx('float64')` – saleh sargolzaee Apr 05 '22 at 07:59
  • @parsa what if you keep only the Flatten layer (or only the GlobalMaxPool2D)? – Marco Cerliani Apr 05 '22 at 13:12
  • @salehsargolzaee I'm pretty sure it has nothing to do with the float type, but just in case, I tried it and it didn't work. Thanks for your suggestion – parsa Apr 06 '22 at 13:33
  • @MarcoCerliani I used a Boolean for GlobalMaxPool2D, so it also tests the case with only the Flatten layer. – parsa Apr 06 '22 at 13:36
  • @parsa you should test with Flatten and GlobalMaxPool2D both active – Marco Cerliani Apr 06 '22 at 13:39
  • I tried with both active @MarcoCerliani as I mentioned before – parsa Apr 06 '22 at 22:19
  • Have you tried to supply an activation (e.g., relu, prelu, etc.) to the `Dense` layers being created in the layer loop? The linear activation may be causing some instability. Also, what is the architecture of the model when you just call the `fit` method (which you mentioned had stable results)? – danielcahall Apr 12 '22 at 01:54
  • @danielcahall the architecture is a GlobalMaxPool, Flatten, and 2 Dense layers. I also tried multiple activations earlier and it didn't work – parsa Apr 12 '22 at 09:03

1 Answer


So I finally found the problem. It happened because keras_tuner evaluates the validation loss on a small batch, and in my case that value becomes nan because the numbers are nearly infinite. After switching to a bigger batch size and changing the loss function, it stopped returning nan all the time and found some results.
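
A minimal sketch of what the fix looks like, assuming a Huber loss and a batch size of 64 (both are illustrative choices, not necessarily the exact ones I ended up with):

# In build_model: swap the MAE loss for a more robust one (illustrative).
model.compile(loss=keras.losses.Huber(), optimizer='sgd', metrics=['mae'])

# When running the search: pass a larger batch size so the validation
# loss is computed over more samples and stays finite.
tuner.search(
    x_train, y_train,
    validation_data=(x_val, y_val),
    batch_size=64,
    epochs=10,
)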

parsa