
My goal is to tune over possible network architectures that meet the following criteria:

  1. Layer 1 can have any number of hidden units from this list: [32, 64, 128, 256, 512]

Then, the number of hidden units explored for each remaining layer should always depend on the selection made for the layer directly above it, specifically:

  1. Layer 2 can have the same or half as many units as layer 1.
  2. Layer 3 can have the same or half as many units as layer 2.
  3. Layer 4 can have the same or half as many units as layer 3.
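
To make the target search space concrete, here is a quick plain-Python sketch (independent of Keras Tuner, and ignoring the minimum-of-8 floor my code below applies to layer 4) that enumerates every architecture these rules allow:

# Plain-Python enumeration of the intended search space (no tuner involved).
first_layer_options = [32, 64, 128, 256, 512]

architectures = []
for num_layers in range(1, 5):
    for l1 in first_layer_options:
        candidates = [[l1]]
        # Each subsequent layer keeps the same units or halves them.
        for _ in range(num_layers - 1):
            candidates = [arch + [units]
                          for arch in candidates
                          for units in (arch[-1], arch[-1] // 2)]
        architectures.extend(candidates)

print(len(architectures))  # 5 * (1 + 2 + 4 + 8) = 75 candidate architectures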

As currently implemented, the hp.Choice options for layers 2, 3 and 4 never update once they have been established for the first time.

For example, suppose that on the tuner's first pass num_layers = 4, which means all four layers will be created. If layer 1 then selects 256 hidden units, the options become:

Layer 2 --> [128, 256]

Layer 3 --> [64, 128]

Layer 4 --> [32, 64]

Layers 2, 3 and 4 stay stuck with these choices in every trial that follows, rather than adapting to new selections for layer 1.

As a result, when the number of hidden units in layer 1 changes in a later trial, the options for layers 2, 3 and 4 no longer satisfy the intended constraint that each subsequent layer contain the same or half as many hidden units as the previous layer.

import tensorflow as tf
from tensorflow.keras import layers


def build_and_tune_model(hp, train_ds, normalize_features, ohe_features, max_tokens, passthrough_features):

    # get_all_preprocessing_layers is a helper defined elsewhere in my code.
    all_inputs, encoded_features = get_all_preprocessing_layers(train_ds,
                                                                normalize_features=normalize_features,
                                                                ohe_features=ohe_features,
                                                                max_tokens=max_tokens,
                                                                passthrough=passthrough_features)

    
    
    # Possible values for the number of hidden units in layer 1.
    # Defining here because we will always have at least 1 layer.
    layer_1_hidden_units = hp.Choice('layer1_hidden_units', values=[32, 64, 128, 256, 512])

    # Possible number of layers to include
    num_layers = hp.Choice('num_layers', values=[1, 2, 3, 4])
    
    print("================= starting new round =====================")
    print(f"Layer 1 hidden units = {hp.get('layer1_hidden_units')}")
    print(f"Num layers is {hp.get('num_layers')}")
    
    
    all_features = layers.concatenate(encoded_features)
    
    x = layers.Dense(layer_1_hidden_units,
                     activation="relu")(all_features)

    
    if hp.get('num_layers') >= 2:
        
        with hp.conditional_scope("num_layers", [2, 3, 4]):
            
            # Layer 2 hidden units can either be half the layer 1 hidden units or the same.
            layer_2_hidden_units = hp.Choice('layer2_hidden_units', values=[(int(hp.get('layer1_hidden_units') / 2)),
                                                                            hp.get('layer1_hidden_units')])

            
            print("\n==========================================================")
            print(f"In layer 2")
            print(f"num_layers param = {hp.get('num_layers')}")
            print(f"layer_1_hidden_units = {hp.get('layer1_hidden_units')}")
            print(f"layer_2_hidden_units = {hp.get('layer2_hidden_units')}")
            print("==============================================================\n")

            x = layers.Dense(layer_2_hidden_units,
                             activation="relu")(x)

    if hp.get('num_layers') >= 3:
        
        with hp.conditional_scope("num_layers", [3, 4]):
        
            # Layer 3 hidden units can either be half the layer 2 hidden units or the same.
            layer_3_hidden_units = hp.Choice('layer3_hidden_units', values=[(int(hp.get('layer2_hidden_units') / 2)),
                                                                            hp.get('layer2_hidden_units')])


            print("\n==========================================================")
            print(f"In layer 3")
            print(f"num_layers param = {hp.get('num_layers')}")
            print(f"layer_1_hidden_units = {hp.get('layer1_hidden_units')}")
            print(f"layer_2_hidden_units = {hp.get('layer2_hidden_units')}")
            print(f"layer_3_hidden_units = {hp.get('layer3_hidden_units')}")
            print("==============================================================\n")

            x = layers.Dense(layer_3_hidden_units,
                             activation="relu")(x)

    if hp.get('num_layers') >= 4:
        
        with hp.conditional_scope("num_layers", [4]):
        
            # Layer 4 hidden units can either be half the layer 3 hidden units or the same.
            # Extra stipulation applied here: layer 4 hidden units can never be less than 8.
            layer_4_hidden_units = hp.Choice('layer4_hidden_units', values=[max(int(hp.get('layer3_hidden_units') / 2), 8),
                                                                            hp.get('layer3_hidden_units')])


            print("\n==========================================================")
            print(f"In layer 4")
            print(f"num_layers param = {hp.get('num_layers')}")
            print(f"layer_1_hidden_units = {hp.get('layer1_hidden_units')}")
            print(f"layer_2_hidden_units = {hp.get('layer2_hidden_units')}")
            print(f"layer_3_hidden_units = {hp.get('layer3_hidden_units')}")
            print(f"layer_4_hidden_units = {hp.get('layer4_hidden_units')}")
            print("==============================================================\n")

            x = layers.Dense(layer_4_hidden_units,
                             activation="relu")(x)

    
    output = layers.Dense(1, activation='sigmoid')(x)
    
    model = tf.keras.Model(all_inputs, output)
    
    model.compile(optimizer=tf.keras.optimizers.Adam(),
                  loss='binary_crossentropy',
                  metrics=['accuracy'])
    
    print(">>>>>>>>>>>>>>>>>>>>>>>>>>>> End of round <<<<<<<<<<<<<<<<<<<<<<<<<<<<<")
    
    return model

Does anyone know the correct way to tell Keras Tuner to explore all possible options for each layer's hidden units, where the search space satisfies the criteria above: the first layer can have a number of hidden units from the list [32, 64, 128, 256, 512], and each layer after the first can have the same or half as many hidden units as the previous layer?


1 Answer


Once a hyperparameter has been generated during a tuner run, it is not possible to update it or its associated choice options. Let's consider an example to illustrate this:

Trial 1:

  • first_layer_units: [32, 64, 128, 256, 512]

Suppose the value 256 is randomly selected as the unit count for first_layer_units. Then, based on this selection:

  • first_hidden_layer_units: [128, 256]

Suppose 128 is chosen for first_hidden_layer_units. Subsequently:

  • second_hidden_layer_units: [64, 128]

Suppose 64 is selected for second_hidden_layer_units. Finally:

  • third_hidden_layer_units: [32, 64]

Now, let's move on to Trial 2:

Trial 2:

  • first_layer_units: [32, 64, 128, 256, 512]

Suppose this time the value 64 is randomly selected as the unit count for first_layer_units. Ideally, we would expect the choices for the hidden layer hyperparameters to be updated accordingly:

  • first_hidden_layer_units: [32, 64]

However, Keras Tuner does not update the choices for the hidden layer hyperparameters based on the new value of first_layer_units; it retains the choices from Trial 1. Additionally, the hyperparameters second_hidden_layer_units and third_hidden_layer_units remain active, even though they were generated in Trial 1 and do not apply to Trial 2.
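
This behavior can be seen with a bare HyperParameters container, outside of any tuner (a minimal sketch; the hp.space introspection used here assumes a recent keras_tuner version):

import keras_tuner

hp = keras_tuner.HyperParameters()

# The first registration fixes the option set for this name.
hp.Choice('hidden_units', values=[128, 256])

# A later call with the same name returns the already-registered
# hyperparameter's value; the new `values` list is silently ignored.
units = hp.Choice('hidden_units', values=[32, 64])

print(hp.space)  # the 'hidden_units' entry still lists [128, 256]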

To address the first problem, we need to generate separate sets of hyperparameters for each scenario. This can be accomplished by dynamically generating the hyperparameter names based on the total_layer_count and previous_layer_index:

current_layer_index = previous_layer_index - 1
previous_units = hp.get(f'hidden_units_layer_{total_layer_count}_{previous_layer_index}')
hidden_units = hp.Choice(f'hidden_units_layer_{total_layer_count}_{current_layer_index}',
                         values=[int(previous_units / 2), previous_units])

By generating distinct hyperparameters for each unique scenario, we ensure that the hyperparameters are appropriately updated for each scenario.

To solve the second problem and deactivate hyperparameters created for other scenarios, we can establish a parent-child relationship using a conditional scope. This ensures that the child hyperparameter is activated only if the parent hyperparameter is active. By doing so, we can disable all the hyperparameters generated for other scenarios. The conditional scope can be implemented as follows:

with hp.conditional_scope(parent_hp_name, parent_hp_value):
    hidden_units = hp.Choice(child_hp_name, values=child_hp_value)

With this approach, the child hyperparameter will be active only when the parent hyperparameter satisfies the specified condition.
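
For instance, plugging in names from the question (a hypothetical concrete instance of the pattern above):

# 'layer2_hidden_units' is registered as a child of 'num_layers' and is
# therefore only active in trials where 'num_layers' is 2, 3 or 4.
with hp.conditional_scope('num_layers', [2, 3, 4]):
    layer_2_hidden_units = hp.Choice('layer2_hidden_units', values=[128, 256])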

In summary, the final code snippet to address both problems can be structured as follows:

# This snippet lives inside the model-building function, so hp, layers
# and encoded_features come from the enclosing scope.

# List possible units
possible_units = [32, 64, 128, 256, 512]

# Pair each unit count with the next larger one:
# possible_layer_units = [[32, 64], [64, 128], [128, 256], [256, 512]]
possible_layer_units = []
for index, item in enumerate(possible_units[:-1]):
    possible_layer_units.append([item, possible_units[index + 1]])

# Add first layer
all_features = layers.concatenate(encoded_features)
first_layer_units = hp.Choice('first_layer_units', values=possible_units)
x = layers.Dense(first_layer_units, activation="relu")(all_features)

# Get the number of hidden layers based on the first layer unit count
# (e.g. 256 -> index 3 -> three hidden layers after the first layer).
hidden_layer_count = possible_units.index(first_layer_units)

if 0 < hidden_layer_count:
    iter_count = 0
    # Walk down the unit brackets, adding one hidden layer per bracket.
    for hidden_layer_index in range(hidden_layer_count - 1, -1, -1):
        if iter_count == 0:
            # Collect HP 'units' details for the second layer. Its parent
            # is 'first_layer_units', and it should only be active for the
            # exact first-layer value that produces this many hidden layers.
            parent_hp_name = 'first_layer_units'
            parent_hp_value = [possible_units[hidden_layer_index + 1]]
            child_hp_name = 'units_layer_' + str(hidden_layer_count) + str(hidden_layer_index)
            child_hp_value = possible_layer_units[hidden_layer_index]
        else:
            # Collect HP 'units' details for the next layers. Each parent
            # is the HP created for the layer above it, and the child is
            # active for any of the parent's values.
            parent_hp_name = 'units_layer_' + str(hidden_layer_count) + str(hidden_layer_index + 1)
            parent_hp_value = possible_layer_units[hidden_layer_index + 1]
            child_hp_name = 'units_layer_' + str(hidden_layer_count) + str(hidden_layer_index)
            child_hp_value = possible_layer_units[hidden_layer_index]

        # Add and activate the child HP under the parent HP using a
        # conditional scope.
        with hp.conditional_scope(parent_hp_name, parent_hp_value):
            hidden_units = hp.Choice(child_hp_name, values=child_hp_value)

        # Add the remaining NN layers one by one.
        x = layers.Dense(hidden_units, activation="relu")(x)

        iter_count += 1
By dynamically generating the hyperparameters based on the previous layer's unit count and utilizing the conditional scope to control their activation, we can effectively address both problems.
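
For completeness, here is a rough sketch of how a build function containing this snippet might be handed to a tuner (build_model, train_ds and val_ds are placeholder names for your own function and datasets):

import keras_tuner

# build_model is assumed to wrap the snippet above and return the
# compiled tf.keras.Model.
tuner = keras_tuner.RandomSearch(
    hypermodel=build_model,
    objective='val_accuracy',
    max_trials=20,
    overwrite=True,
)

tuner.search(train_ds, validation_data=val_ds, epochs=5)
best_model = tuner.get_best_models(num_models=1)[0]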

Sachin Savale
  • Hey Sachin, thanks for taking the time to write this detailed response. Is there any chance you could please add some commenting to your code? I am trying to reproduce your instructions and am having a hard time following along. Also, should the range(hidden_layer_count-1, -1) loop be a count down loop to zero (i.e. range(hidden_layer_count-1, 0, -1) instead?). If you have a minimal working example that I can reproduce I would be happy to review it and then accept the answer. Thanks again! – Braden Anderson Jan 11 '22 at 05:29
  • Hi Braden, I have fixed the loop count-down issue and an issue that was triggered while getting the unit count value. I also added the layer-addition code for clarity; the code is now working properly (the layer-addition part is not tested). – Sachin Savale Jan 11 '22 at 21:08
  • The KerasTuner API here has a good example of using the conditional_scope context: https://keras.io/api/keras_tuner/hyperparameters/ – brethvoice May 31 '22 at 15:49