Triplet-Loss using pre-trained network

Question

I am trying to use the Triple-Loss technique to fine-tune an EfficientNet network for human Re-ID using Keras. Here is the code I am using:

This is the generator:

class SampleGen(object):
    def __init__(self, file_class_mapping):
        self.file_class_mapping = file_class_mapping
        self.class_to_list_files = defaultdict(list)
        self.list_all_files = list(file_class_mapping.keys())
        self.range_all_files = list(range(len(self.list_all_files)))

        for file, class_ in file_class_mapping.items():
            self.class_to_list_files[class_].append(file)

        self.list_classes = list(set(self.file_class_mapping.values()))
        self.range_list_classes = range(len(self.list_classes))
        self.class_weight = np.array([len(self.class_to_list_files[class_]) for class_ in self.list_classes])
        self.class_weight = self.class_weight / np.sum(self.class_weight)

    def get_sample(self):
        class_idx = np.random.choice(self.range_list_classes, 1, p=self.class_weight)[0]
        examples_class_idx = np.random.choice(range(len(self.class_to_list_files[self.list_classes[class_idx]])), 2)
        positive_example_1, positive_example_2 = \
            self.class_to_list_files[self.list_classes[class_idx]][examples_class_idx[0]], \
            self.class_to_list_files[self.list_classes[class_idx]][examples_class_idx[1]]

        negative_example = None
        while negative_example is None or self.file_class_mapping[negative_example] == \
                self.file_class_mapping[positive_example_1]:
            negative_example_idx = np.random.choice(self.range_all_files, 1)[0]
            negative_example = self.list_all_files[negative_example_idx]
        return positive_example_1, negative_example, positive_example_2


def read_and_resize(filepath):
    im = Image.open((filepath)).convert('RGB')
    im = im.resize((image_size, image_size))
    return np.array(im, dtype="float32")


def augment(im_array):
    if np.random.uniform(0, 1) > 0.9:
        im_array = np.fliplr(im_array)
    return im_array


def gen(triplet_gen):
    while True:
        list_positive_examples_1 = []
        list_negative_examples = []
        list_positive_examples_2 = []

        for i in range(batch_size):
            positive_example_1, negative_example, positive_example_2 = triplet_gen.get_sample()
            path_pos1 = join(path_train, positive_example_1)
            path_neg = join(path_train, negative_example)
            path_pos2 = join(path_train, positive_example_2)

            positive_example_1_img = read_and_resize(path_pos1)
            negative_example_img = read_and_resize(path_neg)
            positive_example_2_img = read_and_resize(path_pos2)

            positive_example_1_img = augment(positive_example_1_img)
            negative_example_img = augment(negative_example_img)
            positive_example_2_img = augment(positive_example_2_img)

            list_positive_examples_1.append(positive_example_1_img)
            list_negative_examples.append(negative_example_img)
            list_positive_examples_2.append(positive_example_2_img)

        A = preprocess_input(np.array(list_positive_examples_1))
        B = preprocess_input(np.array(list_positive_examples_2))
        C = preprocess_input(np.array(list_negative_examples))

        label = None

        yield {'anchor_input': A, 'positive_input': B, 'negative_input': C}, label

This is how I create the model:

def get_model():
    base_model = efn.EfficientNetB3(weights='imagenet', include_top=False)
    for layer in base_model.layers:
        layer.trainable = False

    x = base_model.output
    x = Dropout(0.6)(x)
    x = Dense(embedding_dim)(x)
    x = Lambda(lambda x: K.l2_normalize(x, axis=1), name="enc_out")(x)

    embedding_model = Model(base_model.input, x, name="embedding")

    input_shape = (image_size, image_size, 3)

    anchor_input = Input(input_shape, name='anchor_input')
    positive_input = Input(input_shape, name='positive_input')
    negative_input = Input(input_shape, name='negative_input')

    anchor_embedding = embedding_model(anchor_input)
    positive_embedding = embedding_model(positive_input)
    negative_embedding = embedding_model(negative_input)

    inputs = [anchor_input, positive_input, negative_input]
    outputs = [anchor_embedding, positive_embedding, negative_embedding]

    triplet_model = Model(inputs, outputs)
    triplet_model.add_loss(K.mean(triplet_loss(outputs)))

    return embedding_model, triplet_model

And this is how I'm trying to run the training:

if __name__ == '__main__':
    data = pd.read_csv(path_csv)
    train, test = train_test_split(data, train_size=0.7, random_state=1337)
    file_id_mapping_train = {k: v for k, v in zip(train.Image.values, train.Id.values)}
    file_id_mapping_test = {k: v for k, v in zip(test.Image.values, test.Id.values)}

    gen_tr = gen(SampleGen(file_id_mapping_train))
    gen_te = gen(SampleGen(file_id_mapping_test))

    embedding_model, triplet_model = get_model()

    for i, layer in enumerate(embedding_model.layers):
        print(i, layer.name, layer.trainable)

    for layer in embedding_model.layers[379:]:
        layer.trainable = True
    for layer in embedding_model.layers[:379]:
        layer.trainable = False

    triplet_model.compile(loss=None, optimizer=Adam(0.0001))
    history = triplet_model.fit(x=gen_tr,
                                validation_data=gen_te,
                                epochs=10,
                                verbose=1,
                                steps_per_epoch=200,
                                validation_steps=20,
                                callbacks=create_callbacks())

The csv contains two columns (Image, Id) and I am generating triplets on the go using a generator. The layer 379 is the last layer of the network so I just leave that as trainable. I let it run for some epochs and it seems like it doesn't converge, it stays around 2.30. On epochs like 20, the loss is even higher than what I've started with. Here you can see what I mean: train example Is there anything wrong with the way I think about the problem?

Thank you!

To debug this, we really need to make the model converge. One good technique is to first try to overfit a very small subset of the data. So for instance, keep 5 classes with 2 images per class and try to overfit that dataset. Make it as simple as possible to debug: remove dropout, put only the last dense layer as trainable, remove the L2 normalization. Remove things until you get convergence. Then you can add back elements and see what was the problem. Then you can scale back to the full dataset. — Olivier Moindrot, May 17 '20 at 21:08
@OlivierMoindrot Ok, I will try and do that and report back. How much should I set the batch_size in the case with 5 classes and 2 images/class? — Andrei-Eduard Lică, May 17 '20 at 22:19
https://pastebin.com/nBiYVBzw The changed part looks like this right now. The loss dropped, but it is jumping all around as you can see here: https://pastebin.com/s95gkW8P. What could cause this type of behavior? Something doesn't feel right. — Andrei-Eduard Lică, May 17 '20 at 22:41
Maybe you can visualize the embeddings of the 10 images and how they move around during training? — Olivier Moindrot, May 18 '20 at 07:16
Simplest way would be to have an embedding size of 2, then plot the embeddings at each iteration and observe what happens. — Olivier Moindrot, May 18 '20 at 10:10
That would imply using a callback? Cause I don't really know how could I check the embeddings at each training iteration — Andrei-Eduard Lică, May 18 '20 at 10:53
From what I've read, on TF 2.2.0 this options wouldn't work because the call back wouldn't take the results after each iteration. i've been struggling with it all day. — Andrei-Eduard Lică, May 18 '20 at 16:19
https://imgur.com/a/CcQ71iI -> If I run predict on those 10 images (5 classes - 2 sample/class) this is the output of my network with embedding size of 2. How could I plot this data efficiently so I can figure stuff out? — Andrei-Eduard Lică, May 18 '20 at 17:24
@OlivierMoindrot I've changed the base network to ResNet50 and here are my results on the full dataset: https://pastebin.com/KvW6vrci. At first, I left just the last last layer unfrozen and trained for 5 epochs. Apparently, the loss doesn't change at all here. Then, I unfroze all layers from layer 150 above, trained for 5 more epochs and then the loss started to decrease a lot. Maybe I could make it work with Efficient Net but I have to see what layers I need to unfreeze in order to get good results. Could you explain to me why I see this type of behavior? — Andrei-Eduard Lică, May 18 '20 at 23:41

Triplet-Loss using pre-trained network

0 Answers0