Multi-Label Image Classification

Question

I tried to solve this myself but could not get it working, so I am posting here. Please guide me.

I am working on multi-label image classification, but my scenario is slightly different from the usual one. I am confused about how to map the labels and their attributes to each image Id so that they can be used for training and testing.

Here is the code I am working on:

import os
import numpy as np
import pandas as pd
from keras.utils import to_categorical
from collections import Counter
from keras.callbacks import Callback
from keras.preprocessing.image import load_img
from keras.preprocessing.image import img_to_array
from sklearn.model_selection import train_test_split

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.layers import Conv2D, MaxPooling2D
from matplotlib import pyplot as plt  # the plotting code below refers to plt
from tensorflow.keras import backend

def create_tag_mapping(mapping_csv):
    labels = set()
    for i in range(len(mapping_csv)):
        tags = mapping_csv['Labels'][i].split(' ')
        labels.update(tags)
    labels = list(labels)
    labels.sort()
    labels_map = {labels[i]:i for i in range(len(labels))}
    inv_labels_map = {i:labels[i] for i in range(len(labels))}
    return labels_map, inv_labels_map

# create a mapping of filename to tags
def create_file_mapping(mapping_csv):
    mapping = dict()
    for i in range(len(mapping_csv)):
        name, tags = mapping_csv['Id'][i], mapping_csv['Labels'][i]
        mapping[name] = tags.split(' ')
    return mapping

# create a one hot encoding for one list of tags
def one_hot_encode(tags, mapping):
    # create empty vector
    encoding = np.zeros(len(mapping), dtype='uint8')
    # mark 1 for each tag in the vector
    for tag in tags:
        encoding[mapping[tag]] = 1
    return encoding

def load_dataset(path, file_mapping, tag_mapping):
    photos, targets = list(), list()
    # enumerate files in the directory
    for filename in os.listdir(path):
        # load image
        photo = load_img(path + filename, target_size=(760,415))
        # convert to numpy array
        photo = img_to_array(photo, dtype='uint8')
        # get tags
        tags = file_mapping[filename[:-4]]
        # one hot encode tags
        target = one_hot_encode(tags, tag_mapping)
        # store
        photos.append(photo)
        targets.append(target)
    X = np.asarray(photos, dtype='uint8')
    y = np.asarray(targets, dtype='uint8')
    return X, y

trainingLabels = 'labels.csv'
# load the mapping file
mapping_csv = pd.read_csv(trainingLabels)


# create a mapping of tags to integers
tag_mapping, _ = create_tag_mapping(mapping_csv)

# create a mapping of filenames to tag lists
file_mapping = create_file_mapping(mapping_csv)


# load the png images
folder = 'dataset/'

X, y = load_dataset(folder, file_mapping, tag_mapping)
print(X.shape, y.shape)

trainX, testX, trainY, testY = train_test_split(X, y, test_size=0.3, random_state=1)
print(trainX.shape, trainY.shape, testX.shape, testY.shape)

img_x,img_y=760,415
trainX=trainX.reshape(trainX.shape[0], img_x,img_y,3)
testX=testX.reshape(testX.shape[0], img_x,img_y,3)

trainX=trainX.astype('float32')
testX=testX.astype('float32')

trainX /= 255
testX /=255

trainY=to_categorical(trainY,3)
testY=to_categorical(testY,3)
print(trainX.shape)
print(trainY.shape)

model = Sequential()
model.add(Conv2D(32, (5, 5), strides=(1,1), activation='relu', input_shape=(img_x, img_y,3)))
model.add(MaxPooling2D((2, 2), strides=(2,2)))
model.add(Flatten())
model.add(Dense(128, activation='relu', kernel_initializer='he_uniform'))
model.add(Dense(3, activation='sigmoid'))

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
history=model.fit(trainX, trainY, batch_size=2, epochs=5, verbose=1)
plt.plot(history.history['acc'])
plt.plot(history.history['loss'])
plt.title('Accuracy and loss')
plt.xlabel('epoch')
plt.ylabel('accuracy/loss')
plt.legend(['Accuracy','loss'],loc='upper left')
plt.show()

score=model.evaluate(testX,testY,verbose=0)
print('test loss',score[0])
print('test accuracy',score[1])

I have attached an image file that gives a clear picture of my problem.

The tutorials I have followed assign multiple labels to each image, but in my case each label also has attributes.

Jindřich · Answer 1 · 2019-11-13T14:48:05.230

1

If your goal is to predict whether each of 'L', 'M' and 'H' is present, you are using an incorrect loss function. You should use binary_crossentropy. The shape of your targets will be batch × 3 in this case.

categorical_crossentropy assumes the output is a categorical distribution: a vector of values that sum up to one. In other words, you have multiple possibilities, but only one of them can be the correct one.
binary_crossentropy assumes that every number in the output vector is a (conditionally) independent binary distribution, so each number is between 0 and 1, but they do not necessarily sum up to one, because it can very well happen that all of them are true.
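
To make the difference concrete, here is a minimal sketch in plain Python (not Keras itself) of what the two losses compute for a single example, following their standard definitions. The function names and the sample numbers are made up for illustration.

```python
import math

def categorical_crossentropy(target, probs):
    # target: one-hot vector; probs: a distribution that sums to 1.
    return -sum(t * math.log(p) for t, p in zip(target, probs))

def binary_crossentropy(target, probs):
    # target: multi-hot vector; each prob is an independent value in (0, 1).
    # The per-output losses are averaged here.
    losses = [-(t * math.log(p) + (1 - t) * math.log(1 - p))
              for t, p in zip(target, probs)]
    return sum(losses) / len(losses)

# Single-label case: exactly one class is correct.
one_hot = [0, 1, 0]
softmax_out = [0.1, 0.8, 0.1]
cce = categorical_crossentropy(one_hot, softmax_out)

# Multi-label case: several labels can be true at once,
# so neither the target nor the prediction sums to 1.
multi_hot = [1, 1, 0]
sigmoid_out = [0.9, 0.8, 0.2]
bce = binary_crossentropy(multi_hot, sigmoid_out)

print(round(cce, 4), round(bce, 4))
```

The categorical loss only looks at the probability assigned to the single correct class, while the binary loss scores every output separately.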

If your goal is to predict the value of each of label1, ..., label6, then you should model a categorical distribution for each label. You have six labels, each with 3 possible values, so you need 18 numbers (logits). The shape of your targets will be batch × 6 × 3 in this case.

model.add(Dense(18, activation=None))  # no activation: these are raw logits

Because you don't want a single distribution over 18 values, but over 6 × 3 values, you need to reshape the logits first:

from tensorflow.keras.layers import Reshape, Softmax
model.add(Reshape((6, 3)))
model.add(Softmax())
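
A rough NumPy illustration of why the reshape matters (this is not Keras code; the random logits and the helper name `softmax` are assumptions for the example). Applying softmax along the last axis of a (6, 3) array yields six separate distributions, one per label, rather than one distribution over 18 values.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    shifted = x - np.max(x, axis=axis, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=axis, keepdims=True)

rng = np.random.default_rng(0)
logits = rng.normal(size=(2, 18))              # batch of 2, 18 raw outputs each

flat = softmax(logits)                         # one distribution over 18 values
per_label = softmax(logits.reshape(2, 6, 3))   # six distributions over 3 values

print(flat.sum(axis=-1))       # one sum per example
print(per_label.sum(axis=-1))  # one sum per example and label
```

Without the reshape, the 18 outputs compete with each other; with it, only the 3 values belonging to the same label do.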

edited Nov 13 '19 at 14:48

answered Nov 12 '19 at 10:52


@Jindřich, thank you for the answer. How can we perform the mapping? Before training we need to map each Id to its labels so we can feed them in for training, as you mentioned with a categorical distribution for each of the labels. – Magi Nov 12 '19 at 11:15
I would do it in numpy as you load the data. For each row, you can either do a vector of shape (6,) and add indices like L=0, M=1, H=2 and then use `sparse_categorical_crossentropy`; or do np.zeros((6,3)) and then manually assign ones to respective places and use `categorical_crossentropy`. – Jindřich Nov 12 '19 at 11:54
Thank you. Can you please share a link about that? I didn't get your point 'you can either do a vector of shape (6,) and add indices like L=0, M=1, H=2'. I am a beginner in ML, DL, and Python, so I need more explanation. I would be very thankful. – Magi Nov 12 '19 at 12:33
I meant: your six labels, e.g., H L M H H M => [2, 0, 1, 2, 2, 1] in the sparse case, but [[0,0,1], [1,0,0], [0,1,0], [0,0,1], [0,0,1], [0,1,0]] in the non-sparse case. – Jindřich Nov 13 '19 at 11:53
OK, got it. Please also let me know where I should place it: before the final layer? "Because you don't want a single distribution over 6 values, but over 6 × 3 values, you need to reshape the logits first: model.add(Reshape((6, 3)) model.add(Softmax())" – Magi Nov 13 '19 at 14:41
Oh sorry, I meant 18 values. Put the three layers from my snippet instead of your original sigmoid layer. – Jindřich Nov 13 '19 at 14:49
Jindřich, please guide me here. I have encoded my labels this way: H L M H H M => [[0,0,1], [1,0,0], [0,1,0], [0,0,1], [0,0,1], [0,1,0]] and used your code snippet above. In this case, will our loss function be 'categorical_crossentropy'? Am I right? If not, please guide me. – Magi Dec 10 '19 at 04:33
Sorry, I mean 'binary_crossentropy' – Magi Dec 10 '19 at 04:43
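
The two target encodings described in the comments can be sketched in NumPy as follows. The index assignment L=0, M=1, H=2 comes from the comment above; the helper names and the sample row are made up for illustration. As stated in that comment, index targets go with `sparse_categorical_crossentropy` and one-hot targets with `categorical_crossentropy`.

```python
import numpy as np

TAG_TO_INDEX = {'L': 0, 'M': 1, 'H': 2}

def encode_sparse(tags):
    # One integer class index per label -> shape (num_labels,)
    return np.array([TAG_TO_INDEX[t] for t in tags], dtype='int64')

def encode_one_hot(tags):
    # One row per label with a single 1 -> shape (num_labels, 3)
    encoding = np.zeros((len(tags), len(TAG_TO_INDEX)), dtype='uint8')
    encoding[np.arange(len(tags)), encode_sparse(tags)] = 1
    return encoding

tags = ['H', 'L', 'M', 'H', 'H', 'M']   # hypothetical values for six labels
print(encode_sparse(tags))
print(encode_one_hot(tags))
```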

score 0 · Accepted Answer · answered Nov 21 '19 at 04:27

Based on the above discussion, here is the solution to the problem. As I mentioned, we have a total of 5 labels, and each label takes one of three tags (L, M, H). We can perform the encoding this way:

# create a one-hot encoding for one list of tags (one tag per label)
def custom_encode(tags, mapping):
    # mapping is not used by this encoding
    encoding = []
    for tag in tags:
        if tag == 'L':
            encoding.append([1,0,0])
        elif tag == 'M':
            encoding.append([0,1,0])
        else:
            # any other tag is treated as 'H'
            encoding.append([0,0,1])
    return encoding

So the encoded y-vector will look like this:

Labels       Tags             Encoded Tags
Label1 ----> [L,L,L,M,H] ---> [ [1,0,0], [1,0,0], [1,0,0], [0,1,0], [0,0,1] ]
Label2 ----> [L,H,L,M,H] ---> [ [1,0,0], [0,0,1], [1,0,0], [0,1,0], [0,0,1] ]
Label3 ----> [L,M,L,M,H] ---> [ [1,0,0], [0,1,0], [1,0,0], [0,1,0], [0,0,1] ]
Label4 ----> [M,M,L,M,H] ---> [ [0,1,0], [0,1,0], [1,0,0], [0,1,0], [0,0,1] ]
Label5 ----> [M,L,L,M,H] ---> [ [0,1,0], [1,0,0], [1,0,0], [0,1,0], [0,0,1] ]
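
As a hedged sketch of the remaining mapping step, the per-image encodings can be stacked into a single target array keyed by image Id. The `rows` dictionary below is hypothetical data standing in for whatever the labels CSV contains, and `encode_tags` is an illustrative variant of `custom_encode` that raises on an unknown tag instead of treating it as H.

```python
import numpy as np

ONE_HOT = {'L': [1, 0, 0], 'M': [0, 1, 0], 'H': [0, 0, 1]}

def encode_tags(tags):
    # Unknown tags raise KeyError rather than silently becoming 'H'.
    return [ONE_HOT[tag] for tag in tags]

# Hypothetical Id -> tags mapping (one tag for each of the 5 labels)
rows = {
    'img_001': ['L', 'L', 'L', 'M', 'H'],
    'img_002': ['L', 'H', 'L', 'M', 'H'],
    'img_003': ['M', 'M', 'L', 'M', 'H'],
}

ids = sorted(rows)
# Stack the per-image encodings into one array of shape (num_images, 5, 3)
y = np.asarray([encode_tags(rows[i]) for i in ids], dtype='uint8')
print(y.shape)
```

The order of `ids` has to match the order in which the images are stacked into X, otherwise images and targets will be misaligned.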

The final layers will look like this:

from tensorflow.keras.layers import Activation, Reshape
model.add(Dense(15))  # 5 labels x 3 tags each = 15 output neurons
model.add(Reshape((5,3)))  # one row of 3 tag scores per label
model.add(Activation('softmax'))  # softmax over the last axis: one distribution per label
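
At prediction time the mapping has to be inverted. A minimal sketch, assuming the model outputs an array of shape (batch, 5, 3) as produced by the layers above and that the column order is L, M, H as in `custom_encode`. The helper name and the probabilities are invented for illustration.

```python
import numpy as np

INDEX_TO_TAG = ['L', 'M', 'H']

def decode_predictions(probs):
    # Pick the most probable tag for every label of every image.
    indices = np.argmax(probs, axis=-1)
    return [[INDEX_TO_TAG[i] for i in row] for row in indices]

# Made-up model output for a batch of 1 image: 5 labels x 3 tag probabilities
probs = np.array([[
    [0.7, 0.2, 0.1],
    [0.1, 0.1, 0.8],
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.2, 0.7],
]])
decoded = decode_predictions(probs)
print(decoded)
```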
