I am working on my Master's thesis in Multimodal Emotion Recognition; more precisely, I want to apply knowledge distillation from a multimodal teacher model to a unimodal student model. I am using the Keras Distiller() class (https://keras.io/examples/vision/knowledge_distillation/) and following the instructions at the end of that page. However, my teacher model takes as input the tensor resulting from the concatenation of text features (text_shape = (1038, 33, 600)) and audio features (audio_shape = (1038, 33, 600)), giving final_shape = (1038, 33, 1200), while my student model takes a tensor of shape (1038, 33, 100). My label shape is label_shape = (1038, 33, 3).
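For a reproducible setup, here are random placeholder arrays with the same shapes as my real features (the real data comes from my own preprocessing, so these are just stand-ins):

import numpy as np

# Random placeholders with the same shapes as my real features
x_text = np.random.rand(1038, 33, 600).astype("float32")
x_audio = np.random.rand(1038, 33, 600).astype("float32")

# Teacher input: text and audio concatenated on the feature axis -> (1038, 33, 1200)
x_teacher = np.concatenate([x_text, x_audio], axis=-1)

# Student input and labels
x_student = np.random.rand(1038, 33, 100).astype("float32")  # (1038, 33, 100)
y_train = np.random.rand(1038, 33, 3).astype("float32")      # (1038, 33, 3)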
As you can see, my two models take different inputs, while the Distiller() class is written to feed the same input to both models; that is what I am trying to change.
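For reference, the relevant part of the original train_step from the linked example looks roughly like this (paraphrased from memory, so it may differ slightly from the page):

def train_step(self, data):
    # Unpack data
    x, y = data

    # The same x is fed to both models
    teacher_predictions = self.teacher(x, training=False)
    with tf.GradientTape() as tape:
        student_predictions = self.student(x, training=True)
        # ... losses, gradient update and metrics as on the linked page ...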
The first thing I tried was to change the beginning of train_step in the Keras class from this:
def train_step(self, data):
    # Unpack data
    x, y = data
to this:
def train_step(self, data):
    # Unpack data
    x, y = data
    x_teacher = x[0]
    x_student = x[1]
and then, in the rest of the method, I replaced the variable x with x_teacher in the parts that use the teacher model and with x_student in the parts that use the student model, as in the sketch below.
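Here is a sketch of what my modified train_step looks like after those substitutions (only the forward passes change; losses, gradients and metrics stay as on the linked page, which I am again paraphrasing from memory):

def train_step(self, data):
    # Unpack data: x is expected to hold [x_teacher, x_student]
    x, y = data
    x_teacher = x[0]
    x_student = x[1]

    # Forward pass of the teacher on the multimodal input
    teacher_predictions = self.teacher(x_teacher, training=False)

    with tf.GradientTape() as tape:
        # Forward pass of the student on the unimodal input
        student_predictions = self.student(x_student, training=True)

        # Losses combined as in the example
        student_loss = self.student_loss_fn(y, student_predictions)
        distillation_loss = self.distillation_loss_fn(
            tf.nn.softmax(teacher_predictions / self.temperature, axis=1),
            tf.nn.softmax(student_predictions / self.temperature, axis=1),
        )
        loss = self.alpha * student_loss + (1 - self.alpha) * distillation_loss

    # Gradients and weight update for the student only
    trainable_vars = self.student.trainable_variables
    gradients = tape.gradient(loss, trainable_vars)
    self.optimizer.apply_gradients(zip(gradients, trainable_vars))
    # ... metric updates and returned results as in the example ...

Then I ran the following code to train my student model: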
text_distilled = Distiller(student=text_student, teacher=bimodal_teacher)
text_distilled.compile(
    optimizer=keras.optimizers.Adam(),
    metrics=[keras.metrics.SparseCategoricalAccuracy()],
    student_loss_fn=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    distillation_loss_fn=keras.losses.KLDivergence(),
    alpha=0.1,
    temperature=10,
)

both_inputs = [x_teacher, x_student]
text_distilled.fit(both_inputs, y_train, epochs=3)
and I got this error:
ValueError: Can not squeeze dim[2], expected a dimension of 1, got 3 for '{{node Squeeze}} = Squeeze[T=DT_FLOAT, squeeze_dims=[-1]](IteratorGetNext:2)' with input shapes: [?,33,3].
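Reading the error, I wonder whether this first failure is about my labels rather than about the two inputs: SparseCategoricalAccuracy and SparseCategoricalCrossentropy expect integer class labels (shape (1038, 33) or (1038, 33, 1)), while my y_train has shape (1038, 33, 3), which looks one-hot encoded. If that reading is right, this minimal snippet (my assumption, not verified against my real pipeline) should hit the same squeeze:

import numpy as np
import tensorflow as tf

m = tf.keras.metrics.SparseCategoricalAccuracy()
y_true = np.zeros((4, 33, 3), dtype="float32")  # same trailing shape as my labels
y_pred = np.zeros((4, 33, 3), dtype="float32")
# Should raise: Can not squeeze dim[2], expected a dimension of 1, got 3
m.update_state(y_true, y_pred)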
After that, I turned to ChatGPT, which told me to change the beginning of train_step to this:
def train_step(self, data):
    # Unpack data
    x_teacher, x_student, y = data
Then, I called the fit() function:
text_distilled = Distiller(student=text_student, teacher=bimodal_teacher)
text_distilled.compile(
    optimizer=keras.optimizers.Adam(),
    metrics=[keras.metrics.SparseCategoricalAccuracy()],
    student_loss_fn=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    distillation_loss_fn=keras.losses.KLDivergence(),
    alpha=0.1,
    temperature=10,
)

text_distilled.fit((x_teacher, x_student, y_train), epochs=3)
But I got this error:
ValueError: not enough values to unpack (expected 3, got 1)
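My current guess (unverified) is that fit() treats the whole tuple as x and, since no y is given, hands train_step a 1-tuple, so the three-way unpack fails. If that is right, I would need to keep the standard (x, y) structure and nest the two inputs inside x, something like:

def train_step(self, data):
    # Keep the standard (x, y) structure; both inputs travel inside x
    x, y = data
    x_teacher, x_student = x
    # ... rest of the method as in my sketch above ...

# called with x as a nested structure:
text_distilled.fit(x=(x_teacher, x_student), y=y_train, epochs=3)

But I am not sure this is the right approach.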
I don't know what else to try or how to solve these errors in either situation. Any help?