I am trying to train a model based on an external cost function. The scheme is as follows:
- First, model_1 receives spectrograms and produces an output vector of 4 values from them.
- model_2 (which is already trained; I do not want its weights to be updated) then receives these 4 values and, together with a time signal, produces an output of the same shape as the input initially received by model_1 (the original spectrograms).
- Once this is done, the loss is calculated by comparing the final output of step 2 with the input of model_1, i.e. the two spectrograms (see the sketch after this list).
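To make the data flow concrete, here is a minimal sketch of the pipeline; model_1, model_2, the time signal, and the call conventions are hypothetical placeholders, not the project's actual code:

    import tensorflow as tf

    def forward_pipeline(model_1, model_2, spectrograms, time_signal):
        # model_1: spectrograms -> 4 predicted values (the trainable model)
        params = model_1(spectrograms)                   # shape (batch, 4)
        # model_2: 4 values + time signal -> reconstructed spectrograms (frozen)
        reconstruction = model_2([params, time_signal])  # same shape as spectrograms
        return reconstruction

    # The loss compares the reconstruction with model_1's own input.
    loss_fn = tf.keras.losses.MeanSquaredError()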
Once we have the loss and the predictions made by model_1, I would like to obtain its gradients to update the weights correctly. For this I used tf.GradientTape(), first making predictions with model_1 and then calculating the loss between the initial spectrograms and those finally produced.
I understand that the problem may be that, since the final spectrograms are not produced by model_1 alone, the process cannot be carried out this way. So I wanted to know whether it is possible to obtain the gradients from just some predictions and a scalar representing the value of the loss function (see the toy example below).
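As a quick check of that intuition, here is a self-contained toy example (a made-up Dense model, not the project's code) showing that a loss value computed outside the tape has no recorded path to the variables and yields None gradients:

    import tensorflow as tf

    model = tf.keras.Sequential([tf.keras.layers.Dense(4)])
    x = tf.random.normal((8, 16))

    # Loss computed OUTSIDE the tape: the tape never records the ops
    # connecting it to the model's variables.
    loss_outside = tf.reduce_mean(model(x) ** 2)

    with tf.GradientTape() as tape:
        _ = model(x)  # recorded, but unused by the loss below

    grads = tape.gradient(loss_outside, model.trainable_variables)
    print(grads)  # [None, None] -- no recorded path from variables to loss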
Here is the function in which I try to get the gradients; however, gradients is a list of None:
    def train_modelo_inverso(self, epochs=40):
        for s in range(epochs):
            # 1 - We predict twice; this first pass obtains the predicted spectrograms.
            espectrogramas_predicts = self.get_espectrograms_from_amplitudes()
            loss_modelo_inverso = self.loss_fn(self.amplitudes, espectrogramas_predicts)
            # The second pass runs inside the tape so that the graph needed for the
            # partial derivatives of the loss with respect to the parameters is recorded.
            with tf.GradientTape() as tape:
                modelo_inverso_preds = self.modelo_inverso(self.amplitudes)  # 2 - Used to update the parameters.
                # espectrogramas_predicts = self.get_espectrograms_from_amplitudes()
                loss_modelo_inverso = self.loss_fn(self.amplitudes, espectrogramas_predicts)
            # We obtain the gradients based on the predictions.
            gradients = tape.gradient(loss_modelo_inverso, self.modelo_inverso.trainable_variables)
            print(f"gradients {gradients} | len(self.modelo_inverso.trainable_variables) {len(self.modelo_inverso.trainable_variables)} | loss_modelo_inverso {loss_modelo_inverso}")
The printed output is:
gradients [None, None, None, None, None, None] | len(self.modelo_inverso.trainable_variables) 6 | loss_modelo_inverso (9.79201461332436e-12-1.57796630234136e-10j)
Which makes sense, because the loss is not computed from the prediction tensor of model_1; in other words, we are not differentiating the loss with respect to anything recorded on the tape.
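If I understand the mechanics correctly, the whole forward pass (including the frozen model_2) would have to run inside the tape for a gradient path to exist, along the lines of this sketch; self.modelo_directo and self.optimizer are hypothetical names, and reducing the complex loss to a real scalar is my assumption:

    with tf.GradientTape() as tape:
        # model_1's forward pass, recorded on the tape (trainable).
        preds = self.modelo_inverso(self.amplitudes)
        # Frozen model_2 (hypothetical name); its ops are recorded so gradients
        # can flow through it, but its weights are simply never updated.
        espectrogramas = self.modelo_directo(preds)
        loss = self.loss_fn(self.amplitudes, espectrogramas)
        # The printed loss above is complex; an optimizer needs a real scalar,
        # so reduce it first (my assumption about what makes sense here).
        loss = tf.reduce_mean(tf.abs(loss))
    gradients = tape.gradient(loss, self.modelo_inverso.trainable_variables)
    self.optimizer.apply_gradients(zip(gradients, self.modelo_inverso.trainable_variables))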
I also leave a link to the Drive directory in case someone wants to reproduce the results: link to project on google drive