
I am trying to train a model based on an external cost function; the scheme is as follows:

[diagram of the scheme described below]

  1. First, model_1 receives spectrograms and, based on them, produces an output vector of 4 values.
  2. Model_2 (which is already trained and whose weights I do not want to be updated) then receives these 4 values and makes predictions from them, which, combined with a time signal, produce an output of the same shape as the input originally received by model_1 (the original spectrograms).
  3. Once this is done, the loss is calculated by comparing the final output of step 2 with the input of model_1, i.e. the two sets of spectrograms (a minimal sketch of this wiring follows below).
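Here is a minimal sketch of that wiring, with toy stand-ins for both models (the 64×64 spectrogram shape and the layer sizes are illustrative assumptions, not taken from the real project):

    import tensorflow as tf

    # Toy stand-in for model_1: spectrogram -> 4-value parameter vector.
    model_1 = tf.keras.Sequential([
        tf.keras.Input(shape=(64, 64)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(4),
    ])

    # Toy stand-in for model_2: 4 values -> spectrogram-shaped output.
    model_2 = tf.keras.Sequential([
        tf.keras.Input(shape=(4,)),
        tf.keras.layers.Dense(64 * 64),
        tf.keras.layers.Reshape((64, 64)),
    ])
    model_2.trainable = False  # already trained, weights must stay frozen

    spectrograms  = tf.random.normal((8, 64, 64))
    params        = model_1(spectrograms)   # step 1: 4 values per example
    reconstructed = model_2(params)         # step 2: back to the input shape
    loss = tf.reduce_mean(tf.square(reconstructed - spectrograms))  # step 3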

Once we have the loss and the predictions made by model_1, I would like to obtain its gradients to update the weights correctly. For this I used tf.GradientTape(): first making predictions with model_1, then calculating the loss between the initial spectrograms and those finally produced.

I understand the problem may be that, since the final spectrograms were not produced by model_1 alone, the process cannot be carried out this way. So I wanted to know whether it is possible to obtain the gradients from just some predictions and a scalar representing the resulting value of the loss function.
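For reference, tf.GradientTape only differentiates through operations recorded while the tape is active; a loss that arrives as a bare scalar carries no gradient information. A tiny demonstration of the difference (v is a toy variable):

    import tensorflow as tf

    v = tf.Variable(3.0)

    # Loss computed outside the tape: no recorded path from loss to v.
    loss_outside = v * v
    with tf.GradientTape() as tape:
        pass
    print(tape.gradient(loss_outside, v))  # None

    # Loss computed inside the tape: the dependency is recorded.
    with tf.GradientTape() as tape:
        loss_inside = v * v
    print(tape.gradient(loss_inside, v))   # tf.Tensor(6.0, ...)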

Here is the function in which I try to get the gradients; however, gradients is a list of None values:

  def train_modelo_inverso(self, epochs=40):
    for s in range(epochs):
      espectrogramas_predicts = self.get_espectrograms_from_amplitudes() # 1 - First prediction pass, to obtain the predicted spectrograms.
      loss_modelo_inverso     = self.loss_fn(self.amplitudes, espectrogramas_predicts)

      # Predict under the tape so we can take the partial derivatives of the parameters with respect to the loss produced by predictions made with those parameters.
      with tf.GradientTape() as tape:
        modelo_inverso_preds = self.modelo_inverso(self.amplitudes)                    # 2 - This is meant to update the parameters based on the predictions.
        # espectrogramas_predicts = self.get_espectrograms_from_amplitudes()
        loss_modelo_inverso  = self.loss_fn(self.amplitudes, espectrogramas_predicts)

      gradients = tape.gradient(loss_modelo_inverso, self.modelo_inverso.trainable_variables)
      print(f"gradients {gradients} | len(self.modelo_inverso.trainable_variables) {len(self.modelo_inverso.trainable_variables)} | loss_modelo_inverso {loss_modelo_inverso}") # Get the gradients based on the predictions.

The printed output is:

gradients [None, None, None, None, None, None] | len(self.modelo_inverso.trainable_variables) 6 | loss_modelo_inverso (9.79201461332436e-12-1.57796630234136e-10j)

This makes sense, because the prediction tensor of model_1 is never used to compute the loss; in other words, the tape has no recorded path from the loss back to the model's variables to differentiate through.
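For reference, a hypothetical rewrite of the training step, under the assumption that get_espectrograms_from_amplitudes can be changed to take the predictions as input and to use only TensorFlow ops internally (model_2 plus the time-signal synthesis); self.optimizer is also an assumed attribute. Only then does the tape record a path from the loss back to modelo_inverso's weights. Note also that the loss printed above is complex-valued, so the sketch takes the squared magnitude to keep the loss a real scalar:

    import tensorflow as tf

    def train_step(self):
        with tf.GradientTape() as tape:
            # Step 1: modelo_inverso produces the 4 parameters.
            preds = self.modelo_inverso(self.amplitudes)
            # Step 2: frozen model_2 + synthesis, expressed in TF ops only,
            # so the tape records every intermediate operation.
            espectrogramas_predicts = self.get_espectrograms_from_amplitudes(preds)
            # Step 3: compare against the original input; tf.abs keeps the
            # loss real even if the reconstruction is complex.
            loss = tf.reduce_mean(
                tf.abs(self.amplitudes - espectrogramas_predicts) ** 2)
        gradients = tape.gradient(loss, self.modelo_inverso.trainable_variables)
        self.optimizer.apply_gradients(
            zip(gradients, self.modelo_inverso.trainable_variables))
        return loss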

I also leave a link to the drive directory in case someone wants to reproduce the results: link to project on google drive

  • You need to specify all the blocks and functions here; gradients cannot be propagated through non-TensorFlow code. – Dr. Snoopy Jul 03 '22 at 15:30
  • Hi Dr. Snoopy, thank you for your reply. I know that gradients do not propagate through anything other than tensors; that is why I was asking whether there is a way to update the model weights based only on the predictions and the value of the error produced. – CYD Jul 03 '22 at 16:05
  • I have finally understood that this is impossible as stated; it only becomes possible if all operations are performed on tensors, so that the gradients can be obtained through the model in order to update it. Otherwise it is impossible to obtain the model's gradients: although the model influences the loss function, it is not known how the other operations influence it, and therefore the gradients cannot be determined. – CYD Jul 03 '22 at 19:06
  • This is why I was asking about the intermediate operations: if all of them are implemented in TensorFlow, it would be possible to pass gradients through them. – Dr. Snoopy Jul 03 '22 at 19:22
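A toy check of that last point: when every operation in the chain is a TensorFlow op, gradients flow back to model_1 even through a frozen model_2 (the models and shapes below are placeholders):

    import tensorflow as tf

    model_1 = tf.keras.Sequential([tf.keras.Input(shape=(3,)),
                                   tf.keras.layers.Dense(4)])
    model_2 = tf.keras.Sequential([tf.keras.Input(shape=(4,)),
                                   tf.keras.layers.Dense(3)])
    model_2.trainable = False  # frozen: its weights receive no updates

    x = tf.random.normal((2, 3))
    with tf.GradientTape() as tape:
        y = model_2(model_1(x))          # whole chain recorded on the tape
        loss = tf.reduce_mean(tf.square(y - x))

    grads = tape.gradient(loss, model_1.trainable_variables)
    print([g is not None for g in grads])  # [True, True]: gradients flow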

0 Answers