I have reproduced this error with TensorFlow 2.12 and 2.13 and wandb 0.15.4 and 0.15.5. When both of the following conditions hold,

  1. wandb is imported before tensorflow, and
  2. a function decorated with @tf.function calls another function that is also decorated with @tf.function,

the following error occurs when running the code from a Python script:
[libprotobuf FATAL google/protobuf/message_lite.cc:353] CHECK failed: target + size == res: 
libc++abi: terminating with uncaught exception of type google::protobuf::FatalException: CHECK failed: target + size == res: 
zsh: abort      python mfe.py

When running from a Jupyter notebook, the kernel simply crashes without an error message.
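
Because the notebook kernel dies silently, one way to still see the abort message is to run the failing code in a separate interpreter and capture its output. The following is a minimal sketch using only the standard library; the helper name `run_snippet` and the default timeout are illustrative assumptions, not part of wandb or TensorFlow.

```python
import subprocess
import sys


def run_snippet(code: str, timeout: float = 300.0) -> subprocess.CompletedProcess:
    """Run `code` in a fresh Python interpreter and capture stdout/stderr.

    A crash in the child process (such as the libprotobuf abort) does not
    take down the calling process or notebook kernel.
    """
    return subprocess.run(
        [sys.executable, "-c", code],  # same interpreter/environment as the caller
        capture_output=True,
        text=True,
        timeout=timeout,
    )
```

Passing the contents of the script to `run_snippet` and then inspecting `returncode` and `stderr` on the result should surface the same libprotobuf message that the terminal shows. On POSIX systems a process killed by a signal is reported with a negative return code.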

Removing the inner @tf.function or changing the order of imports resolves this error.

Here is an MFE (minimal failing example):

import wandb # switch order of imports to toggle error
import tensorflow as tf

@tf.function # comment this out to toggle error
def custom_score(data):
    pass

data = tf.random.uniform((100, 20, 24, 2), 0, 1)
train = tf.data.Dataset.from_tensor_slices(data).batch(50)

@tf.function
def train_step(data):
    custom_score(data)
        
tf.config.run_functions_eagerly(False)
tf.print("Start training")
train_step(next(iter(train)))
tf.print("Runs finished without error.")

Could someone please explain what is going on here? The error also occurs when calling a @tf.function-decorated function from a Keras model's train_step() method, I assume because train_step() is itself compiled by AutoGraph. Running custom_score() eagerly is impractical for me, as it massively slows down training.
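
Since the abort is raised inside libprotobuf, the installed protobuf version may be relevant; this is an assumption, not a confirmed cause. The sketch below reports installed package versions through `importlib.metadata` without importing any of the packages, so import order plays no role. The helper name `installed_versions` and the default package list (including `tensorflow-macos`, in case TensorFlow was installed under that name) are illustrative assumptions.

```python
from importlib import metadata

# Distribution names to look up; adjust to match the environment.
DEFAULT_PACKAGES = ("tensorflow", "tensorflow-macos", "wandb", "protobuf")


def installed_versions(packages=DEFAULT_PACKAGES):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions
```

Printing the result of `installed_versions()` gives a compact summary that can be attached to a bug report.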