I have reproduced this error with TensorFlow 2.12 and 2.13 and with wandb 0.15.4 and 0.15.5. When both of the following conditions hold,
- wandb is imported before tensorflow, and
- a function decorated with @tf.function calls another function that is also decorated with @tf.function,
the following error occurs when running the code from a Python script:
[libprotobuf FATAL google/protobuf/message_lite.cc:353] CHECK failed: target + size == res:
libc++abi: terminating with uncaught exception of type google::protobuf::FatalException: CHECK failed: target + size == res:
zsh: abort python mfe.py
When the same code is run from a Jupyter notebook, the kernel simply crashes without an error message.
Removing the inner @tf.function or changing the order of imports resolves this error.
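Since the crash depends on import order, it can help to confirm which order a given script or notebook session actually ended up with, for example when an earlier cell or a helper module already imported wandb. The following is only a rough sketch: `import_position` and `imported_before` are hypothetical helper names, and it assumes that the key order of `sys.modules` reflects the order in which top-level imports began, which holds for a plain insertion-ordered dict but is not something TensorFlow or wandb guarantee.

```python
import sys


def import_position(name, modules=None):
    """Return the index at which `name` was registered in `modules`, or None.

    `modules` defaults to sys.modules; a plain dict can be passed for testing.
    """
    modules = sys.modules if modules is None else modules
    # Copy the keys so that a concurrent import cannot change the dict mid-iteration.
    for index, key in enumerate(list(modules)):
        if key == name:
            return index
    return None


def imported_before(first, second, modules=None):
    """Return True only if both modules are loaded and `first` was registered earlier."""
    a = import_position(first, modules)
    b = import_position(second, modules)
    return a is not None and b is not None and a < b


# Assumption: wandb-before-tensorflow is the problematic order described in this post.
if imported_before("wandb", "tensorflow"):
    print("wandb was imported before tensorflow in this process", file=sys.stderr)
```

Running this after the imports (or in a later notebook cell) only reports the order; it does not change it or prevent the crash.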
Here is an MFE:
import wandb  # switch order of imports to toggle error
import tensorflow as tf


@tf.function  # comment this out to toggle error
def custom_score(data):
    pass


data = tf.random.uniform((100, 20, 24, 2), 0, 1)
train = tf.data.Dataset.from_tensor_slices(data).batch(50)


@tf.function
def train_step(data):
    custom_score(data)


tf.config.run_functions_eagerly(False)
tf.print("Start training")
train_step(next(iter(train)))
tf.print("Runs finished without error.")
Could someone please enlighten me as to what is going on here? This error also occurs when calling a @tf.function-decorated function from a Keras model's train_step() method, I assume because train_step() is also compiled by AutoGraph. Running custom_score() eagerly is impractical for me, as it massively slows down training.
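For anyone trying to reproduce this from a notebook, where the kernel dies silently, one option is to run the snippet in a child interpreter and capture its stderr so that the protobuf message is not lost. This is a minimal sketch using only the standard library; `run_isolated` is a hypothetical helper name, and the code string passed to it is a stand-in, not the MFE itself.

```python
import subprocess
import sys


def run_isolated(code, timeout=300):
    """Run `code` in a fresh Python interpreter and return (returncode, stderr)."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,  # collect stdout and stderr instead of inheriting them
        text=True,            # decode the captured output as str
        timeout=timeout,
    )
    return result.returncode, result.stderr


if __name__ == "__main__":
    # Stand-in child code; replace it with the contents of the failing script.
    rc, err = run_isolated("import sys; sys.stderr.write('child failed'); sys.exit(3)")
    print("return code:", rc)
    print("stderr:", err)
```

On POSIX systems a negative return code means the child was terminated by a signal, so an abort like the one shown above would be expected to surface as a negative value together with the libprotobuf text in stderr, while the notebook kernel itself keeps running.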