i'm working with some tensorflow code and trying to load a trained checkpoint, but it's failing with a protobuf error like this:
[libprotobuf ERROR google/protobuf/wire_format_lite.cc:577] String field 'tensorflow.TensorShapeProto.Dim.name' contains invalid UTF-8 data when parsing a protocol buffer. Use the 'bytes' type if you intend to send raw bytes.
Traceback (most recent call last):
[...]
File "/home/sopi/miniconda3/envs/magenta2/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 3053, in _as_graph_def
graph.ParseFromString(compat.as_bytes(data))
google.protobuf.message.DecodeError: Error parsing message
in order to debug the training code that apparently is producing invalid utf-8, i'd like to know what the invalid data in question actually looks like. stepping through the code in pdb doesn't get me very far since ParseFromString() is implemented in C++.
how can i find out what the invalid utf-8 data is? or even the position in the byte array at which the error occurred?
(in this case, graph is a tensorflow.core.framework.graph_pb2.GraphDef, which is a subclass of google.protobuf.message.Message. but my question concerns protobuf parsing in general and i don't think there's anything special about GraphDef in this respect)
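to make concrete what kind of output i'm after, here is a rough pure-python sketch that walks the raw protobuf wire format without any schema and reports the byte offset of every length-delimited payload that neither parses as a nested message nor decodes as utf-8. the names (WireFormatError, parse_fields, find_invalid_utf8) are made up for this sketch and are not part of protobuf or tensorflow. it's only a heuristic: without the .proto it can't tell string fields from bytes fields, so legitimate binary payloads will show up as false positives, and groups (wire types 3 and 4) are not handled.

```python
class WireFormatError(ValueError):
    """Raised when a byte range does not parse as protobuf wire format."""


def read_varint(buf, pos, end):
    """Decode a base-128 varint at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        if pos >= end:
            raise WireFormatError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7
        if shift >= 70:
            raise WireFormatError("varint too long")


def parse_fields(buf, start, end):
    """Return (field_number, payload_start, payload_end) for every
    length-delimited field in buf[start:end], skipping scalar fields."""
    fields = []
    pos = start
    while pos < end:
        key, pos = read_varint(buf, pos, end)
        number, wire_type = key >> 3, key & 7
        if number == 0:
            raise WireFormatError("field number 0 is not allowed")
        if wire_type == 0:          # varint
            _, pos = read_varint(buf, pos, end)
        elif wire_type == 1:        # fixed 64-bit
            pos += 8
        elif wire_type == 5:        # fixed 32-bit
            pos += 4
        elif wire_type == 2:        # length-delimited
            length, pos = read_varint(buf, pos, end)
            fields.append((number, pos, pos + length))
            pos += length
        else:                       # groups: not handled in this sketch
            raise WireFormatError("unsupported wire type %d" % wire_type)
        if pos > end:
            raise WireFormatError("field runs past end of message")
    return fields


def find_invalid_utf8(buf, start=0, end=None, path=()):
    """Return a list of (field_number_path, offset_of_first_bad_byte, payload)."""
    if end is None:
        end = len(buf)
    hits = []
    for number, lo, hi in parse_fields(buf, start, end):
        field_path = path + (number,)
        try:
            # Guess: if the payload parses as a message, treat it as one.
            hits.extend(find_invalid_utf8(buf, lo, hi, field_path))
        except WireFormatError:
            # Otherwise treat it as a leaf and test it as text.
            payload = bytes(buf[lo:hi])
            try:
                payload.decode("utf-8")
            except UnicodeDecodeError as exc:
                hits.append((field_path, lo + exc.start, payload))
    return hits
```

the idea would be to call find_invalid_utf8() on the same bytes that get passed to ParseFromString() and then look up the reported field-number path in the .proto files by hand, e.g. a path of (1, 2) would mean field 2 inside field 1 of the top-level message.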