
I'm working with a Google API to process uploaded documents. What I'm trying to achieve is saving the protobuf in the response as a .proto file so I can work with it later.

I can call response._pb.SerializeToString(), but I couldn't figure out how to work with the result later. I tried writing it to a .proto file like this:

with open("doc.proto", "wb") as f:
    f.write(response._pb.SerializeToString())

But the file does not seem to be a proper .proto file, and I couldn't run it through the protoc compiler with: protoc -I=. --python_out=. ./doc.proto

I get a bunch of errors like:

doc.proto:7398:6: Invalid control characters encountered in text.
doc.proto:7398:9: Interpreting non ascii codepoint 225.
doc.proto:7398:12: Invalid control characters encountered in text.
doc.proto:7398:15: Need space between number and identifier.
doc.proto:7398:16: Invalid control characters encountered in text.

To summarize, I'm just trying to serialize/deserialize the protobuf API response.

Holt Skinner

1 Answer


I encourage you to read the Protocol Buffers documentation and its Python tutorial to better understand how everything works.

Protobuf (Protocol Buffers) message types are defined in a text format, usually stored in .proto files. These are schema-like definitions of the messages to be exchanged, not the message data itself.

These proto files are compiled by a tool called protoc into one or more programming language source files. These sources provide a mechanism for your code to create protocol buffer messages that agree with the schema. These messages are sent in a binary format that's efficient to transmit across networks but very difficult for humans to read. SerializeToString() produces exactly this binary format, which is why protoc rejects your doc.proto: the file contains message data, not a schema.
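To make the binary format concrete, here is a minimal sketch that hand-encodes a tiny message following the protobuf wire format. The `Doc` schema and the helper names are hypothetical, made up purely for illustration; real code would rely on protoc-generated classes rather than encoding bytes by hand.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_string_field(field_number: int, text: str) -> bytes:
    """Encode one length-delimited (wire type 2) string field."""
    payload = text.encode("utf-8")
    tag = (field_number << 3) | 2
    return encode_varint(tag) + encode_varint(len(payload)) + payload


def encode_int_field(field_number: int, value: int) -> bytes:
    """Encode one varint (wire type 0) integer field."""
    tag = (field_number << 3) | 0
    return encode_varint(tag) + encode_varint(value)


# Hypothetical schema (this text is what belongs in a .proto file):
#   message Doc { string text = 1; int32 pages = 2; }
# The bytes below are what a serialized Doc message looks like instead.
wire_bytes = encode_string_field(1, "hi") + encode_int_field(2, 300)
print(wire_bytes)
```

Note that the output carries only field numbers and values. The field names `text` and `pages` never appear in it, which is why the schema is needed to interpret the bytes.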

In your case (using Python), you'll want to "unmarshal" the response messages that you receive into objects of the Python classes that protoc generated for you. If you created the proto files, then you have the sources. If another developer created them, they should be able to provide you with either the Python sources or the proto files (from which you can generate the Python sources).

Then, once the messages are unmarshalled into Python objects, you can store them however you prefer, for example by converting them to JSON and writing them to a text file.
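As a rough sketch of that round trip, the following uses `google.protobuf.struct_pb2.Struct`, a message class bundled with the `protobuf` package, as a stand-in for the real response type, since the actual class depends on which API you call. The file name `doc.bin` and the sample field values are assumptions for illustration.

```python
import os
import tempfile

from google.protobuf import json_format, struct_pb2

# Stand-in for the API response; in real code this would be the
# protoc-generated message class for your API's response.
original = struct_pb2.Struct()
original.update({"text": "hello", "pages": 3})

# Write the binary encoding to a data file (not a .proto file).
path = os.path.join(tempfile.mkdtemp(), "doc.bin")
with open(path, "wb") as f:
    f.write(original.SerializeToString())

# Later: read the bytes back and parse them with the same message class.
restored = struct_pb2.Struct()
with open(path, "rb") as f:
    restored.ParseFromString(f.read())

# Alternatively, convert to JSON text and back.
as_json = json_format.MessageToJson(restored)
from_json = json_format.Parse(as_json, struct_pb2.Struct())
print(as_json)
```

The key point is that the same message class is used on both sides: once to serialize and once to parse. If the response comes from a Google client library built on proto-plus (which the `_pb` attribute in the question suggests), the message class should also offer `to_json` and `from_json` class methods, e.g. `type(response).to_json(response)`, though that is worth checking against the library version you have installed.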

In summary, you're receiving binary-encoded protobuf messages. You need to decode these using protoc-generated sources before you can do much that's useful with the data.

NOTE: It's challenging to go from a binary-encoded message directly to something useful without knowing the proto (schema) or without having protoc-generated sources (built from the proto) to do the work for you.

DazWilkin