
I am trying to ingest an Avro file from GCS into Pub/Sub and have some basic questions.

  1. What are the options for sending a file as a message in Pub/Sub? Can we send the whole file as one message, or do we have to iterate over the file's contents and send them piece by piece? If we send the whole file, how can it be reconstructed on the consumer side? Example code would be helpful.

  2. When do we have to serialize and deserialize the messages? What is the purpose of this serialization?

I did do my research but still have these questions. I would appreciate help understanding this better. The examples I see send the file contents in iterations, not the whole file as a blob.

mehere

1 Answer

  1. You can send the whole file as a single message by reading its contents into a byte array (or a string encoded to bytes; most languages provide encoding utilities). This is inefficient for big files, and a single Pub/Sub message is limited to 10 MB. In that case, split the contents into chunks (e.g., 512 KB each) and send each chunk as its own message.

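Here is an example adapted from the GCP publisher sample (the project and topic IDs are placeholders):

```python
from google.cloud import pubsub_v1

# Placeholders: replace with your own project and topic.
project_id = "your-project-id"
topic_id = "your-topic-id"

publisher = pubsub_v1.PublisherClient()
# The `topic_path` method creates a fully qualified identifier
# in the form `projects/{project_id}/topics/{topic_id}`
topic_path = publisher.topic_path(project_id, topic_id)

for n in range(1, 10):
    data_str = f"Message number {n}"
    # Data must be a bytestring
    data = data_str.encode("utf-8")
    # When you publish a message, the client returns a future.
    future = publisher.publish(topic_path, data)
    print(future.result())

print(f"Published messages to {topic_path}.")
```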
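For the chunked approach, a minimal sketch might look like the following. It only builds the `(payload, attributes)` pairs and does not talk to Pub/Sub. The helper `split_into_messages` and the attribute names `file_id`, `chunk_index` and `total_chunks` are hypothetical, chosen for illustration, and the sample bytes stand in for the real Avro file contents.

```python
from typing import Dict, Iterator, Tuple

CHUNK_SIZE = 512 * 1024  # 512 KB, the example size mentioned in the answer


def split_into_messages(
    file_id: str, data: bytes, chunk_size: int = CHUNK_SIZE
) -> Iterator[Tuple[bytes, Dict[str, str]]]:
    """Yield (payload, attributes) pairs, one per chunk of `data`."""
    total = max(1, -(-len(data) // chunk_size))  # ceiling division
    for index in range(total):
        payload = data[index * chunk_size:(index + 1) * chunk_size]
        # Attribute values are kept as strings, as Pub/Sub requires.
        attributes = {
            "file_id": file_id,
            "chunk_index": str(index),
            "total_chunks": str(total),
        }
        yield payload, attributes


# Stand-in for the raw file bytes, e.g. open(path, "rb").read().
file_bytes = bytes(range(256)) * 5000  # 1,280,000 bytes
messages = list(split_into_messages("example.avro", file_bytes))
print(len(messages))
```

Each pair could then be published with `publisher.publish(topic_path, payload, **attributes)`. Since Pub/Sub does not guarantee delivery order by default, carrying the chunk index in the attributes lets the consumer put the pieces back in order.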

> How can the file be reconstructed on the consumer side?

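Decode on the consumer side the same way you encoded on the producer side. The message data arrives as the same bytes you published, so you write them back out (or, for a chunked file, concatenate the chunks in order) to get the file contents. This is not really a pain point.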
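As a rough sketch of the consumer side for a file sent in chunks, assuming each message carries hypothetical `chunk_index` and `total_chunks` string attributes (these names are illustrative, not part of Pub/Sub), the pieces can be collected and joined once all have arrived. The class `FileAssembler` and the sample messages are made up for the example.

```python
from typing import Dict, Optional


class FileAssembler:
    """Collects chunks for one file and returns the bytes once complete."""

    def __init__(self) -> None:
        self._chunks: Dict[int, bytes] = {}

    def add(self, payload: bytes, attributes: Dict[str, str]) -> Optional[bytes]:
        index = int(attributes["chunk_index"])
        total = int(attributes["total_chunks"])
        # A redelivered chunk simply overwrites its earlier copy.
        self._chunks[index] = payload
        if len(self._chunks) < total:
            return None  # still waiting for more chunks
        return b"".join(self._chunks[i] for i in range(total))


# Simulated messages arriving out of order.
received = [
    (b"world", {"chunk_index": "1", "total_chunks": "2"}),
    (b"hello ", {"chunk_index": "0", "total_chunks": "2"}),
]
assembler = FileAssembler()
result = None
for payload, attributes in received:
    result = assembler.add(payload, attributes)
print(result)
```

In a real subscriber callback, `message.data` and `message.attributes` would supply the payload and attributes. For a file sent as a single message, `message.data` already is the whole file content.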

  2. Serialization converts in-memory data into a sequence of bytes that can be transmitted over a network or stored, and a compact format also makes the transmission efficient. Data is serialized just before it is sent and deserialized after it is received.
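To illustrate the round trip, here is a small sketch that uses JSON from the standard library as a stand-in for Avro (an assumption made only to keep the example dependency-free; the record itself is made up):

```python
import json

record = {"id": 42, "name": "example", "tags": ["a", "b"]}

# Serialize: in-memory object -> bytes (the form a message body takes).
payload = json.dumps(record).encode("utf-8")

# Deserialize: bytes -> in-memory object, done by the consumer.
restored = json.loads(payload.decode("utf-8"))

print(type(payload).__name__, restored == record)
```

A file that is already in a serialized format, such as an Avro file, needs no extra serialization step to be sent: its raw bytes can be used as the message data directly.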
Jishan Shaikh
  • In the example, `n` is the message, right? I have gone through this example. – mehere Dec 09 '22 at 21:58
  • How do we convert the whole file to a byte array? By "whole file", do you mean the whole file contents? – mehere Dec 09 '22 at 21:58
  • `n` is not the message; the message is the text string, "Message number 1", for example. The loop publishes 9 such messages (`range(1, 10)` covers 1 through 9). Yes, "whole file to byte array" meant the whole file contents. That's why we talked about splitting the file if it is big. – Jishan Shaikh Dec 10 '22 at 05:09
  • Okay. I was thinking "whole file" meant the physical file object itself is converted. So "whole file" means the whole file content, which is split if needed. Got it. Thank you. – mehere Dec 11 '22 at 14:48