
The Node.js example code for batching Pub/Sub requests looks like this:

// Imports the Google Cloud client library
const PubSub = require(`@google-cloud/pubsub`);

// Creates a client
const pubsub = new PubSub();

/**
 * TODO(developer): Uncomment the following lines to run the sample.
 */
// const topicName = 'your-topic';
// const data = JSON.stringify({ foo: 'bar' });
// const maxMessages = 10;
// const maxWaitTime = 10000;

// Publishes the message as a string, e.g. "Hello, world!" or JSON.stringify(someObject)
const dataBuffer = Buffer.from(data);

pubsub
  .topic(topicName)
  .publisher({
    batching: {
      maxMessages: maxMessages,
      maxMilliseconds: maxWaitTime,
    },
  })
  .publish(dataBuffer)
  .then(results => {
    const messageId = results[0];
    console.log(`Message ${messageId} published.`);
  })
  .catch(err => {
    console.error('ERROR:', err);
  });

It is not clear to me how to publish multiple messages simultaneously using this example. Could someone explain how to adjust the code to do that?

Lin Du

2 Answers

17

If you wanted to batch messages, then you'd need to keep hold of the publisher and call publish on it multiple times. For example, you could change the code to something like this:

// Imports the Google Cloud client library
const PubSub = require(`@google-cloud/pubsub`);

// Creates a client
const pubsub = new PubSub();


const topicName = 'my-topic';
const maxMessages = 10;
const maxWaitTime = 10000;
const data1 = JSON.stringify({ foo: 'bar1' });
const data2 = JSON.stringify({ foo: 'bar2' });
const data3 = JSON.stringify({ foo: 'bar3' });

const publisher = pubsub.topic(topicName).publisher({
  batching: {
    maxMessages: maxMessages,
    maxMilliseconds: maxWaitTime,
  },
});

function handleResult(p) {
  p.then(results => {
    console.log(`Message ${results} published.`);
  })
  .catch(err => {
    console.error('ERROR:', err);
  });
}

// Publish three messages
handleResult(publisher.publish(Buffer.from(data1)));
handleResult(publisher.publish(Buffer.from(data2)));
handleResult(publisher.publish(Buffer.from(data3)));

Batching of messages is handled by the maxMessages and maxMilliseconds properties. The former indicates the maximum number of messages to include in a batch. The latter indicates the maximum number of milliseconds to wait before publishing a batch. These properties trade off larger batches (which can be more efficient) against publish latency. If you are publishing many messages rapidly, then the maxMilliseconds property won't have much effect; as soon as ten messages (the maxMessages value in this example) are ready to go, the client library will make a publish request to the Cloud Pub/Sub service. However, if publishing is sporadic or slow, then a batch might be sent before it contains ten messages, once maxMilliseconds has elapsed.

In the example code above, we call publish on three messages. This is not enough to fill up a batch and send it. Therefore, 10,000 milliseconds after the first call to publish, the three messages will be sent as a batch to Cloud Pub/Sub.
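To make the mechanics concrete, here is a minimal sketch of a batching publisher. It is a toy stand-in, not the client library's actual implementation: the class name ToyBatchingPublisher, the fakeSend transport, and the shortened 50 ms wait are all assumptions made for illustration. It shows that publish returns a promise right away, and that the promise only resolves once the batch it belongs to has been sent.

```javascript
// Toy stand-in for a batching publisher. This is NOT how @google-cloud/pubsub
// is implemented; it only illustrates the two flush conditions.
class ToyBatchingPublisher {
  constructor({ maxMessages, maxMilliseconds }, sendBatch) {
    this.maxMessages = maxMessages;
    this.maxMilliseconds = maxMilliseconds;
    this.sendBatch = sendBatch; // hypothetical transport: (payloads) => Promise<ids[]>
    this.pending = []; // queued entries: { data, resolve, reject }
    this.timer = null;
  }

  // Returns a promise immediately; it settles only after the batch is sent.
  publish(data) {
    return new Promise((resolve, reject) => {
      this.pending.push({ data, resolve, reject });
      if (this.pending.length >= this.maxMessages) {
        this.flush(); // the batch is full: send it now
      } else if (this.timer === null) {
        // The first message of a new batch starts the clock.
        this.timer = setTimeout(() => this.flush(), this.maxMilliseconds);
      }
    });
  }

  flush() {
    if (this.timer !== null) {
      clearTimeout(this.timer);
      this.timer = null;
    }
    const batch = this.pending;
    this.pending = [];
    if (batch.length === 0) return;
    this.sendBatch(batch.map(m => m.data)).then(
      ids => batch.forEach((m, i) => m.resolve(ids[i])),
      err => batch.forEach(m => m.reject(err))
    );
  }
}

// Demo: three messages, batch size 10, 50 ms wait (shortened from 10,000 ms).
const sentBatches = []; // records the size of every batch that was sent
let nextId = 1;
const fakeSend = async payloads => {
  sentBatches.push(payloads.length);
  return payloads.map(() => String(nextId++)); // made-up message IDs
};

const publisher = new ToyBatchingPublisher(
  { maxMessages: 10, maxMilliseconds: 50 },
  fakeSend
);

const demo = Promise.all(
  ['bar1', 'bar2', 'bar3'].map(d => publisher.publish(Buffer.from(d)))
).then(ids => {
  console.log(`Messages ${ids} published; batch sizes:`, sentBatches);
  return ids;
});
```

Because three messages never fill a batch of ten, the timer started by the first publish call is what triggers the single send, which mirrors the behaviour described above.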

Kamal Aboul-Hosn
  • Thanks for your explanation. Now I get it. – Erik van den Hoorn Mar 06 '18 at 10:33
  • When does the publish method return? Immediately or when the batch is sent? – pomo Apr 24 '19 at 16:51
  • publish will return immediately. The future returned by the publish method is only fulfilled once the batch is sent and acknowledged by Cloud Pub/Sub. – Kamal Aboul-Hosn Apr 24 '19 at 21:10
  • Then to wait for the result, do we need to keep track of futures? – nsandersen Aug 09 '19 at 15:18
  • You will need to keep track of every future returned by the publish call, which is not going to be limited by maxMessages. The maxMessages indicates when the batch will get sent to the server. There can be multiple batches outstanding to the server awaiting a response. – Kamal Aboul-Hosn Aug 09 '19 at 16:51
  • I am curious how Pub/Sub knows how many messages I published in a tick. I think there must be a time window for counting the messages, maybe in a tick or something like `setTimeout(() => getCount(messagesToBePublished),100)`, `if(getCount(messagesToBePublished) > maxMessages) callRemotePubsubService()` – Lin Du May 05 '20 at 09:42
  • It looks like you [asked this as a separate question](https://stackoverflow.com/questions/61610441/how-does-pubsub-know-how-many-messages-i-published-at-a-point-in-time) and so I have answered there. – Kamal Aboul-Hosn May 05 '20 at 10:35
4

Batching explanation:

  1. If the number of messages waiting to be published reaches maxMessages, the maxMilliseconds option is ignored and a batch of maxMessages messages is published immediately.

  2. If the number of messages waiting to be published does not reach maxMessages, the messages are sent as one batch after the maxMilliseconds wait has elapsed.
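The two rules can be sketched as a small timeline calculation. This is only an illustration under stated assumptions: planBatches is a hypothetical helper (it does not exist in the client library), it expects the publish times in ascending order, and it assumes the wait starts when the first message of a batch is queued.

```javascript
// Hypothetical helper: given the times (in ms) at which publish() is called,
// work out when each batch would be sent under the two rules above.
// Assumes publishTimes is sorted in ascending order.
function planBatches(publishTimes, { maxMessages, maxMilliseconds }) {
  const batches = [];
  let current = null; // the open batch: { openedAt, count }

  for (const t of publishTimes) {
    // Rule 2: an open batch whose wait expired before this message
    // arrived has already been sent.
    if (current && t >= current.openedAt + maxMilliseconds) {
      batches.push({
        sentAt: current.openedAt + maxMilliseconds,
        count: current.count,
      });
      current = null;
    }
    if (!current) current = { openedAt: t, count: 0 };
    current.count += 1;

    // Rule 1: a full batch is sent immediately.
    if (current.count === maxMessages) {
      batches.push({ sentAt: t, count: current.count });
      current = null;
    }
  }

  // Whatever is left is sent once its wait expires (rule 2).
  if (current) {
    batches.push({
      sentAt: current.openedAt + maxMilliseconds,
      count: current.count,
    });
  }
  return batches;
}

const settings = { maxMessages: 10, maxMilliseconds: 10000 };

// 12 messages published at once: one full batch now, the rest after the wait.
const twelve = planBatches(Array(12).fill(0), settings);
console.log(twelve);

// 5 messages published at once: a single batch after the wait.
const five = planBatches(Array(5).fill(0), settings);
console.log(five);
```

With 12 messages the plan is a batch of 10 at time 0 and a batch of 2 at 10,000 ms; with 5 messages it is a single batch at 10,000 ms.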

Example for case 1:

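// Assumes @google-cloud/pubsub is installed and PUBSUB_PROJECT_ID is defined elsewhere.
import { PubSub } from '@google-cloud/pubsub';

async function publishMessage(topicName) {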
  console.log(`[${new Date().toISOString()}] publishing messages`);
  const pubsub = new PubSub({ projectId: PUBSUB_PROJECT_ID });
  const topic = pubsub.topic(topicName, {
    batching: {
      maxMessages: 10,
      maxMilliseconds: 10 * 1000,
    },
  });

  const n = 12;
  const dataBufs: Buffer[] = [];
  for (let i = 0; i < n; i++) {
    const data = `message payload ${i}`;
    const dataBuffer = Buffer.from(data);
    dataBufs.push(dataBuffer);
  }

  const results = await Promise.all(
    dataBufs.map((dataBuf, idx) =>
      topic.publish(dataBuf).then((messageId) => {
        console.log(`[${new Date().toISOString()}] Message ${messageId} published. index: ${idx}`);
        return messageId;
      })
    )
  );
  console.log('results:', results.toString());
}

Now, we will publish 12 messages. The execution result:

[2020-05-05T09:09:41.847Z] publishing messages
[2020-05-05T09:09:41.955Z] Message 36832 published. index: 0
[2020-05-05T09:09:41.955Z] Message 36833 published. index: 1
[2020-05-05T09:09:41.955Z] Message 36834 published. index: 2
[2020-05-05T09:09:41.955Z] Message 36835 published. index: 3
[2020-05-05T09:09:41.955Z] Message 36836 published. index: 4
[2020-05-05T09:09:41.955Z] Message 36837 published. index: 5
[2020-05-05T09:09:41.955Z] Message 36838 published. index: 6
[2020-05-05T09:09:41.955Z] Message 36839 published. index: 7
[2020-05-05T09:09:41.955Z] Message 36840 published. index: 8
[2020-05-05T09:09:41.955Z] Message 36841 published. index: 9
[2020-05-05T09:09:51.939Z] Message 36842 published. index: 10
[2020-05-05T09:09:51.939Z] Message 36843 published. index: 11
results: 36832,36833,36834,36835,36836,36837,36838,36839,36840,36841,36842,36843

Note the timestamps. The first 10 messages are published immediately because they reach the number specified by maxMessages. The remaining 2 messages do not reach maxMessages, so the Pub/Sub client waits 10 seconds (maxMilliseconds) and then sends them.

Example for case 2:

async function publishMessage(topicName) {
  console.log(`[${new Date().toISOString()}] publishing messages`);
  const pubsub = new PubSub({ projectId: PUBSUB_PROJECT_ID });
  const topic = pubsub.topic(topicName, {
    batching: {
      maxMessages: 10,
      maxMilliseconds: 10 * 1000,
    },
  });

  const n = 5;
  const dataBufs: Buffer[] = [];
  for (let i = 0; i < n; i++) {
    const data = `message payload ${i}`;
    const dataBuffer = Buffer.from(data);
    dataBufs.push(dataBuffer);
  }

  const results = await Promise.all(
    dataBufs.map((dataBuf, idx) =>
      topic.publish(dataBuf).then((messageId) => {
        console.log(`[${new Date().toISOString()}] Message ${messageId} published. index: ${idx}`);
        return messageId;
      })
    )
  );
  console.log('results:', results.toString());
}

Now we send 5 messages, which do not reach the number specified by maxMessages. The Pub/Sub client therefore waits 10 seconds (maxMilliseconds) and then sends these 5 messages as one batch. This is the same situation as the remaining 2 messages in the first example. The execution result:

[2020-05-05T09:10:16.857Z] publishing messages
[2020-05-05T09:10:26.977Z] Message 36844 published. index: 0
[2020-05-05T09:10:26.977Z] Message 36845 published. index: 1
[2020-05-05T09:10:26.977Z] Message 36846 published. index: 2
[2020-05-05T09:10:26.977Z] Message 36847 published. index: 3
[2020-05-05T09:10:26.977Z] Message 36848 published. index: 4
results: 36844,36845,36846,36847,36848
Lin Du