I'm working on a NestJS project (hybrid application), and we use KafkaJS to exchange data between our microservices. In some cases the order in which that data is sent to other services is very important, because the receiving service cannot process the second message without the first one.
The thing is, our production order object has a property that is an array of objects (orderedArray), which are correctly ordered by one of their properties (count). It looks like this:
{
  "productionOrderId": '...',
  "property1": '...',
  "property2": '...',
  "orderedArray": [
    {
      "id": '...',
      "timestamp": '...',
      "count": 1
    },
    {
      "id": '...',
      "timestamp": '...',
      "count": 2
    },
    {
      "id": '...',
      "timestamp": '...',
      "count": 3
    }
  ]
}
In one specific feature, we receive a production order creation request (REST) from our frontend and, after some validation, save it in our database.
The last step after that is to send the production order to another service, which we do through Kafka. The real issue is this: we save it correctly to our database, and by debugging the application I found that, right up until we call the KafkaJS method that sends the messages, the items are still in the same numeric order we saved them in, so I'm 100% sure we are sending them in the correct order. When other services receive the order, however, the items arrive in a different order; more specifically, the last two items of the array have swapped places, like this (look at the count field):
"orderedArray": [
  {
    "id": '...',
    "timestamp": '...',
    "count": 1
  },
  {
    "id": '...',
    "timestamp": '...',
    "count": 3
  },
  {
    "id": '...',
    "timestamp": '...',
    "count": 2
  }
]
This happens with any number of records, but only with the last two items of the array:
- 1 2 3 becomes 1 3 2
- 1 2 becomes 2 1
- 1 2 3 4 5 6 7 becomes 1 2 3 4 5 7 6
- 1 2 3 ... 99 100 becomes 1 2 3 ... 100 99
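To confirm the pattern on the consumer side, a small check like the following could log where a received array first breaks the count order. It is only a sketch: the `OrderedItem` interface is a hypothetical type modelled on the payload above, and `findFirstOutOfOrder` is not part of our code.

```typescript
// Hypothetical type modelled on the items of orderedArray shown above.
interface OrderedItem {
  id: string;
  timestamp: string;
  count: number;
}

// Returns the index of the first item whose count is not greater than the
// previous item's count, or -1 if the array is strictly increasing by count.
function findFirstOutOfOrder(items: OrderedItem[]): number {
  for (let i = 1; i < items.length; i++) {
    if (items[i].count <= items[i - 1].count) {
      return i;
    }
  }
  return -1;
}
```

For the cases listed above, `1 3 2` yields index 2, `2 1` yields index 1, and `1 2 3 4 5 7 6` yields index 6, i.e. always the last position.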
This is how we build our message (don't mind the typing, I'm going to refactor it once I figure this problem out):
// KafkaService
this.kafkaClient.emit(message.topic, message.messages);

// OrderService: one message per production order, sent in a loop without awaiting
for (const order of productionOrders) {
  this.sendMessage(
    kafkaUtilities.buildOrderedMessage('my-topic-here', Array.of(order), 'my-key-here'),
  );
}
// KafkaUtilities
private createMessage(topic: string, data: Array<unknown>, key?: string): ProducerRecord {
  const content = {
    value: JSON.stringify(data),
  } as any;
  if (key) {
    // each keyed message gets a timestamp one second after the previous one
    this.timestamp += 1000;
    content.key = key;
    content.timestamp = this.timestamp.toString();
  }
  return { topic, messages: content };
}
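Two things about this builder may be worth ruling out. Kafka consumers read a partition in offset order, so the incremented `timestamp` is stored as message metadata and does not by itself decide the order in which messages are consumed. Also, since each order is emitted separately, as far as I can tell each one reaches the producer as its own send. A sketch of the alternative, assuming a plain KafkaJS producer is used (for example in the pure NodeJS test mentioned below), is to put every order into one record with a single key so they go to one partition as one batch. `buildBatchRecord`, `OutgoingMessage` and `BatchRecord` are hypothetical names; the interfaces only mirror the subset of KafkaJS message fields used here so the snippet compiles on its own.

```typescript
// Minimal local mirror of the message fields used below (assumption: this matches
// the { topic, messages: [{ key, value }] } shape that KafkaJS producer.send() accepts).
interface OutgoingMessage {
  key: string;
  value: string;
}

interface BatchRecord {
  topic: string;
  messages: OutgoingMessage[];
}

// Hypothetical helper: one record holding every item, in array order.
// Using the same key for all messages routes them to the same partition
// under a key-hashing partitioner.
function buildBatchRecord<T>(topic: string, items: T[], key: string): BatchRecord {
  return {
    topic,
    messages: items.map((item) => ({ key, value: JSON.stringify(item) })),
  };
}
```

With a connected KafkaJS producer this would roughly become `await producer.send(buildBatchRecord('my-topic-here', productionOrders, 'my-key-here'))`; messages sent to one partition in a single batch are appended in the order they appear in the array.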
I'm adding one second to the timestamp for each record we send. The only thing that makes the ordering work is adding a sleep call inside the for loop; it just needs to be there, and it works even with 0.5 ms:
function sleep(ms: number) {
  return new Promise((resolve) => {
    setTimeout(resolve, ms);
  });
}
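A possible reading of why the sleep helps (a guess, not a confirmed diagnosis): awaiting it yields to the event loop between emits, which gives each emit's asynchronous work a head start before the next one begins. A more explicit way to get an ordering guarantee is to await each send before starting the next. The sketch below illustrates the difference with a hypothetical `fakeSend` that "delivers" a count after a fixed delay: firing all sends at once lets delivery follow latency, while awaiting each send keeps the loop order. In the NestJS code the equivalent would presumably be awaiting the Observable returned by `ClientKafka.emit`, e.g. with rxjs 7's `lastValueFrom`, which assumes `sendMessage` is changed to return it.

```typescript
// Hypothetical stand-in for a producer: records `count` as delivered after `delayMs`.
function fakeSend(log: number[], count: number, delayMs: number): Promise<void> {
  return new Promise((resolve) => {
    setTimeout(() => {
      log.push(count);
      resolve();
    }, delayMs);
  });
}

// Fire-and-forget: every send starts immediately, so delivery order follows latency.
async function sendConcurrently(counts: number[], delays: number[]): Promise<number[]> {
  const log: number[] = [];
  await Promise.all(counts.map((count, i) => fakeSend(log, count, delays[i])));
  return log;
}

// Sequential: the next send starts only after the previous one has resolved.
async function sendSequentially(counts: number[], delays: number[]): Promise<number[]> {
  const log: number[] = [];
  for (let i = 0; i < counts.length; i++) {
    await fakeSend(log, counts[i], delays[i]);
  }
  return log;
}
```

With delays `[5, 20, 10]`, `sendConcurrently([1, 2, 3], ...)` delivers `1, 3, 2` while `sendSequentially` delivers `1, 2, 3`. The delays are arbitrary; the point is only that un-awaited sends can complete in latency order rather than loop order.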
Our Kafka configs:
options: {
  client: {
    brokers: ['my-broker'],
    connectionTimeout: 4000,
    logLevel: logLevel[env.KAFKA_ENABLE_LOG ? 'DEBUG' : 'NOTHING'],
    sasl: getSasl(enableSecurity),
    ssl: enableSecurity,
    requestTimeout: 90000,
  },
  consumer: {
    groupId: 'my-id',
    heartbeatInterval: 3000,
    metadataMaxAge: 180000,
    sessionTimeout: 60000,
    retry: {
      initialRetryTime: 30000,
      retries: 578,
      multiplier: 2,
      maxRetryTime: 300000,
      factor: 0,
    },
  },
  producer: {
    metadataMaxAge: 180000,
  },
},
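Since `maxInFlightRequests` alone didn't help, one more producer option that might be worth trying is KafkaJS's `idempotent` flag, which the KafkaJS docs mark as experimental and which requires `acks` of -1 (the default for `send`). This is only a suggestion: it is not verified to fix this case, and whether Event Hubs honours idempotent producers is an open assumption. A sketch of the `producer` block with it enabled:

```typescript
// Sketch of the producer section of the options above (assumption: the client and
// consumer sections stay unchanged). Both added keys are KafkaJS producer options.
const producer = {
  metadataMaxAge: 180000,
  // Ask KafkaJS for an idempotent producer (experimental in KafkaJS).
  idempotent: true,
  // Limit the producer to one request in progress at a time.
  maxInFlightRequests: 1,
};
```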
We've spent some time trying to figure out what is causing the issue, but we're not certain what it could be yet. Here's what I've tried so far (some of it might not make much sense without context):
- Setting maxInFlightRequests to 1
- Setting key and partition on the messages (tried both the same value for every message and a sequential one for each message)
- Creating a new repository and a new topic (still had the same issue)
- Using the same Kafka instance we use in our cloud environment
What I'm about to try (I'll update this question asap):
- Downgrading KafkaJS
- Testing with a pure NodeJS project
Our project information:
- NestJS version: 8.2.6
- KafkaJS version: 1.15.0
In our cloud environment we use Azure Event Hubs as the Kafka provider, but it also happens locally with plain Kafka.
My guess is that this is a KafkaJS-related issue, but at this point I'm only guessing.