
I have a Faust application with two topics. The first one receives raw data in the following format:

# Input of first topic
{
    timestamp: 2021-08-24 05:35:24,
    value: 40,
    device_id: ABC
}

An async function, interpolate, consumes this topic and calculates the value at every full 15 minutes. To accomplish this, I use a table to store the last message seen for each device_id (here ABC). So let's say the last message seen was

{
    timestamp: 2021-08-24 04:55:44,
    value: 30,
    device_id: ABC
}

then I will do the following:

# Stores the last message seen per device_id; missing keys default to None.
table = app.Table('last_msg_store',
                  default=lambda: None)

def interpolate_values(last_msg, cur_msg):
    ...

@app.agent(source_topic)
async def interpolate(msgs):
    async for msg in msgs:
        device_id = get_key(msg)

        last_msg = table[device_id]

        if last_msg is None:
            # table is empty
            table[device_id] = msg
            continue

        timestamps, values = interpolate_values(last_msg, msg)
        
        print(timestamps)  # prints ['2021-08-24 05:00:00', '2021-08-24 05:15:00', '2021-08-24 05:30:00']
        print(values)  # prints [31.09, 34.94, 38.79]
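The body of interpolate_values is not shown above. As a minimal sketch, assuming plain linear interpolation and messages represented as dicts whose timestamp is a datetime (both are assumptions for illustration, not the actual implementation), it could look like this. The exact numbers it produces need not match the printed values above.

```python
from datetime import datetime, timedelta

STEP = timedelta(minutes=15)  # interpolation grid: every full quarter hour


def interpolate_values(last_msg, cur_msg):
    """Linearly interpolate between two readings at every full 15 minutes.

    Hypothetical message shape: {"timestamp": datetime, "value": number}.
    Returns (timestamps, values) for grid points in (last, cur].
    """
    t0, v0 = last_msg["timestamp"], last_msg["value"]
    t1, v1 = cur_msg["timestamp"], cur_msg["value"]
    if t1 <= t0:
        # Nothing to interpolate if the new message is not newer.
        return [], []

    # First grid point strictly after t0.
    midnight = t0.replace(hour=0, minute=0, second=0, microsecond=0)
    tick = midnight + ((t0 - midnight) // STEP + 1) * STEP

    timestamps, values = [], []
    while tick <= t1:
        fraction = (tick - t0) / (t1 - t0)  # position between the readings
        timestamps.append(tick)
        values.append(v0 + fraction * (v1 - v0))
        tick += STEP
    return timestamps, values


last = {"timestamp": datetime(2021, 8, 24, 4, 55, 44), "value": 30}
cur = {"timestamp": datetime(2021, 8, 24, 5, 35, 24), "value": 40}
timestamps, values = interpolate_values(last, cur)
```

For the two example messages this yields the three grid points 05:00, 05:15 and 05:30, in chronological order.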

In another topic, the target_topic, I want to calculate the delta between consecutive values in chronological order. For that I created another helper table, which again stores the last message seen, grouped by device_id. My question is: how can I ensure that the order of the messages has not changed, so that my delta calculation is correct?

My current approach looks like the following:

@app.agent(source_topic)
async def interpolate(msgs):
    # ... same as before ...
    for timestamp, value in zip(timestamps, values):
        new_msg = generate_new_message(timestamp, value, device_id)
        await target_topic.send(value=new_msg)

@app.agent(target_topic)
async def delta(msgs):
    async for msg in msgs.group_by(get_key, name='delta_key'):
        device_id = get_key(msg)

        last_msg = delta_table[device_id]

        if last_msg is None:
            # table is empty
            delta_table[device_id] = msg
            continue

        delta_value = msg.value - last_msg.value
        delta_timestamps = msg.timestamp - last_msg.timestamp

        # ... further processing of the data
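The delta step only gives meaningful results when each message is newer than the stored one. A minimal sketch of that step in plain Python, with a guard for out-of-order input, is shown here. The helper name compute_delta and the dict-based message shape are hypothetical and used only for illustration.

```python
from datetime import datetime


def compute_delta(last_msg, msg):
    """Return (delta_value, delta_seconds) for two consecutive readings.

    Returns None when msg is not strictly newer than last_msg, which is
    the situation reported as a negative or zero duration.
    Hypothetical message shape: {"timestamp": datetime, "value": number}.
    """
    delta_seconds = (msg["timestamp"] - last_msg["timestamp"]).total_seconds()
    if delta_seconds <= 0:
        return None
    return msg["value"] - last_msg["value"], delta_seconds


a = {"timestamp": datetime(2021, 8, 24, 2, 30), "value": 30}
b = {"timestamp": datetime(2021, 8, 24, 2, 45), "value": 34}
late = {"timestamp": datetime(2021, 8, 24, 3, 0), "value": 38}

in_order = compute_delta(a, b)         # b arrives after a
out_of_order = compute_delta(late, a)  # a arrives after a newer message
```

This only detects reordering; it does not restore the original order.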

Naturally, I would expect messages to appear in the next topic in the same order in which they were sent, but this is not always the case.

Here is how I see the messages appearing in the logs:

# Sending order is correct
[2021-09-02 12:48:55,684] Forwarding to delta_topic: <Msg: timestamp='2021-08-24 01:15:00Z', value=...>
[2021-09-02 12:48:55,685] Forwarding to delta_topic: <Msg: timestamp='2021-08-24 01:30:00Z', value=...>
[2021-09-02 12:48:55,690] Forwarding to delta_topic: <Msg: timestamp='2021-08-24 01:45:00Z', value=...>
[2021-09-02 12:48:55,691] Forwarding to delta_topic: <Msg: timestamp='2021-08-24 02:00:00Z', value=...>
[2021-09-02 12:48:55,693] Forwarding to delta_topic: <Msg: timestamp='2021-08-24 02:15:00Z', value=...>
[2021-09-02 12:48:55,694] Forwarding to delta_topic: <Msg: timestamp='2021-08-24 02:30:00Z', value=...>
[2021-09-02 12:48:55,696] Forwarding to delta_topic: <Msg: timestamp='2021-08-24 02:45:00Z', value=...>
[2021-09-02 12:48:55,698] Forwarding to delta_topic: <Msg: timestamp='2021-08-24 03:00:00Z', value=...>
[2021-09-02 12:48:55,700] Forwarding to delta_topic: <Msg: timestamp='2021-08-24 03:15:00Z', value=...>
[2021-09-02 12:48:55,704] Forwarding to delta_topic: <Msg: timestamp='2021-08-24 03:30:00Z', value=...>

# Appearance order is messed up. They are sent to the same partition because of ".group_by(...)".
[2021-09-02 12:48:56,007] Partition: 4. Msg received: <Msg: timestamp='2021-08-24T01:15:00Z', value=...>
[2021-09-02 12:48:56,017] Partition: 4. Msg received: <Msg: timestamp='2021-08-24T03:00:00Z', value=...>
[2021-09-02 12:48:56,020] Partition: 4. Msg received: <Msg: timestamp='2021-08-24T02:30:00Z', value=...>
[2021-09-02 12:48:56,021] Negative/No DURATION detected -1800.0!
[2021-09-02 12:48:56,080] Partition: 4. Msg received: <Msg: timestamp='2021-08-24T03:30:00Z', value=...>
[2021-09-02 12:48:56,092] Partition: 4. Msg received: <Msg: timestamp='2021-08-24T01:45:00Z', value=...>
[2021-09-02 12:48:56,093] Negative/No DURATION detected -6300.0!
[2021-09-02 12:48:56,096] Partition: 4. Msg received: <Msg: timestamp='2021-08-24T02:45:00Z', value=...>
[2021-09-02 12:48:56,098] Partition: 4. Msg received: <Msg: timestamp='2021-08-24T01:30:00Z', value=...>
[2021-09-02 12:48:56,099] Negative/No DURATION detected -4500.0!
[2021-09-02 12:48:56,100] Partition: 4. Msg received: <Msg: timestamp='2021-08-24T02:00:00Z', value=...>
[2021-09-02 12:48:56,104] Partition: 4. Msg received: <Msg: timestamp='2021-08-24T02:15:00Z', value=...>
[2021-09-02 12:48:56,107] Partition: 4. Msg received: <Msg: timestamp='2021-08-24T03:15:00Z', value=...>