
I need to batch the events in a Faust stream, so I'm using the take() method. However, I would also like to access the message headers, particularly a timestamp.

Ordinarily you would access the headers using:

async for event in stream.events():

and then read them with event.headers, but since we're using the take method:

async for event in stream.take(500, 1):

we can't seem to get access to the raw message. Any idea how to get it? We're trying to highlight a slow section of the pipeline by recording a timestamp at each stage in the message headers, rather than adding it to the value of the sent message.

Is there another raw timestamp that's 'hidden' but accessible that I've missed?
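For context, the per-stage header stamping described above could look roughly like the sketch below. The `ts-<stage>` header names and both helpers are hypothetical; the Faust-facing assumption is that you would pass the result as `headers=` to `topic.send(...)` and read `event.headers` downstream. Header keys are assumed to be str and values bytes, as Kafka headers usually are.

```python
import time
from typing import Any, Dict, Optional

# Hypothetical naming scheme: one header per pipeline stage, e.g. "ts-enrich".
PREFIX = "ts-"


def stamp(headers: Any, stage: str, now: Optional[float] = None) -> Dict[str, bytes]:
    """Return a copy of `headers` with this stage's wall-clock time added.

    `headers` may be None, a mapping, or a list of (key, value) pairs.
    The time is stored as text bytes because Kafka header values are bytes.
    """
    out = dict(headers or {})
    ts = time.time() if now is None else now
    out[PREFIX + stage] = repr(ts).encode()
    return out


def stage_gaps(headers: Any) -> Dict[str, float]:
    """Seconds elapsed between consecutive stage stamps, ordered by time."""
    stamps = sorted(
        (float(value.decode()), key[len(PREFIX):])
        for key, value in dict(headers).items()
        if key.startswith(PREFIX)
    )
    return {
        f"{a}->{b}": t_b - t_a
        for (t_a, a), (t_b, b) in zip(stamps, stamps[1:])
    }


if __name__ == "__main__":
    h = stamp(None, "ingest", now=100.0)
    h = stamp(h, "enrich", now=100.5)
    h = stamp(list(h.items()), "store", now=103.0)
    print(stage_gaps(h))  # {'ingest->enrich': 0.5, 'enrich->store': 2.5}
```

The largest gap in `stage_gaps` points at the slow stage without touching the message value.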

EDIT

Using faust-streaming==0.8.4, so it's definitely up to date.

Fonty

1 Answer


The Event and Message objects both have a headers attribute that can be accessed in a stream. Both events and take work with EventT objects internally, so you should be able to access them the same way. The only difference is that take and its derivatives unpack each EventT into its value and collect those values in a buffer list, whereas events yields one EventT at a time. You can access individual EventT objects if you set your take buffer size to 1.
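If you need whole events (with `.headers` and `.message.timestamp`) in batches, one option is to do the batching yourself on top of `stream.events()`. The sketch below is a minimal version under stated assumptions: `take_events` is a hypothetical helper, not part of Faust, and the `Fake*` classes only stand in for Faust's Event and Message. It has not been tested against a live app, and I'm unsure how pulling events ahead of processing interacts with Faust's acking and offset commits.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, AsyncIterable, AsyncIterator, Dict, List, Optional, TypeVar

T = TypeVar("T")
_DONE = object()  # sentinel: the source iterator is exhausted


async def _next(it: AsyncIterator[Any]) -> Any:
    try:
        return await it.__anext__()
    except StopAsyncIteration:
        return _DONE


async def take_events(
    events: AsyncIterable[T], max_: int, within: float
) -> AsyncIterator[List[T]]:
    """Yield lists of up to `max_` whole items (e.g. EventT objects), flushing
    early once the oldest buffered item has waited `within` seconds."""
    it = events.__aiter__()
    loop = asyncio.get_running_loop()
    batch: List[T] = []
    deadline: Optional[float] = None
    pending: Optional[asyncio.Task] = None
    try:
        while True:
            if pending is None:
                # Keep one outstanding read; a timeout below never cancels it.
                pending = asyncio.ensure_future(_next(it))
            timeout = None if deadline is None else max(0.0, deadline - loop.time())
            done, _ = await asyncio.wait({pending}, timeout=timeout)
            if not done:
                # `within` elapsed with a non-empty batch: flush it.
                yield batch
                batch, deadline = [], None
                continue
            item, pending = pending.result(), None
            if item is _DONE:
                break
            batch.append(item)
            if deadline is None:
                deadline = loop.time() + within
            if len(batch) >= max_:
                yield batch
                batch, deadline = [], None
        if batch:
            yield batch
    finally:
        if pending is not None:
            pending.cancel()


# Stand-ins for Faust's Message/Event, just enough for the demo below.
@dataclass
class FakeMessage:
    timestamp: float


@dataclass
class FakeEvent:
    value: Dict[str, int]
    message: FakeMessage
    headers: Dict[str, bytes] = field(default_factory=dict)


async def fake_events():
    # Stand-in for stream.events(): two quick events, a pause, then one more.
    for i, delay in enumerate([0.0, 0.0, 0.2]):
        await asyncio.sleep(delay)
        yield FakeEvent(value={"n": i}, message=FakeMessage(timestamp=1000.0 + i))


async def demo(max_: int, within: float):
    out = []
    async for batch in take_events(fake_events(), max_, within):
        out.append([(e.value["n"], e.message.timestamp) for e in batch])
    return out


if __name__ == "__main__":
    print(asyncio.run(demo(500, 0.05)))  # [[(0, 1000.0), (1, 1001.0)], [(2, 1002.0)]]
```

In an agent this would read `async for batch in take_events(stream.events(), 500, 1.0):`, and each item in `batch` still carries its headers and message. Keeping a single outstanding read, instead of wrapping `__anext__` in `asyncio.wait_for`, avoids cancelling the source iterator whenever the time window expires.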

faust-streaming==0.7.7 introduced a method, stream.take_with_timestamp, that is nearly identical to stream.take and can be used like this:

async for values in stream.take_with_timestamp(1, 1, 'timestamp'):
  # values is the buffered list; the current EventT lives on the stream
  print(stream.current_event)
  if stream.current_event is not None:
    print(stream.current_event.headers)
    print(stream.current_event.message.headers)

This will show you the timestamp of each event. The caveat is that if you set the buffer size to anything >1 and your stream times out, stream.current_event will be None.
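As far as I can tell from reading the faust-streaming source (worth double-checking against 0.8.4), take_with_timestamp does not hand you the events themselves: it writes event.message.timestamp into each buffered value under `timestamp_field_name`, and only when that value is a plain dict. The sketch below re-creates that step with `SimpleNamespace` stand-ins for events; `stamp_values` is a hypothetical name, not Faust's code. If this reading is right, values that are not dicts (for example, `faust.Record` instances) would come back without a timestamp.

```python
from types import SimpleNamespace
from typing import Any, List


def stamp_values(events: List[Any], timestamp_field_name: str) -> List[Any]:
    """Rough re-creation of the per-event step in take_with_timestamp:
    copy event.message.timestamp into the value, but only for dict values."""
    values = []
    for event in events:
        value = event.value
        if isinstance(value, dict) and timestamp_field_name:
            value[timestamp_field_name] = event.message.timestamp
        values.append(value)
    return values


if __name__ == "__main__":
    events = [
        SimpleNamespace(value={"id": 1}, message=SimpleNamespace(timestamp=1653700000.5)),
        SimpleNamespace(value=b"raw", message=SimpleNamespace(timestamp=1653700001.0)),
    ]
    print(stamp_values(events, "timestamp"))
    # [{'id': 1, 'timestamp': 1653700000.5}, b'raw']
```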

Alternatively, you could mimic the assignment inside take_with_timestamp and access event.message.timestamp inside stream.events():

async for event in stream.events():
  print(event.message.timestamp)
  print(event.headers)
  print(event.message.headers)
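To turn this into the slow-pipeline check from the question, you can compare message.timestamp with the current time as each event arrives. A minimal sketch, assuming message.timestamp is seconds since the epoch (my understanding of how Faust exposes it). Kafka sets that timestamp either at produce time or at broker append time, depending on the topic's `message.timestamp.type` setting, so this measures time since that point rather than the duration of one stage. Both helper names are hypothetical.

```python
import time
from typing import Optional


def lag_seconds(message_timestamp: float, now: Optional[float] = None) -> float:
    """Seconds between a record's Kafka timestamp and `now` (default: wall clock)."""
    return (time.time() if now is None else now) - message_timestamp


def is_slow(message_timestamp: float, threshold: float, now: Optional[float] = None) -> bool:
    """True if the record has been in flight for longer than `threshold` seconds."""
    return lag_seconds(message_timestamp, now) > threshold


if __name__ == "__main__":
    print(lag_seconds(100.0, now=102.5))  # 2.5
    print(is_slow(100.0, threshold=2.0, now=102.5))  # True
```

Inside the stream.events() loop, that might be used as `if is_slow(event.message.timestamp, 5.0): print(event.headers)`.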
wbarnha
  • Thanks for the suggestion, but I get the following: `AttributeError: 'Stream' object has no attribute 'take_with_timestamp'`. I assume that means I'm not using the correct version? Also, I can't access the raw message as described in the second suggestion (stream.events(), then event.message.timestamp); I end up with a key error because message isn't in my event. It's as if I'm only getting the value part of the message. Again, the timestamp is not part of the data I need to forward on; it's added at processing time purely for timestamping. – Fonty May 28 '22 at 01:12
  • As an example, here's the printed output from events when using `async for events in stream.events()`: `[]`. I also can't read the header from the event by inspecting each element in the list. – Fonty May 28 '22 at 01:15
  • I did some research and `take_with_timestamp` was introduced in the release of v0.7.7 of faust-streaming. Thanks for editing your question to indicate you're using v0.8.4. I'm not sure where you see the latest release of v0.6.4 when [GitHub says the latest is v0.8.4](https://github.com/faust-streaming/faust/releases). I'll edit my answer to address your requested header events. – wbarnha May 28 '22 at 02:41
  • Thanks. I had to upgrade my version to get the method, and then forgot to restart my kernel, so I was still using the old version. Regarding the version number, I must have been looking in the wrong place. But when I print the event with the new method, I still only get the list of events, nothing more. – Fonty May 28 '22 at 02:48
  • Thanks again @wbarnha. Unfortunately that's not a workable solution if the `within` value has to be set so low. We can make do without it, as it was only meant to flag a slow pipeline process. If we really need it, we can add a timestamp at each section before passing the enriched data on to the next topic. – Fonty May 28 '22 at 06:08