I know that in Pulsar Functions we have some limited access to the BookKeeper-backed state store, with methods like context.getState(key), context.putState(key, value), context.deleteState(key), etc.

My question though is: Do we have a means of querying and cleaning state without having to use any kind of key?

Reason:

For example, we'd like to implement a delta function: on day 1 we send events/objects to the delta function, and they're stored in state under unique keys.

On day 2 we send the same type of objects again. Most of them are unchanged, some might have a changed attribute, some might be new, and some might be missing compared to day 1.

For each received object we'd like to query the state to see if there have been changes to that object (and in that case send an event). If there are no changes we don't send anything.
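To make this concrete, here is a minimal sketch of the delta logic I have in mind. It is not real function code: `KeyedState` is a hypothetical stand-in that mirrors only the three keyed calls mentioned above (in an actual function these would be calls on the `Context`), `InMemoryState` exists only so the sketch runs without a cluster, and `DeltaFunction` and the `NEW`/`CHANGED` event strings are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for the keyed state calls on the function Context.
interface KeyedState {
    ByteBuffer getState(String key);
    void putState(String key, ByteBuffer value);
    void deleteState(String key);
}

// In-memory implementation, only so the sketch runs without a Pulsar cluster.
class InMemoryState implements KeyedState {
    final Map<String, byte[]> entries = new HashMap<>();

    public ByteBuffer getState(String key) {
        byte[] stored = entries.get(key);
        return stored == null ? null : ByteBuffer.wrap(stored);
    }

    public void putState(String key, ByteBuffer value) {
        // Copy the remaining bytes without moving the caller's buffer position.
        byte[] copy = new byte[value.remaining()];
        value.duplicate().get(copy);
        entries.put(key, copy);
    }

    public void deleteState(String key) {
        entries.remove(key);
    }
}

// Hypothetical delta function: emits an event only for new or changed objects.
class DeltaFunction {
    private final KeyedState state;

    DeltaFunction(KeyedState state) {
        this.state = state;
    }

    // Returns an event description if the object is new or changed, empty otherwise.
    Optional<String> process(String key, String payload) {
        ByteBuffer current = ByteBuffer.wrap(payload.getBytes(StandardCharsets.UTF_8));
        ByteBuffer previous = state.getState(key);
        if (previous != null && previous.equals(current)) {
            return Optional.empty(); // unchanged since last delivery: send nothing
        }
        state.putState(key, current);
        return Optional.of((previous == null ? "NEW " : "CHANGED ") + key);
    }
}
```

Every lookup and write here goes through the key of the object that just arrived. An object that is simply not delivered on day 2 never reaches `process`, so nothing ever calls `deleteState` for its key and its entry stays in the store.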

As you can see, the problem is that the state store would grow steadily over time. I don't know of any way to clean up objects that are no longer delivered: once an object is gone, so is its key, which I would need for deleteState().

--> Question again: Are there other ways to access / query / clean the function's state that don't rely on having a key? Maybe by using the BookKeeper API directly?

What would help is a way to loop over the whole state, or something like getEverything(), and then work with that.

Toni Kanoni

1 Answer

Let me check on that. An iterator and a clean seem really useful.

Tim Spann
  • Hello Tim, any news on this? – Toni Kanoni Feb 17 '22 at 15:08
  • I was told that, due to the implementation, this probably won't happen. – Tim Spann Feb 17 '22 at 16:20
  • Alright, thank you for checking! Since it's really important for my organisation to understand how and if we can utilize state, could you maybe have a look at my connected follow-up question? https://stackoverflow.com/questions/71160740/apache-pulsar-functions-state-is-it-persisted-forever-if-we-dont-explicitly – Toni Kanoni Feb 17 '22 at 16:50