
I need to compare the previous session with averages computed over other sessions of the same user. I'm using MapState to keep the previous session, but the MapState never contains any previous keys, so every session is treated as new. Here's my code:

SessionIdentificationProcessFunction (a function that gathers all the events that belong to the same session):

static SingleOutputStreamOperator<SessionEvent> sessionUser(KeyedStream<Event, String> stream) {
    return stream.window(EventTimeSessionWindows.withGap(Time.minutes(PropertyFileReader.getGAP_SECTION())))
            .allowedLateness(Time.minutes(PropertyFileReader.getLATENCY_ALLOWED()))
            .process(new SessionIdentificationProcessFunction<Event, SessionEvent, String, TimeWindow>() {
                @Override
                public void open(Configuration parameters) {
                    /*state configured to live just one day to avoid garbage accumulation*/
                    StateTtlConfig ttlConfig = StateTtlConfig
                            .newBuilder(org.apache.flink.api.common.time.Time.days(1))
                            .cleanupFullSnapshot()
                            .build();
                    MapStateDescriptor<String, SessionEvent> map_descriptor = new MapStateDescriptor<>("prevMapUserSession", String.class, SessionEvent.class);
                    map_descriptor.enableTimeToLive(ttlConfig);
                    previous_user_sessions_state = getRuntimeContext().getMapState(map_descriptor);
                }

                @Override
                public SessionEvent generateSessionRecord(String s, Context context, Iterable<Event> elements) {
                    // Order events by timestamp; the earliest is the session start, the latest the session end.
                    Comparator<Event> sortFunc = Comparator.comparing(e -> e.timestamp);
                    Event start = StreamSupport.stream(elements.spliterator(), false).min(sortFunc).orElse(new Event());
                    Event end = StreamSupport.stream(elements.spliterator(), false).max(sortFunc).orElse(new Event());
                    SessionEvent session_user = (end.timestamp.equals(Timestamp.from(Instant.EPOCH))) ? new SessionEvent(start) : new SessionEvent(end);
                    session_user.sessionEvents = StreamSupport.stream(elements.spliterator(), false).count();
                    session_user.sessionDuration = sd;
                    try {
                        if (previous_user_sessions_state.contains(s)) {
                            SessionEvent previous = previous_user_sessions_state.get(s);

                            /* Update the new session with the values of the previous one, then delete the previous entry so it can be replaced with the updated session. This branch is never reached. */

                            previous_user_sessions_state.remove(s);
                        } else {
                            /* Execution always ends up here, so every session is treated as new. */
                        }

                        previous_user_sessions_state.put(s, session_user);
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                    return session_user;
                }
            })
            .name("User Sessions");
}
Alter

1 Answer


Without seeing how SessionIdentificationProcessFunction is implemented, I'm not sure exactly what's going wrong, but Flink's session windows are rather special, so it's not terribly surprising that this isn't working. Part of the problem is that any given session window has a very short lifetime before it is merged with another session window: as each new event arrives, it is initially assigned to its own session window, after which the set of all current session windows for that key is processed and any possible merges are performed, based on the session gap.
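
To make the merging behaviour concrete, here is a plain-Java sketch with no Flink dependency. It only mimics the idea described above and is not Flink's actual implementation. The names SessionMergeSketch and mergeSessions are hypothetical. Each timestamp starts out as its own window [t, t + gap), and windows that overlap or touch are folded into one session.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SessionMergeSketch {

    /**
     * Simplified model of session-window merging: every event gets its own
     * window [t, t + gap), and windows that overlap or touch are merged.
     * Returns the merged sessions as {start, end} pairs.
     */
    public static List<long[]> mergeSessions(long[] timestamps, long gap) {
        long[] sorted = timestamps.clone();
        Arrays.sort(sorted);

        List<long[]> sessions = new ArrayList<>();
        for (long t : sorted) {
            long[] window = {t, t + gap};
            if (!sessions.isEmpty()) {
                long[] last = sessions.get(sessions.size() - 1);
                if (window[0] <= last[1]) {
                    // The new single-event window reaches into the last session: extend it.
                    last[1] = Math.max(last[1], window[1]);
                    continue;
                }
            }
            // No overlap with the last session, so this event opens a new one.
            sessions.add(window);
        }
        return sessions;
    }

    public static void main(String[] args) {
        // Events at 0, 3 and 5 fall into one session; 30 and 32 into another (gap = 10).
        for (long[] s : mergeSessions(new long[] {0, 3, 5, 30, 32}, 10)) {
            System.out.println("[" + s[0] + ", " + s[1] + ")");
        }
    }
}
```

The point of the sketch is that a window's identity (its start and end) keeps changing as events arrive, so state tied to one particular window instance is not a stable place to remember anything across sessions.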

What I recommend is to use context.globalState().getMapState() rather than getRuntimeContext().getMapState() (where context is the ProcessWindowFunction.Context passed to the process() method of a ProcessWindowFunction). This globalState is a KeyedStateStore meant for precisely this purpose: keeping keyed state that is global/shared among all window instances for that key.
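
A minimal sketch of that approach might look like the following. It assumes a plain ProcessWindowFunction rather than the asker's SessionIdentificationProcessFunction, whose implementation isn't shown. The class name PreviousSessionFunction, the state name "prevSessionCount", and the Long/String input and output types are placeholders chosen for illustration only.

```java
import org.apache.flink.api.common.state.MapState;
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

/**
 * Hypothetical example: counts the events in each session and reports the
 * count of the previous session seen for the same key.
 */
public class PreviousSessionFunction
        extends ProcessWindowFunction<Long, String, String, TimeWindow> {

    // Created once and reused on every call to process().
    private final MapStateDescriptor<String, Long> descriptor =
            new MapStateDescriptor<>("prevSessionCount", String.class, Long.class);

    @Override
    public void process(String key, Context context, Iterable<Long> elements,
                        Collector<String> out) throws Exception {
        long count = 0;
        for (Long ignored : elements) {
            count++;
        }

        // Per-key state shared by all window instances of this key.
        MapState<String, Long> previous = context.globalState().getMapState(descriptor);

        if (previous.contains(key)) {
            out.collect(key + ": events=" + count + ", previousEvents=" + previous.get(key));
        } else {
            out.collect(key + ": events=" + count + ", no previous session");
        }

        // Remember this session for the next one with the same key.
        previous.put(key, count);
    }
}
```

Since globalState is already scoped to the current key, a ValueState would probably serve just as well as a MapState keyed by that same key. Also note that with allowedLateness configured, a window can fire more than once, so the "previous" entry may be an earlier firing of the same session rather than a genuinely earlier one.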

David Anderson
  • Thank you so much for your answer, I added another question, see above. What am I missing? Thanks a lot again. – Alter Apr 16 '20 at 14:16
  • Please don't use an answer to expand on your question. You should either edit/extend your original question, or create an entirely new question. – David Anderson Apr 16 '20 at 14:21
  • You can create the state descriptor for the MapState you want to store in the globalState once, in open(), and then use it to access the state during each call to process(). – David Anderson Apr 16 '20 at 14:26