I have a Flink job in which one of the stateful operators needs to keep in state an object of a class that has a HashMap as an attribute. The HashMap holds the different affinities for a user, for example:

public class Affinity {
    public String id;
    public String colorTriggered;
    public Map<String, Integer> affinities;
    /* This object keeps a user's affinity to different colors, for example:
       affinities.put("green", 5);
       affinities.put("blue", 9);
       affinities.put("white", 2);

       It is then used to calculate the user's color affinity; in this case the answer is blue.
    */
}

This HashMap is used to track those affinities and, at certain moments, to look up the user's color affinity: the key with the highest affinity value, which here is blue with the value 9.
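The lookup described above can be sketched in plain Java, outside of Flink. This is only an illustration under assumptions: the class name `TopAffinity` and the method `topAffinity` are hypothetical and not part of the original job.

```java
import java.util.Map;
import java.util.Optional;

public class TopAffinity {

    /** Returns the key with the highest value, or empty if the map has no entries. */
    public static Optional<String> topAffinity(Map<String, Integer> affinities) {
        return affinities.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey);
    }

    public static void main(String[] args) {
        // Same sample values as in the Affinity comment above.
        Map<String, Integer> affinities = Map.of("green", 5, "blue", 9, "white", 2);
        System.out.println(topAffinity(affinities).orElse("empty")); // prints "blue"
    }
}
```

Returning an `Optional` leaves the choice of fallback (such as `"empty"`) to the caller.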

Since HashMap is not one of the types Flink serializes natively, I will need to add implements Serializable to my class.

Is that a bad idea, or is there a better way to keep this object in state?

Here is a fuller example of more or less what I need to do; I am not sure whether using a HashMap inside a Flink operator and in state is a good idea:

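public class AffinityFlatMapFunction extends RichFlatMapFunction<Event, Affinity> implements MapOperations {

  // Keyed state holding the Affinity object for the current key
  private transient ValueState<Affinity> state;

  @Override
  public void open(Configuration parameters) {
    state = getRuntimeContext().getState(new ValueStateDescriptor<>("affinity", Affinity.class));
  }

  @Override
  public void flatMap(Event event, Collector<Affinity> collector) throws Exception {
    Affinity previous = state.value();
    if (previous == null) { // first event for this key
      previous = new Affinity();
      previous.affinities = new HashMap<>();
    }
    // Increment the counter for this color, starting at 1
    previous.affinities.merge(event.color, 1, Integer::sum);
    /* something like this */
    String match = previous.affinities.entrySet().stream()
        .filter(x -> x.getKey().contains(event.color))
        .max(Map.Entry.comparingByValue())
        .map(Map.Entry::getKey).orElse("empty");
    if (!match.equals(previous.colorTriggered)) {
      previous.colorTriggered = match;
      state.update(previous);
      collector.collect(previous);
    }
  }
}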

Kind regards!

Alter
  • I believe a RocksDB store is preferred over a HashMap – OneCricketeer Sep 04 '20 at 19:00
  • Otherwise, if you emit tuples of `(name, 1)` into a stream and wordcount it, then you build the same mapping – OneCricketeer Sep 04 '20 at 19:01
  • Hi @OneCricketeer, thanks for the answer, I already edited the question, have a look now please. Thanks – Alter Sep 04 '20 at 19:15
  • I cannot use RocksDB because the added latency is not acceptable in this application. – Alter Sep 04 '20 at 19:19
  • As a solution, I would create a `MapState state` where each key is built as `String key = event.id + event.color`, then check `state.contains(key)` and do the operations shown above on that basis, but I'm not sure whether this is a good idea in terms of checkpoints and CPU usage. Thanks. – Alter Sep 04 '20 at 19:39

1 Answer

According to the documentation, there is a state primitive called MapState<UK, UV>, which does the following:

MapState<UK, UV>: This keeps a list of mappings. You can put key-value pairs into the state and retrieve an Iterable over all currently stored mappings. Mappings are added using put(UK, UV) or putAll(Map<UK, UV>). The value associated with a user key can be retrieved using get(UK). The iterable views for mappings, keys and values can be retrieved using entries(), keys() and values() respectively. You can also use isEmpty() to check whether this map contains any key-value mappings.
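To make the contract quoted above concrete, here is a plain-Java sketch. `InMemoryMapState` is a hypothetical stand-in written for illustration, not a Flink class; it only mirrors the method names from the quoted description and assumes they behave like their `java.util.Map` counterparts. In particular, `putAll` stores each mapping individually, so `get` returns a single value rather than the whole map.

```java
import java.util.HashMap;
import java.util.Map;

public class MapStateSketch {

    /** Hypothetical stand-in for MapState<UK, UV>; NOT the Flink implementation. */
    public static class InMemoryMapState<UK, UV> {
        private final Map<UK, UV> store = new HashMap<>();

        public void put(UK key, UV value) { store.put(key, value); }

        // Copies every mapping of the given map into the state, one entry per key.
        public void putAll(Map<UK, UV> map) { store.putAll(map); }

        // Returns the value for one user key (null here if the key is absent).
        public UV get(UK key) { return store.get(key); }

        public boolean contains(UK key) { return store.containsKey(key); }

        public Iterable<Map.Entry<UK, UV>> entries() { return store.entrySet(); }

        public boolean isEmpty() { return store.isEmpty(); }
    }

    public static void main(String[] args) {
        Map<String, Integer> affinity = new HashMap<>();
        affinity.put("red", 5);
        affinity.put("blue", 10);
        affinity.put("black", 25);

        InMemoryMapState<String, Integer> state = new InMemoryMapState<>();
        state.putAll(affinity);

        System.out.println(state.get("black"));      // prints 25
        System.out.println(state.contains("green")); // prints false
    }
}
```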

I read in a Flink thread a few days ago that the provided StateDescriptors are optimized and almost always preferable to implementing your own mechanism.

As long as you don't shard your stream in an unwanted way (using keyBy(color) should be fine), you will always see the most current state of your map. I'm not sure your concern about RocksDB latency applies: RocksDB only comes into play if you configure the RocksDB state backend. With the heap-based backends (MemoryStateBackend or FsStateBackend), state is kept as objects on the JVM heap and only written out at checkpoints, so all current values are available on the fly. In retrospect, I even doubt you need a map; a simple ValueState holding your integer may be enough, since the "key" part of the map is taken care of by keyBy() in that case.

kopaka
  • Hi @kopaka, thanks a lot for your answer, but I'm afraid that doesn't work for me. I already have a `KeyedStream` keyed by user id, and I need to track the affinity for that same user across about 5 different attributes, not just colors; that was only an example. So I cannot create a new `KeyedStream` for each of these attributes, which is the reason for the `HashMap` or `MapState` rather than a new `keyBy`. I'm currently using `MapState` as the solution, but I'm trying to find a better one. Kind regards. – Alter Sep 07 '20 at 17:00
  • A question about `MapState`: if I have `MapState state` and `HashMap affinity`, do something like `affinity.put(red, 5); affinity.put(blue, 10); affinity.put(black, 25);` and then call `state.putAll(affinity);`, will `state.get(key);` return only the `Integer` value or the full `HashMap`? As far as I understand, that is the same as doing `state.put(black, 25); ...`, after which `state.get(black)` returns 25. Thanks – Alter Sep 07 '20 at 17:18
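For the multi-attribute case raised in the comments (one user key, several tracked attributes), one possible layout is a single flat map with composite keys. The sketch below is plain Java and only illustrative: the class `AttributeAffinities`, its methods, and the `attribute:value` key format are all assumptions, not something taken from the thread or from Flink.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class AttributeAffinities {

    // One flat map per user; keys look like "color:blue" (hypothetical format).
    private final Map<String, Integer> counts = new HashMap<>();

    /** Increments the counter for one attribute value, starting at 1. */
    public void record(String attribute, String value) {
        counts.merge(attribute + ":" + value, 1, Integer::sum);
    }

    /** Returns the most frequent value seen for the given attribute, if any. */
    public Optional<String> top(String attribute) {
        String prefix = attribute + ":";
        return counts.entrySet().stream()
                .filter(e -> e.getKey().startsWith(prefix))
                .max(Map.Entry.comparingByValue())
                .map(e -> e.getKey().substring(prefix.length()));
    }

    public static void main(String[] args) {
        AttributeAffinities user = new AttributeAffinities();
        user.record("color", "blue");
        user.record("color", "blue");
        user.record("color", "green");
        user.record("size", "large");
        System.out.println(user.top("color").orElse("empty")); // prints "blue"
        System.out.println(user.top("size").orElse("empty"));  // prints "large"
    }
}
```

The same key scheme would presumably carry over to a `MapState` with `String` keys and `Integer` values, though that mapping is not verified here.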