
Scenario:

Consider a Kafka Streams web-sessionization scenario with unlimited (or years-long) retention, interactive queries (this can be reviewed if necessary), and many clients, each with many users (each user belongs to a single client). Partitioning goes like this:

Partition by a function of (clientId, userId) % numberOfPartitions, with numberOfPartitions fixed beforehand based on the cluster size. This allows sessionization to be performed on (clientId, userId) data and should provide an even data distribution among the nodes (no hotspotting by partition size or write load).
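
For illustration only, a partition function along these lines could look like the sketch below (untested; the "client/user" composite key format and the murmur2 hash are just assumptions on my part):

import org.apache.kafka.common.utils.Utils;

// Purely illustrative: one possible mapping from (clientId, userId) to a partition.
// The composite key format and the murmur2 hash are assumptions, not requirements.
public class SessionKeyPartitioning {

    public static int partitionFor(String clientId, String userId, int numberOfPartitions) {
        String compositeKey = clientId + "/" + userId;
        // Utils.toPositive guards against negative hash values
        return Utils.toPositive(Utils.murmur2(compositeKey.getBytes())) % numberOfPartitions;
    }
}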

However, when querying, I'd query by client (and time range). So I'd build an aggregated KTable from that sessions table, keyed by client, and sessions would be queried by (client, timeStart, timeEnd). That means all data for one client has to go to one node, which could pose scalability issues (too big a client), but since the data is already aggregated, I guess that would be manageable.
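
As a rough, untested sketch of what I have in mind (topic and store names like "sessions" and "sessions-per-client" are placeholders, and exact method signatures vary a bit between Kafka Streams versions): re-key the per-(client, user) session stream by clientId, aggregate it into a windowed store, and fetch by (client, timeStart, timeEnd) via interactive queries.

import java.util.concurrent.TimeUnit;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.TimeWindows;
import org.apache.kafka.streams.kstream.Windowed;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyWindowStore;
import org.apache.kafka.streams.state.WindowStoreIterator;

public class ClientAggregationSketch {

    // Re-key per-(client,user) sessions by clientId and count them into hourly windows.
    // Assumes default String serdes are configured; the "client/user" key format is an assumption.
    public static KTable<Windowed<String>, Long> buildAggregation(StreamsBuilder builder) {
        KStream<String, String> sessions = builder.stream("sessions");
        return sessions
                .groupBy((key, value) -> key.split("/")[0])              // re-key by clientId (causes a repartition)
                .windowedBy(TimeWindows.of(TimeUnit.HOURS.toMillis(1)))  // hourly buckets
                .count(Materialized.as("sessions-per-client"));
    }

    // Interactive-query side: fetch one client's windowed counts for a time range.
    public static WindowStoreIterator<Long> query(KafkaStreams streams, String clientId,
                                                  long timeStart, long timeEnd) {
        ReadOnlyWindowStore<String, Long> store =
                streams.store("sessions-per-client", QueryableStoreTypes.windowStore());
        return store.fetch(clientId, timeStart, timeEnd);
    }
}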

Question:

In this scenario (variants appreciated), I'd like to be able to reprocess the data of only one client.

But data from one client would be scattered among (potentially all of) the partitions.

How can a partial reprocess be achieved in Kafka Streams with minor impact, and keep (old) state queryable in the meantime?

xmar

1 Answer


I think you already know the answer to your question in general: with the partitioning scheme as you have described it, you will have to read all partitions if you want to reprocess a client, as its messages will be spread throughout all of them.

The only thing I can come up with to limit the overhead of reprocessing an entire client is a partitioning scheme that assigns each client a group of several partitions and then distributes that client's users over those partitions, to avoid overloading a single partition with a particularly "large" client. The picture should hopefully clarify what I am probably failing to explain in words.

Grouping partitions per client

A custom partitioner to achieve this distribution could look somewhat like the following code. Please take this with a grain of salt; it is purely theoretical so far and has never been tested (or even run, for that matter), but it should illustrate the principle.

import java.util.Map;

import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;
import org.apache.kafka.common.config.ConfigException;
import org.apache.kafka.common.utils.Utils;

public class ClientUserPartitioner implements Partitioner {

    private int partitionGroupSize = 10;

    @Override
    public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster) {
        // For this we expect the key to be of the format "client/user"
        String[] splitValues = ((String) key).split("/");
        String client = splitValues[0];
        String user = splitValues[1];

        // Check that the partition count is divisible by the group size
        int partitionCount = cluster.availablePartitionsForTopic(topic).size();
        if (partitionCount % partitionGroupSize != 0) {
            throw new ConfigException("Partition count must be divisible by " + partitionGroupSize +
                    " for this partitioner but is " + partitionCount + " for topic " + topic);
        }

        // The client hash picks one of the partition groups, the user hash picks a partition within that group.
        // Utils.toPositive guards against negative murmur2 hashes.
        int numberOfGroups = partitionCount / partitionGroupSize;
        int clientPartitionOffset = (Utils.toPositive(Utils.murmur2(client.getBytes())) % numberOfGroups) * partitionGroupSize;
        int userPartition = Utils.toPositive(Utils.murmur2(user.getBytes())) % partitionGroupSize;

        // Combine group offset and user partition to get the final partition
        return clientPartitionOffset + userPartition;
    }

    @Override
    public void configure(Map<String, ?> configs) {
        if (configs.containsKey("partition.group.size")) {
            this.partitionGroupSize = Integer.parseInt(String.valueOf(configs.get("partition.group.size")));
        }
    }

    @Override
    public void close() {
    }
}
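
For completeness, a hedged sketch of how this partitioner might be wired into a producer configuration; the "partition.group.size" property is the custom setting read in configure() above, and like the partitioner itself this is untested:

import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;

// Minimal sketch: standard producer settings plus the custom partitioner and its group size.
Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
props.put(ProducerConfig.PARTITIONER_CLASS_CONFIG, ClientUserPartitioner.class.getName());
props.put("partition.group.size", "10");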

Doing this will of course impact your distribution; it might be worth your while to run a few simulations with different values for partitionGroupSize and a representative sample of your data, to estimate how even the distribution is and how much overhead you'd save when reprocessing an entire client.
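
As a rough, untested sketch of what such a simulation could look like (the sample keys are whatever representative "client/user" data you have at hand), you could replay the keys through the same hashing logic and count how many records land on each partition:

import java.util.List;
import org.apache.kafka.common.utils.Utils;

// Rough sketch: estimate partition skew for a given group size by replaying
// a representative sample of "client/user" keys through the partitioner's hashing logic.
public class PartitionDistributionSimulation {

    public static int[] simulate(List<String> sampleKeys, int partitionCount, int partitionGroupSize) {
        int[] recordsPerPartition = new int[partitionCount];
        int numberOfGroups = partitionCount / partitionGroupSize;

        for (String key : sampleKeys) {
            String[] parts = key.split("/");
            int offset = (Utils.toPositive(Utils.murmur2(parts[0].getBytes())) % numberOfGroups) * partitionGroupSize;
            int userPartition = Utils.toPositive(Utils.murmur2(parts[1].getBytes())) % partitionGroupSize;
            recordsPerPartition[offset + userPartition]++;
        }
        return recordsPerPartition;
    }
}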

Sönke Liebau
  • (1/4) Thanks! I see this tries to alleviate reprocessing by client, but I still don't see how to reprocess one client: do you mean that, e.g. if I want to reprocess client1 in your drawing, I have to restart all Kafka Streams nodes (or their data stores?) that handle partitions of that client (partitions 0 to 2)? I am thinking of having on the order of hundreds/thousands of partitions per node for a better distribution; that would mean the likelihood of a Kafka Streams application node having one of those partitions on its list is very high, and so most would need reprocessing... right? – xmar Feb 15 '18 at 18:05
  • (2/4) I can see that, if I need to reprocess all of it, I can either • stop, clean, and reprocess the application using the Application Reset Tool, which involves downtime and is hence not acceptable for interactive queries (unless there are maintenance windows), or • launch a separate cluster with a new applicationId, run the recalculation until it reaches current state, and then switch queries to the new cluster. (I don't know if there are other alternatives; I can't find them in https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Streams+Data+%28Re%29Processing+Scenarios ) – xmar Feb 15 '18 at 18:05
  • (3/4) When reprocessing partially, am I forced to restart any application node that handles partitions with data for that client (which would also reprocess all of the data from other clients/users on those partitions)? Is there any way to work around that? – xmar Feb 15 '18 at 18:05
  • (4/4) Also, I don't think I really understand "what happens" to the Kafka Streams stores in this process. Is there any way to e.g. tell a store to recalculate for a certain key, or to rebuild its state anew while keeping the previous one alive? Is that something you have to work out for yourself? I.e. the coordination of the old and the new stores being created, the point in time when the new store is ready and at the latest offset so it's ready to switch, the actual switch to be made to serve data from the new store, etc. – xmar Feb 15 '18 at 18:05