We are using Spring Cloud Stream with Kafka 2.0.1 and the InteractiveQueryService to fetch data from the stores. There are four stores that persist aggregated data on disk. The code for the topology looks like this:
@Slf4j
@EnableBinding(SensorMeasurementBinding.class)
public class Consumer {
public static final String RETENTION_MS = "retention.ms";
public static final String CLEANUP_POLICY = "cleanup.policy";
@Value("${windowstore.retention.ms}")
private String retention;
/**
* Process the data flowing in from a Kafka topic. Aggregate the data into windows of:
* - 2 minutes
* - 15 minutes
* - 1 hour
* - 12 hours
*
* @param stream the incoming stream of sensor measurements
*/
@StreamListener(SensorMeasurementBinding.ERROR_SCORE_IN)
public void process(KStream<String, SensorMeasurement> stream) {
Map<String, String> topicConfig = new HashMap<>();
topicConfig.put(RETENTION_MS, retention);
topicConfig.put(CLEANUP_POLICY, "delete");
log.info("Changelog and local window store retention.ms: {} and cleanup.policy: {}",
topicConfig.get(RETENTION_MS),
topicConfig.get(CLEANUP_POLICY));
createWindowStore(LocalStore.TWO_MINUTES_STORE, topicConfig, stream);
createWindowStore(LocalStore.FIFTEEN_MINUTES_STORE, topicConfig, stream);
createWindowStore(LocalStore.ONE_HOUR_STORE, topicConfig, stream);
createWindowStore(LocalStore.TWELVE_HOURS_STORE, topicConfig, stream);
}
private void createWindowStore(
LocalStore localStore,
Map<String, String> topicConfig,
KStream<String, SensorMeasurement> stream) {
// Configure how the state store should be materialized using the provided store name
Materialized<String, ErrorScore, WindowStore<Bytes, byte[]>> materialized = Materialized
.as(localStore.getStoreName());
// Set retention of changelog topic
materialized.withLoggingEnabled(topicConfig);
// Configure what the windows look like and how long data will be retained in the local stores
TimeWindows configuredTimeWindows = getConfiguredTimeWindows(
localStore.getTimeUnit(), Long.parseLong(topicConfig.get(RETENTION_MS)));
// Processing description:
// The input data are 'samples' with key <installationId>:<assetId>:<modelInstanceId>:<algorithmName>
// 1. With the map we add the Tag to the key and we extract the error score from the data
// 2. With the groupByKey we group the data on the new key
// 3. With windowedBy we split up the data in time intervals depending on the provided LocalStore enum
// 4. With reduce we determine the maximum value in the time window
// 5. With the materialized instance the result is stored in the named window store
stream
.map(getInstallationAssetModelAlgorithmTagKeyMapper())
.groupByKey()
.windowedBy(configuredTimeWindows)
.reduce((aggValue, newValue) -> getMaxErrorScore(aggValue, newValue), materialized);
}
private TimeWindows getConfiguredTimeWindows(long windowSizeMs, long retentionMs) {
return TimeWindows.of(windowSizeMs).until(retentionMs);
}
/**
* Determine the max error score to keep by looking at the aggregated error signal and
* freshly consumed error signal
*
* @param aggValue the error score aggregated so far in the window
* @param newValue the freshly consumed error score
* @return the error score with the higher error signal
*/
private ErrorScore getMaxErrorScore(ErrorScore aggValue, ErrorScore newValue) {
if (aggValue.getErrorSignal() > newValue.getErrorSignal()) {
return aggValue;
}
return newValue;
}
private KeyValueMapper<String, SensorMeasurement,
KeyValue<? extends String, ? extends ErrorScore>> getInstallationAssetModelAlgorithmTagKeyMapper() {
return (s, sensorMeasurement) -> new KeyValue<>(s + "::" + sensorMeasurement.getT(),
new ErrorScore(sensorMeasurement.getTs(), sensorMeasurement.getE(), sensorMeasurement.getO()));
}
}
So we are materializing aggregated data into four different stores after determining the max value within a specific window for a specific key. Please note that the retention is set to two months of data and the cleanup policy is delete; we don't compact the data.
The size of the individual state stores on disk is between 14 and 20 GB.
We are making use of Interactive Queries: https://docs.confluent.io/current/streams/developer-guide/interactive-queries.html#interactive-queries
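For context, this is roughly how we read from one of the window stores through the InteractiveQueryService (the time range handling is simplified here; our real query code does a bit more):
@Autowired
private InteractiveQueryService interactiveQueryService;
public List<ErrorScore> fetchErrorScores(String key, long from, long to) {
    ReadOnlyWindowStore<String, ErrorScore> store = interactiveQueryService.getQueryableStore(
        LocalStore.TWO_MINUTES_STORE.getStoreName(),
        QueryableStoreTypes.<String, ErrorScore>windowStore());
    List<ErrorScore> result = new ArrayList<>();
    // fetch all windows for this key that fall within the requested time range
    try (WindowStoreIterator<ErrorScore> iterator = store.fetch(key, from, to)) {
        iterator.forEachRemaining(windowedValue -> result.add(windowedValue.value));
    }
    return result;
}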
In our setup we run four instances of the streaming app as a single consumer group, so every instance stores a specific part of all the data in its local stores.
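Because each instance only holds part of the data, a lookup first has to determine which instance hosts a given key. We do that roughly like this (based on our understanding of the binder's getHostInfo API; ownHost/ownPort stand for this instance's application.server value, and the forwarding part is only sketched):
HostInfo hostInfo = interactiveQueryService.getHostInfo(
    LocalStore.TWO_MINUTES_STORE.getStoreName(), key, new StringSerializer());
if (hostInfo.host().equals(ownHost) && hostInfo.port() == ownPort) {
    // the key lives in this instance's local store, so query it directly
} else {
    // otherwise forward the request to hostInfo.host():hostInfo.port()
}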
This all seems to work nicely, until we restart one or more instances and wait for them to become available again. I would expect the restart not to take that long, but unfortunately it takes up to an hour. I guess the issue is caused by the amount of data in combination with restoring the state stores, but I'm not sure. I would have expected that, since we persist the state store data on persistent volumes outside of the container that runs on Kubernetes, the app would receive the last offset from the broker and only have to continue from that point, as the previously consumed data is already in the state store. Unfortunately I don't have a clue how to resolve this.
Restarting our app triggers a restore task:
-StreamThread-2] Restoring task 4_3's state store twelve-hours-error-score from beginning of the changelog anomaly-timeline-twelve-hours-error-score-changelog-3.
This process takes quite a while. Why is it restoring from the beginning, and why does it take so long? I do have auto.offset.reset set to "earliest", but that is only used when the offset is unknown, isn't it?
Here are my streams settings. Note that cache.max.bytes.buffering is set to 0; I changed this, but it didn't make a difference. I also read about a bug where num.stream.threads > 1 causes issues, but setting it to 1 doesn't improve restart speed either.
2019-03-05 13:44:53,360 INFO main org.apache.kafka.common.config.AbstractConfig StreamsConfig values:
application.id = anomaly-timeline
application.server = localhost:5000
bootstrap.servers = [localhost:9095]
buffered.records.per.partition = 1000
cache.max.bytes.buffering = 0
client.id =
commit.interval.ms = 500
connections.max.idle.ms = 540000
default.deserialization.exception.handler = class org.apache.kafka.streams.errors.LogAndFailExceptionHandler
default.key.serde = class org.apache.kafka.common.serialization.Serdes$StringSerde
default.production.exception.handler = class org.apache.kafka.streams.errors.DefaultProductionExceptionHandler
default.timestamp.extractor = class errorscore.raw.boundary.ErrorScoreTimestampExtractor
default.value.serde = class errorscore.raw.boundary.ErrorScoreSerde
metadata.max.age.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
num.standby.replicas = 1
num.stream.threads = 2
partition.grouper = class org.apache.kafka.streams.processor.DefaultPartitionGrouper
poll.ms = 100
processing.guarantee = at_least_once
receive.buffer.bytes = 32768
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
replication.factor = 1
request.timeout.ms = 40000
retries = 0
retry.backoff.ms = 100
rocksdb.config.setter = null
security.protocol = PLAINTEXT
send.buffer.bytes = 131072
state.cleanup.delay.ms = 600000
state.dir = ./state-store
topology.optimization = none
upgrade.from = null
windowstore.changelog.additional.retention.ms = 86400000
It also logs these messages after a while:
CleanupThread] Deleting obsolete state directory 1_1 for task 1_1 as 1188421ms has elapsed (cleanup delay is 600000ms).
Also worth noting: I added the following code to override the default cleanup on start and stop, where by default the stores are deleted:
@Bean
public CleanupConfig cleanupConfig() {
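// disable the default cleanup of local state on start and on stop so the stores survive restarts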
return new CleanupConfig(false, false);
}
Any help would be appreciated!