I built a stream that does a windowed join; in production it was fine in terms of memory and performance.
However, I also needed deduplication, so I implemented a Transformer that handles it with the help of a WindowStore.
After deploying it we get the expected results, but memory keeps growing until the pod crashes with OOM.
After some research I applied a number of tricks to reduce memory usage, but none of them helped; the code is below.
It's clear to me that the WindowStore is causing the issue, but how can I limit it?
The Store:
var storeBuilder = Stores.windowStoreBuilder(
    Stores.persistentWindowStore(
        storeName,
        Duration.ofSeconds(6),   // retention period
        Duration.ofSeconds(5),   // window size
        false                    // retainDuplicates
    ),
    Serdes.String(),
    SerdeFactory.JsonSerde(valueDataClass)
);
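Not shown above, but for transformValues(..., storeName) to find the store, it presumably gets registered on the builder first; a minimal sketch of that wiring, assuming a StreamsBuilder named builder:

// Assumed wiring (not part of the snippet above): the window store has to be
// registered on the builder so the transformer can look it up by name.
builder.addStateStore(storeBuilder);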
The stream:
var leftStream = builder.stream("leftTopic").filter(...);
var rightStream = builder.stream("rightTopic").filter(...);

leftStream
    .join(
        rightStream,
        joiner,
        JoinWindows
            .of(Duration.ofSeconds(5))
            .grace(Duration.ofSeconds(1))
            .until(Duration.ofSeconds(6))
    )
    .transformValues(
        () ->
            new DeduplicationTransformer<>(
                storeName,
                Duration.ofSeconds(6).toMillis(),
                (key, value) -> value.id
            ),
        storeName
    )
    .filter((k, v) -> v != null)
    .to("targetTopic");
Deduplication Transformer:
public class DeduplicationTransformer<K, V extends StreamModel>
    implements ValueTransformerWithKey<K, V, V> {

    private ProcessorContext context;
    private String storeName;
    private WindowStore<K, V> eventIdStore;
    private final long leftDurationMs;
    private final KeyValueMapper<K, V, K> idExtractor;

    public DeduplicationTransformer(
        String storeName,
        long maintainDurationPerEventInMs,
        final KeyValueMapper<K, V, K> idExtractor
    ) {
        if (maintainDurationPerEventInMs < 2) {
            throw new IllegalArgumentException(
                "maintain duration per event must be > 1"
            );
        }
        leftDurationMs = maintainDurationPerEventInMs;
        this.idExtractor = idExtractor;
        this.storeName = storeName;
    }

    @Override
    public void init(final ProcessorContext context) {
        this.context = context;
        eventIdStore = (WindowStore<K, V>) context.getStateStore(storeName);

        // Wall-clock punctuator: every leftDurationMs, write null tombstones for
        // entries that are between 2x and 1x the deduplication window in the past.
        Duration interval = Duration.ofMillis(leftDurationMs);
        this.context.schedule(
            interval,
            PunctuationType.WALL_CLOCK_TIME,
            timestamp -> {
                Instant from = Instant.ofEpochMilli(
                    System.currentTimeMillis() - leftDurationMs * 2
                );
                Instant to = Instant.ofEpochMilli(
                    System.currentTimeMillis() - leftDurationMs
                );
                KeyValueIterator<Windowed<K>, V> iterator = eventIdStore.fetchAll(from, to);
                while (iterator.hasNext()) {
                    KeyValue<Windowed<K>, V> entry = iterator.next();
                    eventIdStore.put(entry.key.key(), null, entry.key.window().start());
                }
                iterator.close();
                context.commit();
            }
        );
    }

    @Override
    public V transform(final K key, final V value) {
        try {
            final K eventId = idExtractor.apply(key, value);
            if (eventId == null) {
                // No id to deduplicate on: pass the record through unchanged.
                return value;
            } else {
                final V output;
                if (isDuplicate(eventId)) {
                    // Duplicates become null and are dropped by the downstream filter.
                    output = null;
                } else {
                    output = value;
                    rememberNewEvent(eventId, value, context.timestamp());
                }
                return output;
            }
        } catch (Exception e) {
            return null;
        }
    }

    private boolean isDuplicate(final K eventId) {
        // A record is a duplicate if the same id was already stored within the window.
        final long eventTime = context.timestamp();
        final WindowStoreIterator<V> timeIterator = eventIdStore.fetch(
            eventId,
            eventTime - leftDurationMs,
            eventTime
        );
        final boolean isDuplicate = timeIterator.hasNext();
        timeIterator.close();
        return isDuplicate;
    }

    private void rememberNewEvent(final K eventId, V v, final long timestamp) {
        eventIdStore.put(eventId, v, timestamp);
    }

    @Override
    public void close() {}
}
RocksDB config:
public class BoundedMemoryRocksDBConfig implements RocksDBConfigSetter {

    private Cache cache = new LRUCache(5 * 1024 * 1024L);                // 5 MB block cache
    private Filter filter = new BloomFilter();
    private WriteBufferManager writeBufferManager = new WriteBufferManager(
        4 * 1024 * 1024L,                                                // 4 MB write buffer budget, accounted against the cache
        cache
    );

    @Override
    public void setConfig(
        final String storeName,
        final Options options,
        final Map<String, Object> configs
    ) {
        BlockBasedTableConfig tableConfig = (BlockBasedTableConfig) options.tableFormatConfig();
        tableConfig.setBlockCache(cache);
        tableConfig.setCacheIndexAndFilterBlocks(true);
        options.setWriteBufferManager(writeBufferManager);
        tableConfig.setCacheIndexAndFilterBlocksWithHighPriority(false);
        tableConfig.setPinTopLevelIndexAndFilter(true);
        tableConfig.setBlockSize(4 * 1024L);
        options.setMaxWriteBufferNumber(1);
        options.setWriteBufferSize(1024 * 1024L);
        options.setTableFormatConfig(tableConfig);
    }

    @Override
    public void close(final String storeName, final Options options) {
        cache.close();
        filter.close();
    }
}
Config:
props.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 0);
props.put(StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG, BoundedMemoryRocksDBConfig.class);
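For completeness, those two entries live in the usual Properties map the application is started with; a minimal sketch of the surrounding setup (application id and bootstrap servers are placeholders, not from the original code):

// Hypothetical surrounding configuration; only the last two properties come from the post.
final Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "windowed-join-dedup");   // placeholder
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");     // placeholder
props.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 0);
props.put(StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG, BoundedMemoryRocksDBConfig.class);

final KafkaStreams streams = new KafkaStreams(builder.build(), props);
streams.start();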
Things I've tried so far:
- Using a bounded RocksDB config setter
- Using jemalloc instead of malloc
- Reducing the retention period to 5 seconds
- Reducing the number of partitions of the topics (this only slowed the rate of the memory leak)
- Using in-memory stores instead of persistent ones: memory was very stable, but the app then takes around 10 minutes to start on each deployment (see the sketch below)
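For reference, the in-memory variant from the last bullet amounts to swapping the store supplier; a minimal sketch with the same retention and window size:

// In-memory variant of the same window store: memory stayed stable with this,
// but restoring it on startup made each deployment take around 10 minutes.
var inMemoryStoreBuilder = Stores.windowStoreBuilder(
    Stores.inMemoryWindowStore(
        storeName,
        Duration.ofSeconds(6),   // retention period
        Duration.ofSeconds(5),   // window size
        false                    // retainDuplicates
    ),
    Serdes.String(),
    SerdeFactory.JsonSerde(valueDataClass)
);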