
I built a stream that does a windowed join. When deployed to production, everything was fine in terms of memory and performance.

However, I also needed deduplication, so I implemented a Transformer that does it with the help of a WindowStore.

After deploying it, we get the results we expect, but memory keeps growing until the pod crashes with an OOM.

After doing some research I applied a number of tricks to reduce memory usage, but none of them helped; the code is below.

It's clear to me that the WindowStore is causing this issue, but how do I limit it?

The Store:

var storeBuilder = Stores.windowStoreBuilder(
    Stores.persistentWindowStore(
        storeName,
        Duration.ofSeconds(6),   // retention period
        Duration.ofSeconds(5),   // window size
        false                    // retainDuplicates
    ),
    Serdes.String(),
    SerdeFactory.JsonSerde(valueDataClass)
);

The stream:

var leftStream  = builder.stream("leftTopic").filter(...);
var rightStream = builder.stream("rightTopic").filter(...);

leftStream
    .join(
        rightStream,
        joiner,
        JoinWindows
            .of(Duration.ofSeconds(5))
            .grace(Duration.ofSeconds(1))
            .until(Duration.ofSeconds(6).toMillis())
    )
    .transformValues(
        () ->
            new DeduplicationTransformer<>(
                storeName,
                Duration.ofSeconds(6).toMillis(),
                (key, value) -> value.id
            ),
        storeName
    )
    .filter((k, v) -> v != null)
    .to("targetTopic");

Deduplication Transformer:

public class DeduplicationTransformer<K, V extends StreamModel>
  implements ValueTransformerWithKey<K, V, V> {

  private ProcessorContext context;
  private String storeName;

  private WindowStore<K, V> eventIdStore;

  private final long leftDurationMs;

  private final KeyValueMapper<K, V, K> idExtractor;

  public DeduplicationTransformer(
    String storeName,
    long maintainDurationPerEventInMs,
    final KeyValueMapper<K, V, K> idExtractor
  ) {
    if (maintainDurationPerEventInMs < 2) {
      throw new IllegalArgumentException(
        "maintain duration per event must be > 1"
      );
    }
    leftDurationMs = maintainDurationPerEventInMs;
    this.idExtractor = idExtractor;
    this.storeName = storeName;
  }

  @Override
  public void init(final ProcessorContext context) {
    this.context = context;
    eventIdStore = (WindowStore<K, V>) context.getStateStore(storeName);

    // Periodically purge entries that have aged out of the dedup window by
    // writing tombstones for them, then commit.
    Duration interval = Duration.ofMillis(leftDurationMs);
    this.context.schedule(
        interval,
        PunctuationType.WALL_CLOCK_TIME,
        timestamp -> {
          Instant from = Instant.ofEpochMilli(
            System.currentTimeMillis() - leftDurationMs * 2
          );
          Instant to = Instant.ofEpochMilli(
            System.currentTimeMillis() - leftDurationMs
          );
          try (
            KeyValueIterator<Windowed<K>, V> iterator = eventIdStore.fetchAll(from, to)
          ) {
            while (iterator.hasNext()) {
              KeyValue<Windowed<K>, V> entry = iterator.next();
              eventIdStore.put(entry.key.key(), null, entry.key.window().start());
            }
          }
          context.commit();
        }
      );
  }

  @Override
  public V transform(final K key, final V value) {
    try {
      final K eventId = idExtractor.apply(key, value);
      if (eventId == null) {
        // No id to deduplicate on: forward the record unchanged.
        return value;
      } else {
        final V output;
        if (isDuplicate(eventId)) {
          // Seen within the dedup window: emit null, filtered out downstream.
          output = null;
        } else {
          output = value;
          rememberNewEvent(eventId, value, context.timestamp());
        }
        return output;
      }
    } catch (Exception e) {
      // Note: any failure here silently drops the record.
      return null;
    }
  }

  private boolean isDuplicate(final K eventId) {
    final long eventTime = context.timestamp();
    // Any entry for this id within the last leftDurationMs means it is a duplicate.
    try (
      WindowStoreIterator<V> timeIterator = eventIdStore.fetch(
        eventId,
        eventTime - leftDurationMs,
        eventTime
      )
    ) {
      return timeIterator.hasNext();
    }
  }

  private void rememberNewEvent(final K eventId, V v, final long timestamp) {
    eventIdStore.put(eventId, v, timestamp);
  }

  @Override
  public void close() {}
}

RocksDB config:


public class BoundedMemoryRocksDBConfig implements RocksDBConfigSetter {

  private Cache cache = new LRUCache(5 * 1024 * 1024L);
  // Note: this Bloom filter is created and closed but never set on tableConfig below.
  private Filter filter = new BloomFilter();

  private WriteBufferManager writeBufferManager = new WriteBufferManager(
    4 * 1024 * 1024L,
    cache
  );

  @Override
  public void setConfig(
    final String storeName,
    final Options options,
    final Map<String, Object> configs
  ) {
    BlockBasedTableConfig tableConfig = (BlockBasedTableConfig) options.tableFormatConfig();

    tableConfig.setBlockCache(cache);
    tableConfig.setCacheIndexAndFilterBlocks(true);
    options.setWriteBufferManager(writeBufferManager);

    tableConfig.setCacheIndexAndFilterBlocksWithHighPriority(false);
    tableConfig.setPinTopLevelIndexAndFilter(true);
    tableConfig.setBlockSize(4 * 1024L);

    options.setMaxWriteBufferNumber(1);
    options.setWriteBufferSize(1024 * 1024L);

    options.setTableFormatConfig(tableConfig);
  }

  @Override
  public void close(final String storeName, final Options options) {
    cache.close();
    filter.close();
  }
}

Config:

props.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 0);
props.put(
  StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG,
  BoundedMemoryRocksDBConfig.class
);

Things I've tried so far:

  • Using a bounded RocksDB config setter
  • Using jemalloc instead of malloc
  • Reducing the retention period to 5 seconds
  • Reducing the number of partitions of the topics (this only slowed the rate of the memory leak)
  • Used in-memory stores instead of persistent ones; memory was very stable, but app startup then takes around 10 minutes on each deployment (a sketch of this variant follows the list).
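For reference, the in-memory variant from the last bullet would presumably look something like the sketch below; it just swaps persistentWindowStore for inMemoryWindowStore with the same parameters (the slow startup is likely the store being rebuilt from its changelog topic on every restart):

    var inMemoryStoreBuilder = Stores.windowStoreBuilder(
        Stores.inMemoryWindowStore(
            storeName,
            Duration.ofSeconds(6),   // retention period
            Duration.ofSeconds(5),   // window size
            false                    // retainDuplicates
        ),
        Serdes.String(),
        SerdeFactory.JsonSerde(valueDataClass)
    );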
  • Can you please elaborate on the following: resources on your machine(s) (memory, CPU), how many app instances? I assume you use the default NUM_STREAM_THREADS, which is 1, right? Also, if possible, can you share your app repo with us? It would be easier to debug the problem. – Dor Weid Jan 28 '21 at 16:48
  • Hi Dor, sorry, I can't share the repo. As for the other details: --- NUM_STREAM_THREADS: yes, using the default --- Memory: it's a k8s cluster, I assign each pod up to 2GB of memory; it takes around 5 days to reach 2GB and then crash --- CPU: each pod barely uses CPU, utilization is around 40m and I have a limit of 200m per pod --- PODS: I have 6 instances, and the source topics have 200 partitions each – Sari Alalem Jan 31 '21 at 08:50
  • The settings in BoundedMemoryRocksDBConfig are applied per RocksDB store. If you make those fields static, it should work fine: ``` private static Cache cache = new LRUCache(5 * 1024 * 1024L); private static Filter filter = new BloomFilter(); private static WriteBufferManager writeBufferManager = new WriteBufferManager( 4 * 1024 * 1024L, cache ); ``` – SunilS Feb 03 '21 at 09:07
  • Have a look at this answer about retention; it may clarify something: https://stackoverflow.com/questions/63455195/how-to-test-a-windowstore-retention-period – Marco Massetti Feb 03 '21 at 22:20
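A minimal sketch of the change SunilS suggests above, i.e. one shared cache, filter, and write-buffer manager for all RocksDB instances instead of one set per store:

    public class BoundedMemoryRocksDBConfig implements RocksDBConfigSetter {

      // Shared across every store on this instance, instead of one copy per store.
      private static final Cache cache = new LRUCache(5 * 1024 * 1024L);
      private static final Filter filter = new BloomFilter();
      private static final WriteBufferManager writeBufferManager =
          new WriteBufferManager(4 * 1024 * 1024L, cache);

      // setConfig(...) stays as in the original class; close() should no longer
      // close the now-shared cache and filter, since other stores still use them.
    }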
