
I'm trying to figure out why using the merge operator for a large number of keys with rocksdb is very slow.

My program uses a simple associative merge operator (based on upstream StringAppendOperator) that concatenates values using a delimiter for a given key. It takes a very long time to merge all the keys and for the program to finish running.

PS: I built rocksdb from source - latest master. I'm not sure if I'm missing something very obvious.

Here's a minimal reproducible example that issues about 5 million merge operations (2.5 million per key, across two keys); the number can be adjusted by changing the limit of the for loop. Thank you in advance!

#include <cassert>
#include <cstdint>
#include <filesystem>
#include <iostream>
#include <string>
#include <utility>

#include <rocksdb/db.h>
#include <rocksdb/merge_operator.h>

// Based on: https://github.com/facebook/rocksdb/blob/main/utilities/merge_operators/string_append/stringappend.h#L13
class StringAppendOperator : public rocksdb::AssociativeMergeOperator
{
public:
    // Constructor: specify delimiter
    explicit StringAppendOperator(std::string delim) : delim_(std::move(delim)) {}

    bool Merge(const rocksdb::Slice &key, const rocksdb::Slice *existing_value,
                       const rocksdb::Slice &value, std::string *new_value,
                       rocksdb::Logger *logger) const override;

    static const char *kClassName() { return "StringAppendOperator"; }
    static const char *kNickName() { return "stringappend"; }
    [[nodiscard]] const char *Name() const override { return kClassName(); }
    [[nodiscard]] const char *NickName() const override { return kNickName(); }

private:
    std::string delim_;  // The delimiter is inserted between elements
};

// Implementation for the merge operation (concatenates two strings)
bool StringAppendOperator::Merge(const rocksdb::Slice & /*key*/,
                                 const rocksdb::Slice *existing_value,
                                 const rocksdb::Slice &value, std::string *new_value,
                                 rocksdb::Logger * /*logger*/) const
{
    // Clear the *new_value for writing.
    assert(new_value);
    new_value->clear();

    if (!existing_value)
    {
        // No existing_value. Set *new_value = value
        new_value->assign(value.data(), value.size());
    }
    else
    {
        // Generic append (existing_value != null).
        // Reserve *new_value to correct size, and apply concatenation.
        new_value->reserve(existing_value->size() + delim_.size() + value.size());
        new_value->assign(existing_value->data(), existing_value->size());
        new_value->append(delim_);
        new_value->append(value.data(), value.size());
        // NOTE: value is not null-terminated, so use ToString() rather than
        // data(); printing on every merge also adds significant overhead.
        std::cout << "Merging " << value.ToString() << "\n";
    }
    return true;
}


int main()
{
    rocksdb::Options options;
    options.create_if_missing = true;
    options.merge_operator.reset(new StringAppendOperator(","));

    // Tried a variety of settings
    options.max_background_compactions = 16;
    options.max_background_flushes = 16;
    options.max_background_jobs = 16;
    options.max_subcompactions = 16;

    rocksdb::DB *db{};
    auto s = rocksdb::DB::Open(options, "/tmp/test", &db);
    assert(s.ok());

    rocksdb::WriteBatch wb;
    for (uint64_t i = 0; i < 2500000; i++)
    {
        wb.Merge("a:b", std::to_string(i));
        wb.Merge("c:d", std::to_string(i));
    }
    s = db->Write(rocksdb::WriteOptions(), &wb);
    assert(s.ok());
    s = db->Flush(rocksdb::FlushOptions());
    assert(s.ok());

    rocksdb::ReadOptions read_options;
    rocksdb::Iterator *it = db->NewIterator(read_options);

    for (it->SeekToFirst(); it->Valid(); it->Next())
    {
        std::cout << it->key().ToString() << " --> " << it->value().ToString() << "\n";
    }
    delete it;
    delete db;
    std::filesystem::remove_all("/tmp/test");
    return 0;
}

bharat nc

1 Answer


I shared your question in the Speedb Hive on Discord. The reply below is from Hilik, our co-founder and chief scientist.

'Merge operators are very useful for getting a quick write response time; alas, reads require fetching the original value and applying the merges in order. This operation can be very expensive, especially with strings that need to be copied and appended on each merge. The simplest way to resolve this is to eventually do a read-modify-write. Doing this at the application level is possible but may be problematic (if two threads can do this operation concurrently). We are thinking of ways to resolve this during compaction and are willing to work with you on a PR...'
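
For concreteness, here is a minimal sketch of the application-level read-modify-write approach Hilik describes, assuming a single writer (as he notes, concurrent threads would need a lock or some other coordination around the Get/Put pair). The AppendValue helper is a name made up for this example, not a RocksDB or Speedb API:

#include <string>
#include <rocksdb/db.h>

// Hypothetical helper: instead of issuing Merge() operands that pile up
// until read time, read the current value, append in memory, and write
// the result back. A sketch only, assuming no concurrent writers.
rocksdb::Status AppendValue(rocksdb::DB *db, const std::string &key,
                            const std::string &value, const std::string &delim)
{
    std::string current;
    auto s = db->Get(rocksdb::ReadOptions(), key, &current);
    if (s.ok())
    {
        current.append(delim);
        current.append(value);
    }
    else if (s.IsNotFound())
    {
        current = value;  // First write for this key.
    }
    else
    {
        return s;  // Propagate real errors.
    }
    return db->Put(rocksdb::WriteOptions(), key, current);
}

Note that this trades the merge cost at read time for a Get on every write, and for large values it still copies the whole string on each append, so it illustrates the trade-off rather than being a drop-in fix.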

Hope this helps. Join the Discord server to participate in this discussion and many other related topics. Here is the link to the discussion about your topic

Dan Carfas