boost::ptree is taking too much memory during push_back and put_child

Question

datanode holds approximately 30MB of data. When I push_back it into childnode, memory usage grows by approximately 200MB, and when I then put this childnode into the parent node with put_child, it grows by another 200MB. I collected memory statistics for these operations using perf. Why do push_back and put_child take such a huge amount of memory?

I used perf to look at the memory stats during push_back and put_child.

I don't know the exact details of the class or why it behaves so. But `boost::ptree` likely reserves a lot of nodes for future allocations. The strategy is necessary for hash maps. Operations like `reserve` might allow you to control the amount of allocated space better. — ALX23z, Aug 10 '23 at 20:33

score 1 · Answer 1 · answered Aug 16 '23 at 01:21

Yes, property tree is not very efficient. It's more versatile than efficient.

For example, each tree node can have both a value and child nodes (even if the respective backend doesn't, like JSON). It also allows for lookup by "key" but at the same time, multiple child nodes with the same name may exist. To complete the picture, children cannot be stored in an ordered map, because the order in which nodes are inserted may matter.

ptree uses a multi-index container with several indices to serve all these requirements. On the bright side, you get strong guarantees (like put having different semantics than add) and very flexible invalidation rules (like any node-based container: all references and iterators stay valid through all operations, including any rehashes/reallocation, unless they refer to elements removed).

Add to this the fact that ptree doesn't allow one to customize the allocator (even though boost::multi_index_container supports allocators), and a lack of move-awareness, and you see why ptree will never win any efficiency prizes.

What You Need

Guessing from the tags in your question you seem to need JSON support.

Firstly, let's note once and for all that Boost Property Tree is NOT a JSON library¹.

Secondly, you're in luck. Boost 1.75 introduced Boost JSON! That not only IS a JSON library, but it even supports smart pool allocation, move semantics and in general allows highly efficient access, including streaming parsing/serialization of documents way bigger than would ever fit in memory.

See here for the documentation and examples: https://www.boost.org/doc/libs/1_83_0/libs/json/doc/html/json/quick_look.html

Also, if you can use C++14, note the examples in Boost Describe that use Boost JSON in ways that will knock the socks off any abuse of Property Tree that I've seen. In fact, recent Boost JSON supports this directly using value_from/value_to with very little additional work on your part. E.g.:

Live On Compiler Explorer

#include <boost/describe.hpp>
#include <boost/json/src.hpp>
#include <cmath> // M_PI (a POSIX extension, not standard C++)
#include <iostream>
#include <optional>
#include <string>

namespace json = boost::json;

namespace MyLib {
    BOOST_DEFINE_ENUM_CLASS(Enum, foo, bar, qux)

    struct Base {
        Enum enumValue;
        double number;
    };
    BOOST_DESCRIBE_STRUCT(Base, (), (enumValue, number))

    struct Derived : Base {
        std::optional<std::string> maybeMessage;
    };

    BOOST_DESCRIBE_STRUCT(Derived, (Base), (maybeMessage))
} // namespace MyLib

template <> struct json::is_described_class<MyLib::Derived> : std::true_type {};

int main() {
    using MyLib::Enum;
    MyLib::Derived objs[] = {
        {{Enum::bar, 42e-1}, "Hello world"},
        {{Enum::qux, M_PI}, {}},
    };

    for (MyLib::Derived obj : objs) {
        std::cout << json::value_from(obj) << std::endl;
    }
}

Prints

{"enumValue":"bar","number":4.2E0,"maybeMessage":"Hello world"}
{"enumValue":"qux","number":3.141592653589793E0,"maybeMessage":null}

¹ (sorry for yelling, but decades of telling people this does that to you)

Hi Sehe, thanks for the quick response. Is there any page where the internal data structure used by boost::ptree is documented? It sounds very strange to me that a 30MB string, when put into a Ptree node, ends up consuming approximately 60MB. I'm having trouble correlating a simple put operation consuming double the memory. — Nachiketa Gupta, Aug 17 '23 at 11:41
I don't expect the implementation details to be documented. You have the full source code of course. You can always file a request for clarification as an issue. However, I still think you need a JSON library in reality. — sehe, Aug 17 '23 at 11:42
