
I am using a strand to serialize certain processing within some objects. However, when the object dies, the strand somehow refuses to go away. Like a soul in purgatory, it lives on in memory and causes memory usage to increase over days. I have managed to reproduce the problem in a small sample.

I am creating 5 families, each with one parent and a child. The parent object holds the child and a strand object to ensure its processing happens serially. Each family is issued 3 processing tasks, and they are executed in the right order irrespective of the thread they run on. I am taking memory heap snapshots in VC++ before and after the object creation and processing. The snapshot comparison shows that the strand alone manages to live on even after the Parent and Child objects have been destroyed.

How do I ensure the strand object is destroyed? Unlike the sample program, my application runs for years without a shutdown. I will be stuck with millions of zombie strand objects within a month.

#include <boost/thread.hpp>
#include <boost/shared_ptr.hpp>
#include <boost/asio/strand.hpp>
#include <boost/enable_shared_from_this.hpp>
#include <boost/noncopyable.hpp>
#include <boost/asio/io_service.hpp>
#include <boost/bind.hpp>
#include <boost/make_shared.hpp>
#include <boost/asio/yield.hpp>
#include <boost/log/attributes/current_thread_id.hpp>
#include <iostream>

boost::mutex mtx;

class Child : public boost::noncopyable, public boost::enable_shared_from_this<Child>
{
    int _id;
public:
    Child(int id) : _id(id) {}
    void process(int order)
    {
        boost::this_thread::sleep_for(boost::chrono::seconds(2));
        boost::lock_guard<boost::mutex> lock(mtx);
        std::cout << "Family " << _id << " processing order " << order << " in thread " << std::hex << boost::this_thread::get_id() << std::endl;
    }
};

class Parent : public boost::noncopyable, public boost::enable_shared_from_this<Parent>
{
    boost::shared_ptr<Child> _child;
    boost::asio::io_service::strand _strand;

public:
    Parent(boost::asio::io_service& ioS, int id) : _strand(ioS)
    {
        _child = boost::make_shared<Child>(id);
    }

    void process()
    {
        for (int order = 1; order <= 3; order++)
        {
            _strand.post(boost::bind(&Child::process, _child, order));
        }
    }
};

int main(int argc, char* argv[])
{
    boost::asio::io_service ioS;
    boost::thread_group threadPool;
    boost::asio::io_service::work work(ioS);

    int noOfCores = boost::thread::hardware_concurrency();
    for (int i = 0; i < noOfCores; i++)
    {
        threadPool.create_thread(boost::bind(&boost::asio::io_service::run, &ioS));
    }

    std::cout << "Take the first snapshot" << std::endl;
    boost::this_thread::sleep_for(boost::chrono::seconds(10));

    std::cout << "Creating families" << std::endl;
    for (int family = 1; family <= 5; family++)
    {
        auto obj = boost::make_shared<Parent>(ioS,family);
        obj->process();
    }
    std::cout << "Take the second snapshot after all orders are processed" << std::endl;

    boost::this_thread::sleep_for(boost::chrono::seconds(60));
    return 0;
}

The output looks like this:

Take the first snapshot
Creating families
Take the second snapshot after all orders are processed
Family 3 processing order 1 in thread 50c8
Family 1 processing order 1 in thread 5e38
Family 4 processing order 1 in thread a0c
Family 5 processing order 1 in thread 47e8
Family 2 processing order 1 in thread 5f94
Family 3 processing order 2 in thread 46ac
Family 2 processing order 2 in thread 47e8
Family 5 processing order 2 in thread a0c
Family 1 processing order 2 in thread 50c8
Family 4 processing order 2 in thread 5e38
Family 2 processing order 3 in thread 47e8
Family 4 processing order 3 in thread 5e38
Family 1 processing order 3 in thread 50c8
Family 5 processing order 3 in thread a0c
Family 3 processing order 3 in thread 46ac

I took the first heap snapshot before creating the families. I took the second snapshot a few seconds after all 15 lines were printed (5 families × 3 tasks). The heap comparison shows the following:

[Screenshot: memory heap comparison]

All the Parent and Child objects have gone away, but all 5 strand objects live on...

Edit: For those unsure about shared_ptr semantics: the objects don't die at the end of the loop. Since a reference to the Child has been passed to the 3 process tasks, the Child lives a charmed life until all the tasks for a given family have completed. Once all references are released, the Child object dies.
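To illustrate the lifetime argument, here is a minimal, self-contained sketch (simplified from the repro above; the Child class here is reused in name only): boost::bind stores its own copy of the shared_ptr inside the handler, so the pointee outlives the scope that created it.

#include <boost/shared_ptr.hpp>
#include <boost/make_shared.hpp>
#include <boost/bind.hpp>
#include <boost/function.hpp>
#include <iostream>

struct Child
{
    void process(int order) { std::cout << "order " << order << std::endl; }
};

int main()
{
    boost::function<void()> task;
    {
        boost::shared_ptr<Child> child = boost::make_shared<Child>();
        task = boost::bind(&Child::process, child, 1); // the bind expression copies the shared_ptr
        std::cout << child.use_count() << std::endl;   // prints 2: local handle + copy held by task
    }
    // The local shared_ptr is gone, but the copy inside task keeps the Child alive.
    task(); // prints "order 1"
    return 0;
}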

Sharath
  • I'm fairly certain you have a race condition, possibly leading to undefined behavior. The Parent object is destroyed at the end of the loop where you create it. If the thread hasn't executed process by that point, the io_service is going to have a dangling reference to the strand. – Dave S Mar 25 '21 at 11:45
  • The parent object doesn't die at the end of the loop; it dies when all the tasks are completed, much later. I have provided a full sample that can be compiled and run. The io_service object will be running for years. How do I cut the cord between the io_service and the temporary strand object here? – Sharath Mar 25 '21 at 12:01

1 Answer


That's by design. A strand uses a pool of mutexes which live until the io_service dies.

But the pool size is currently limited to 193.

So you should not see more than 193 of those hanging around. See also issue #259.
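A quick way to see the cap (a standalone sketch in the spirit of the 1000-family test mentioned in the comments below, not taken from the question): construct far more strand objects than the pool size, then take a heap snapshot. The number of strand implementations should never exceed 193, and they are released only when the io_service itself is destroyed.

#include <boost/asio/io_service.hpp>
#include <boost/asio/strand.hpp>
#include <iostream>

int main()
{
    boost::asio::io_service ioS;

    for (int i = 0; i < 1000; ++i)
    {
        // Each strand object borrows one of the pooled implementations owned by
        // the io_service; destroying the strand object does not free the implementation.
        boost::asio::io_service::strand s(ioS);
    }

    std::cout << "Take a heap snapshot here: at most 193 strand implementations remain" << std::endl;
    return 0;
}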

rustyx
  • Note that by mixing `strand_executor_service` and `strand_service` it is currently possible to go over 193 strand implementations (2x193 by default, but unlimited when you override the default for the "legacy" strand service: https://stackoverflow.com/questions/66765121/asiostrandasioio-contextexecutor-type-vs-io-contextstrand/66772844#66772844) – sehe Mar 25 '21 at 12:52 (a sketch of that override follows after these comments)
  • For full-duplex sessions, do you recommend using a session-specific mutex over strands? AFAICT that's certainly not idiomatic, but I might consider it in the future. – sehe Mar 25 '21 at 12:54
  • You are right. I changed the number of families to 1000, and the zombie strands did not exceed 193. So is it safe to leave it like that? My application may create more than a million objects per month. Should I go back to mutex locking like before? I switched to strands to avoid mutex locking in a callback function that must return very fast. – Sharath Mar 25 '21 at 13:26
  • @sehe ah right, I must be generalizing based on personal experience :) yes, for full-duplex writes a strand is useful, I just don't like session-level stuff that lingers... so I have a `atomic writing` flag in combination with a queue. Anyway let me remove my personal opinion from there to avoid confusion... – rustyx Mar 25 '21 at 13:47
  • @Sharath a strand will internally lock a mutex, so it shouldn't matter much performance-wise. But I would recommend checking your actual application for leaks after a stress test. – rustyx Mar 25 '21 at 13:50
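For completeness, the 193 default that sehe refers to for the legacy strand service is a compile-time constant; assuming a reasonably recent Boost, it can be overridden by defining BOOST_ASIO_STRAND_IMPLEMENTATIONS before any Asio header is included (typically on the compiler command line). A minimal sketch:

// Must be defined before any Boost.Asio header is included.
#define BOOST_ASIO_STRAND_IMPLEMENTATIONS 1031 // default is 193

#include <boost/asio/io_service.hpp>
#include <boost/asio/strand.hpp>

int main()
{
    boost::asio::io_service ioS;
    boost::asio::io_service::strand s(ioS); // drawn from the enlarged pool
    return 0;
}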