
When using the Boost regex class with the optional ICU support enabled (see the Boost documentation for details), I seem to get a memory leak, or rather some sort of memory caching, which I cannot reset or clean up.

Has anyone else seen this and maybe knows of a way of clearing the cache so that the boost unit test framework will not report a memory leak?

The details for my problem are :-

ICU version 4.6.0
(Built using supplied vs2010 solution in debug and release configuration)
Boost version 1.45
(built with command "bjam variant=debug,release threading=multi link=shared stage" since standard distribution does not include icu support in regex)
OS Windows 7
Compiler MSVC 10 (Visual Studio 2010 Premium)

I also tried this with Boost 1.42 and ICU 4.2.1, which I happened to have built on my system, with the same results, so I don't think it's a problem that would be solved by moving to Boost 1.47 and ICU 4.8.1, which are the latest versions.

Compiling the following code (Test.cpp) :-

#define BOOST_TEST_MAIN    //Ask boost unit test framework to create a main for us
#define BOOST_ALL_DYN_LINK //Ask boost to link to dynamic library rather than purely header support where appropriate
#include <boost/test/auto_unit_test.hpp>

#include <boost/regex.hpp>
#include <boost/regex/icu.hpp> //We use icu extensions to regex to support unicode searches on utf-8
#include <unicode/uclean.h>    //We want to be able to clean up ICU cached objects

BOOST_AUTO_TEST_CASE( standard_regex ) 
{
    boost::regex re( "\\d{3}");
}

BOOST_AUTO_TEST_CASE( u32_regex ) 
{
    boost::u32regex re( boost::make_u32regex("\\d{3}"));
    u_cleanup(); //Ask the ICU library to clean up any cached memory
}

Which can be compiled from a command line by:-

C:\>cl test.cpp /I[BOOST HEADERS PATH] /I[ICU HEADERS] /EHsc /MDd -link /LIBPATH:[BOOST LIB PATH] [ICU LIB PATH]icuuc.lib

With the appropriate paths to headers / libs for your machine

Copy the appropriate Boost DLLs to the directory containing test.exe if they are not already on the PATH (boost_regex-vc100-mt-gd-1_45.dll and boost_unit_test_framework-vc100-mt-gd-1_45.dll)

When test.exe from above steps is run I get :-

Running 2 test cases...

*** No errors detected
Detected memory leaks!
Dumping objects ->
{789} normal block at 0x00410E88, 28 bytes long.
 Data: <    0N U        > 00 00 00 00 30 4E CD 55 00 00 00 00 01 00 00 00
{788} normal block at 0x00416350, 14 bytes long.
 Data: <icudt46l-coll > 69 63 75 64 74 34 36 6C 2D 63 6F 6C 6C 00
{787} normal block at 0x00415A58, 5 bytes long.
 Data: <root > 72 6F 6F 74 00
...lots of other blocks removed for clarity ...

I'm guessing that ICU is actually the culprit here, since its name appears at the start of the 2nd block.

Just doing the 1st test (i.e. just creating a standard regex, not a u32_regex) has no memory leaks detected.

Adding multiple u32_regex's to the test does not result in more memory being leaked.

I attempted to clean up the ICU cache by using the u_cleanup() call as per the ICU documentation (see the ICU Initialization and Termination section).

However I am not very familiar with the icu library (actually am only using it because we wanted unicode aware regex support) and can't see how to get the u_cleanup() call to actually clean up the data when ICU is being loaded by the boost regex dll.

Just to reiterate the problem appears to be :-

boost regex in a dll compiled with optional icu support (I'm pretty sure this uses a static link to icu but may be wrong here)

If I link to icuuc.lib in test program so that I can call u_cleanup() this doesn't appear to affect the memory held by the instance of ICU loaded via the boost regex library (well it would be rather odd if it did)

I can't find any calls in regex library which allow me to ask it to cleanup the ICU data which is really where we want to make the call.

Alex Perry

2 Answers

u_cleanup is what cleans up the data; however, it can't clean up the data if any items are still open.

Can you try not calling any Boost function, but just calling u_cleanup(), and see if there are any leaks? And then try just calling u_init() and then u_cleanup()?

I'm not familiar enough with Boost to know whether the above code will clean up the regex, or whether Boost has any internal caching. The leaked objects don't look like usual ICU data: if ICU's data was still open you would see quite a bit of data, not 14+5 bytes.

Steven R. Loomis
  • Thanks - that was only the start of the leaked data - I've edited the question which hopefully makes that clearer. – Alex Perry Jul 29 '11 at 02:12
  • Further investigation has shown that icu is actually dynamic linked not static linked. Boost regex appears to degrade gracefully if icu dlls not found which is why deleting icu dlls doesn't generate dll not found message (which was my simple test to see if dynamic or static linked). Hence u_cleanup() should work (I think) – Alex Perry Aug 01 '11 at 07:49
  • If you built ICU with the preprocessor flag UCLN_NO_AUTO_CLEANUP defined to be 0 (you could add a #define at the top of uconfig.h) then ICU will clean itself up as the DLL is unloaded. u_cleanup is safe to call if ICU is already cleaned up or never loaded. – Steven R. Loomis Aug 02 '11 at 16:06

Just thought that I may as well answer the question here since I did solve this (with help from boost users).

The problem is the order of teardown: if the static objects in the Boost regex DLL are not destructed before the unit test framework is torn down, the DLL is still caching some data, and so the UTF (the Boost unit test framework) reports memory leaks. Simply calling u_cleanup() isn't sufficient.

The easiest way of ensuring the order is to link with the unit test framework as a static library. Its objects are then destructed after those in any DLLs, so the cached objects are not reported as a memory leak since they have already been destructed.
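To illustrate why the teardown order matters, here is a minimal, self-contained sketch. It does not use Boost or ICU at all: Cache, LeakChecker, live_blocks and run are hypothetical names invented for this example, and scoped local objects stand in for the static objects in the DLL and in the test framework. It only shows the general C++ rule that objects are destroyed in reverse order of construction, so a checker destroyed first still sees the cache as live.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins: none of these names come from Boost or ICU.
static int live_blocks = 0;   // allocations that have not been released yet

// Plays the role of the data cached behind the regex DLL.
struct Cache {
    Cache()  { ++live_blocks; }
    ~Cache() { --live_blocks; }
};

// Plays the role of the test framework's leak report at teardown.
struct LeakChecker {
    std::vector<int>& reports;
    explicit LeakChecker(std::vector<int>& r) : reports(r) {}
    // Records how many blocks are still live when the checker is destroyed.
    ~LeakChecker() { reports.push_back(live_blocks); }
};

// Returns the number of blocks the checker sees as "leaked".
int run(bool checker_destroyed_last) {
    std::vector<int> reports;
    if (checker_destroyed_last) {
        LeakChecker checker(reports);   // constructed first, so destroyed last
        Cache cache;                    // destroyed before the checker runs
    } else {
        Cache cache;                    // constructed first, so destroyed last
        LeakChecker checker(reports);   // destroyed while the cache is still alive
    }
    assert(!reports.empty());
    return reports.front();
}
```

Calling run(false) models the failing case, where the checker reports the still-cached block as a leak, and run(true) models the fixed case, where the cache has already been released by the time the checker looks.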

Alex Perry