I am using Boehm-GC for garbage collection in my C program. I am trying to parallelize a for loop that works on an array allocated through GC_malloc. Once the loop finishes, the array is no longer used anywhere in the program, and I call GC_gcollect_and_unmap, which frees it. However, when I parallelize the loop with OpenMP, the array is never freed after the loop finishes. It is the exact same program; the only change is the #pragma directives added around the loop.

I have compared the assembly with and without the OpenMP parallelization side by side. The array pointer is handled in a similar way in both versions, and I don't see extra pointers being kept anywhere. The only difference is that, without OpenMP, the for loop is a simple loop within the main function, whereas with OpenMP the compiler creates a new function ##name##._omp_fn and calls it.

Is there something I need to do so that Boehm-GC collects the array? It is hard for me to post an MWE because, if the program is small enough, Boehm-GC doesn't kick in at all.
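To make the ._omp_fn remark concrete, here is a rough, hand-written sketch of what outlining a parallel loop can look like. It is not the compiler's actual output: the names omp_data, outlined_region, run_region and fill_one are hypothetical, fill_one is only a stand-in for torch_randn, and the "threads" are simulated by a serial loop. The point is just to show that the shared variables are copied into a struct on the caller's stack and passed by address to the outlined function, so copies of the pointer can exist in more places than in the plain loop. Whether that is related to the behavior I am seeing is something I have not established.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for torch_randn: writes a fixed value into one slot. */
static void fill_one(float *slot)
{
    *slot = 1.0f;
}

/* Shared variables of the parallel region, packed into one struct
   (assumed layout, for illustration only). */
struct omp_data {
    float *arr;
    int l;
};

/* Plays the role of the outlined function (something like name._omp_fn.0):
   each thread receives the same struct pointer and processes its own chunk. */
static void outlined_region(struct omp_data *data, int thread_id, int num_threads)
{
    int chunk = (data->l + num_threads - 1) / num_threads; /* ceiling division */
    int begin = thread_id * chunk;
    int end = (begin + chunk < data->l) ? begin + chunk : data->l;
    for (int i = begin; i < end; i++) {
        fill_one(data->arr + i);
    }
}

/* Plays the role of the caller: builds the struct on its own stack and
   hands its address to every "thread". Run serially here to keep the
   sketch self-contained; a real runtime would use worker threads. */
static void run_region(float *arr, int l, int num_threads)
{
    assert(num_threads > 0);
    struct omp_data data = { arr, l };
    for (int t = 0; t < num_threads; t++) {
        outlined_region(&data, t, num_threads);
    }
}
```

Calling run_region(buf, 10, 4) on a 10-element float buffer fills every element exactly once, the same work the plain loop does, but the pointer now also lives in the omp_data struct for the duration of the call.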
Here is a code excerpt without parallelization.
#include <gc.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct thing {
    float *arr;
    int size;
} thing;

int l = 10;

/* Finalizer: releases the malloc'ed buffer when the GC reclaims the object. */
static void finalizer(void *obj, void *client_data)
{
    (void)client_data;
    printf("freeing %p\n", obj);
    free(((thing *)obj)->arr);
}

/* torch_randn is defined elsewhere in the program. */
static thing *get_randn(void)
{
    thing *object = (thing *)GC_malloc(sizeof(thing));
    object->arr = malloc(sizeof(float) * l);
    GC_register_finalizer(object, finalizer, NULL, NULL, NULL);
    float *arr = object->arr;
    int t_id;
    for (t_id = 0; t_id < l; t_id++) {
        torch_randn(arr + t_id);
    }
    return object;
}
With the code above, the object returned by the function is garbage collected. The following is the same code with parallelization.
#include <gc.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct thing {
    float *arr;
    int size;
} thing;

int l = 10;

/* Finalizer: releases the malloc'ed buffer when the GC reclaims the object. */
static void finalizer(void *obj, void *client_data)
{
    (void)client_data;
    printf("freeing %p\n", obj);
    free(((thing *)obj)->arr);
}

/* torch_randn is defined elsewhere in the program. */
static thing *get_randn(void)
{
    thing *object = (thing *)GC_malloc(sizeof(thing));
    object->arr = malloc(sizeof(float) * l);
    GC_register_finalizer(object, finalizer, NULL, NULL, NULL);
    float *arr = object->arr;
    int t_id;
    /* The only change: the loop now runs inside an OpenMP parallel region. */
    #pragma omp parallel num_threads(10)
    {
        #pragma omp for
        for (t_id = 0; t_id < l; t_id++) {
            torch_randn(arr + t_id);
        }
    }
    return object;
}
With this version, object does not get garbage collected. It is difficult to reproduce the problem in a standalone MWE because the garbage collector doesn't kick in for small programs, but I observe this behavior when I run my full program.