UPDATE: I've found a partial workaround. See bottom of this post.

After a number of hours of debugging a program, I've found that there is some kind of conflict between the netCDF and HDF5 libraries (the program reads/writes files of both formats).

I've boiled down the code to a tiny program that shows the issue. This program segfaults:

#include <iostream>
#include <string>

#include "H5Cpp.h"

#include <netcdf>

using namespace std;

void stupidfunction() // Note that this is never called.
{
    H5::Group grp1; // The mere potential existence of this makes netcdf segfault!
}

int main(int argn, char ** args)
{
    std::string outputFilename = "/tmp/test.nc";

    try
    {   
        std::cout << "Now opening " << outputFilename << std::endl;
        netCDF::NcFile sfc;
        sfc.open(outputFilename, netCDF::NcFile::replace);

        std::cout << "closing file" << std::endl;
        sfc.close();

        return 0; // success: main's return value is the process exit status
    }
    catch(netCDF::exceptions::NcException& e)
    {
        std::cout << "EX: " << e.what() << std::endl;
        return 1; // failure
    }
    
    return 0;
}

(Compile command: h5c++ test.cpp -std=gnu++11 -O0 -g3 -lnetcdf_c++4 -lnetcdf -o test)

My installed (relevant) packages:

libnetcdf-c++4-1                       4.3.1-2build1                         amd64        C++ interface for scientific data access to large binary data
libnetcdf-c++4-dev                     4.3.1-2build1                         amd64        creation, access, and sharing of scientific data in C++
libnetcdf-dev                          1:4.7.3-1                             amd64        creation, access, and sharing of scientific data
libnetcdf15:amd64                      1:4.7.3-1                             amd64        Interface for scientific data access to large binary data
netcdf-bin                             1:4.7.3-1                             amd64        Programs for reading and writing NetCDF files
netcdf-doc                             1:4.7.3-1                             all          Documentation for NetCDF

hdf5-helpers                           1.10.4+repack-11ubuntu1               amd64        Hierarchical Data Format 5 (HDF5) - Helper tools
hdf5-tools                             1.10.4+repack-11ubuntu1               amd64        Hierarchical Data Format 5 (HDF5) - Runtime tools
libhdf4-0                              4.2.14-1ubuntu1                       amd64        Hierarchical Data Format library (embedded NetCDF)
libhdf5-103:amd64                      1.10.4+repack-11ubuntu1               amd64        Hierarchical Data Format 5 (HDF5) - runtime files - serial version
libhdf5-cpp-103:amd64                  1.10.4+repack-11ubuntu1               amd64        Hierarchical Data Format 5 (HDF5) - C++ libraries
libhdf5-dev                            1.10.4+repack-11ubuntu1               amd64        Hierarchical Data Format 5 (HDF5) - development files - serial versio

What is going on here that makes it segfault? Is there anything I can do to avoid or fix this problem? Any help is appreciated!

When run using gdb:

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7a47a01 in __vfprintf_internal (s=s@entry=0x7fffff7ff480, 
    format=format@entry=0x7ffff77e13a8 "can't locate ID", 
    ap=ap@entry=0x7fffff7ff5e0, mode_flags=mode_flags@entry=2)
    at vfprintf-internal.c:1289
1289    vfprintf-internal.c: No such file or directory.

gdb backtrace:

(gdb) bt
#0  0x00007ffff7a47a01 in __vfprintf_internal (s=s@entry=0x7fffff7ff480, format=format@entry=0x7ffff77e13a8 "can't locate ID", 
    ap=ap@entry=0x7fffff7ff5e0, mode_flags=mode_flags@entry=2) at vfprintf-internal.c:1289
#1  0x00007ffff7a5cd4a in __vasprintf_internal (result_ptr=0x7fffff7ff5d8, format=0x7ffff77e13a8 "can't locate ID", args=0x7fffff7ff5e0, mode_flags=2)
    at vasprintf.c:57
#2  0x00007ffff75c7e56 in H5E_printf_stack () from /usr/lib/x86_64-linux-gnu/libhdf5_serial.so.103
#3  0x00007ffff76553b9 in H5I_inc_ref () from /usr/lib/x86_64-linux-gnu/libhdf5_serial.so.103
...
many many lines repeating H5E_printf_stack, H5E__push_stack and H5I_inc_ref
...
#56139 0x00007ffff76553b9 in H5I_inc_ref () from /usr/lib/x86_64-linux-gnu/libhdf5_serial.so.103
#56140 0x00007ffff75c7c2f in H5E__push_stack () from /usr/lib/x86_64-linux-gnu/libhdf5_serial.so.103
#56141 0x00007ffff75c7e7e in H5E_printf_stack () from /usr/lib/x86_64-linux-gnu/libhdf5_serial.so.103
#56142 0x00007ffff761fc85 in H5G_loc () from /usr/lib/x86_64-linux-gnu/libhdf5_serial.so.103
#56143 0x00007ffff7546903 in H5Acreate1 () from /usr/lib/x86_64-linux-gnu/libhdf5_serial.so.103
#56144 0x00007ffff790b11b in NC4_write_provenance () from /usr/lib/x86_64-linux-gnu/libnetcdf.so.15
#56145 0x00007ffff790b5a8 in ?? () from /usr/lib/x86_64-linux-gnu/libnetcdf.so.15
#56146 0x00007ffff790b7b0 in nc4_close_hdf5_file () from /usr/lib/x86_64-linux-gnu/libnetcdf.so.15
#56147 0x00007ffff790b9ea in NC4_close () from /usr/lib/x86_64-linux-gnu/libnetcdf.so.15
#56148 0x00007ffff78ca579 in nc_close () from /usr/lib/x86_64-linux-gnu/libnetcdf.so.15
#56149 0x00007ffff7f82270 in netCDF::NcFile::close() () from /usr/lib/x86_64-linux-gnu/libnetcdf_c++4.so.1
#56150 0x00005555555a7959 in main (argn=1, args=0x7fffffffe5b8) at test.cpp:29
(gdb) 
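
Reading the backtrace: the same three frames repeat tens of thousands of times, which suggests that reporting the "can't locate ID" error itself needs an ID lookup that fails, so the error reporter re-enters itself until the stack is exhausted. Below is a toy model of that pattern. It is only a sketch: every name in it is hypothetical and none of it is HDF5's actual code, and the `limit` parameter exists only so the model terminates.

```cpp
#include <cassert>
#include <cstddef>

// Toy model only: hypothetical names, NOT HDF5 internals.
struct ToyErrorStack {
    bool id_registered = false;  // models an ID table that no longer knows the ID
    std::size_t depth = 0;       // current nesting of error reports
    std::size_t max_depth = 0;   // deepest nesting seen
};

void push_error(ToyErrorStack& s, std::size_t limit);

// Models "take a reference on an ID". If the ID is unknown, the failure is
// reported by pushing an error record.
bool inc_ref(ToyErrorStack& s, std::size_t limit) {
    if (s.id_registered) return true;
    push_error(s, limit);  // the "can't locate ID" report
    return false;
}

// Models "push an error record", which itself needs a reference on an ID.
void push_error(ToyErrorStack& s, std::size_t limit) {
    ++s.depth;
    if (s.depth > s.max_depth) s.max_depth = s.depth;
    // Without the artificial limit, this mutual recursion would never end.
    if (s.depth < limit) inc_ref(s, limit);
    --s.depth;
}

// Returns how deep the error reporting nested for one failed (or successful) lookup.
std::size_t nesting_reached(bool id_registered, std::size_t limit) {
    assert(limit > 0);
    ToyErrorStack s;
    s.id_registered = id_registered;
    inc_ref(s, limit);
    return s.max_depth;
}
```

With the ID registered, `nesting_reached(true, 1000)` never enters the error path. With it missing, the nesting runs all the way to the artificial limit, which is the shape of the 56,000-frame trace above.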

Partial workaround: If I specify the netCDF file format as classic or classic64, the error does not occur, i.e.:

sfc.open(outputFilename, netCDF::NcFile::replace, netCDF::NcFile::classic);

or

sfc.open(outputFilename, netCDF::NcFile::replace, netCDF::NcFile::classic64);
  • It seems you have to separate the usage of those libs into different compilation units, so that both "#include"s are not in the same unit. – Ripi2 Aug 27 '21 at 17:58
  • @Ripi2 : In the larger program, they are in different files, which are compiled (to object files) separately. But in the end, they are linked into one binary executable. Separating the software into two separate executables should get rid of the error, but is hardly an elegant solution... (and the software design had not envisioned splitting into several executables... so it would require big changes) – Knut Stanley Jacobsen Aug 27 '21 at 18:05
  • Small thoughts: a) I don't see an 'hdf5' lib param in the compiler command. b) Sometimes changing the order of the "#includes" avoids the issue (although it may create new ones). c) What is the err msg if you run it under a debugger? – Ripi2 Aug 27 '21 at 18:14
  • The weird thing is that both libs use their namespaces, should not conflict. Now I wonder why you "#include" one with `<>` and the other with `""`. And if you have the proper libs-search in your link command. – Ripi2 Aug 27 '21 at 18:24
  • @Ripi2 : a) Note that it is compiled using the command "h5c++" (not "g++"). This is a script that, among other things, adds the library options for hdf5 (see https://www.mankier.com/1/h5c++). b) I've tried changing the order of the includes. It had no effect. – Knut Stanley Jacobsen Aug 28 '21 at 17:51
  • @Ripi2 : "The weird thing is that both libs use their namespaces, should not conflict. Now I wonder why you "#include" one with <> and the other with "". And if you have the proper libs-search in your link command." Using "" or <> to include affects the order of search for includes. If there is only one instance of the file to be included, it has no effect on the end result. For completeness, I tested using <> for both. It had no effect. It is definitely weird that they manage to have a conflict, but it might be related to underlying C functions/variables/macros? – Knut Stanley Jacobsen Aug 28 '21 at 17:55
  • I've found a partial workaround. It doesn't fix the problem, but is good enough for my situation at the moment. Will update OP with the workaround. – Knut Stanley Jacobsen Aug 28 '21 at 18:01
  • Nice that you found it. However, it seems to be an issue coming from one (or both) of the libs. I'd email their creators. – Ripi2 Aug 28 '21 at 18:08
  • Seeing something rather similar in a C program of mine (reads netCDF, writes HDF5). Any update on this? – jjg Nov 21 '22 at 21:49

1 Answer

I have a C program which was showing similar behaviour. I found that adding

#include <stdio.h>    /* fprintf */
#include <H5public.h> /* H5dont_atexit */

  : 

  if (H5dont_atexit() < 0)
    {
      fprintf(stderr, "failed HDF5 don't-atexit\n");
      return 1;
    }

at the start of main() fixes the issue. The trade-off is that files you H5Fopen() are no longer H5Fclose()-ed automatically at exit, so you have to close them yourself, but it is possibly a lower-impact workaround.
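
To make the trade-off concrete, here is a toy sketch of a library that installs an exit-time cleanup on first use unless the caller opted out beforehand. Everything in the `toylib` namespace is hypothetical and is not HDF5 code. It only mirrors the documented requirement that H5dont_atexit() be called before any other HDF5 function, and the return values are an assumption made for the illustration.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical toy library, NOT HDF5: it mimics a library that registers an
// exit-time cleanup on first use unless the caller opted out first.
namespace toylib {
    bool initialized = false;
    bool want_atexit = true;
    bool cleanup_installed = false;
    int open_files = 0;

    // Exit-time cleanup: would close every file that is still open.
    void cleanup() { open_files = 0; }

    // Opt out of the exit-time cleanup. Only works before first use of the
    // library; returns -1 if it is called too late (assumed convention).
    int dont_atexit() {
        if (initialized) return -1;
        want_atexit = false;
        return 0;
    }

    // Lazy initialization, run by the first real library call.
    void init() {
        if (initialized) return;
        initialized = true;
        if (want_atexit) {
            std::atexit(cleanup);
            cleanup_installed = true;
        }
    }

    void open_file() { init(); ++open_files; }
    void close_file() { assert(open_files > 0); --open_files; }
}
```

In this model, calling `toylib::dont_atexit()` first means `toylib::open_file()` never registers `cleanup`, so every file opened afterwards has to be paired with an explicit `toylib::close_file()`. Calling `dont_atexit()` after the first `open_file()` is too late and returns -1.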

jjg