I've recently discovered the power of the VTD-XML approach to XML parsing, mainly its speed. Just to be specific, I have built the C version 2.10 ( there are Java, C++ and C# implementations too ).

My objective is simple: I want to extract data from XML using VTD-XML for parsing, and use Perl to work with the data. The easy way may be to dump the data with a C program I wrote and send it through a pipe to the Perl program. Maybe not elegant, but it works.

Another, less easy way, consists of a Perl program that calls the C data collector subroutine using Inline::C.

So I started studying Inline::C and managed to do the basic things I need: passing data back to Perl from C subroutines using the Perl C API functions. Problems arise at compile time, when I write the C collector subroutine in the C source under Inline::C control.

There are symbol conflicts like this: bind() is defined both in socket.h ( Perl ) and in autoPilot.h ( VTD-XML ). Symbol conflicts can be avoided by building VTD-XML as a shared library with an explicit export map ( gcc -Wl,-version-script=foo.map )... Is this the right way to go? Are there better ways?

Marco De Lellis

1 Answer

I reached my goal by adding a layer of indirection: awful as it may seem, it works.

First of all, I made a shared library containing the VTD-XML API. When building this shared object, I had to avoid polluting the global scope, so I exported only the symbols I needed.

Then I built another shared library. This second shared library hides the VTD-XML API and is meant to be used from Perl via Inline::C. In this shared object I wrote a handful of functions that use the partially exposed API of libvtd.so.

The idea looks like this:

Perl -> Inline::C dynamic loader -> wrapper_API.so -> libvtd.so 

Major issues came from runtime loading of shared libraries and from symbol collision/resolution.

Here is how I built libvtd.so so that wrapper_API.so can use it easily.

Unfortunately, VTD-XML doesn't build a libvtd.so shared object, so I had to build it myself by linking together several .o object files with gcc:

gcc -shared -fPIC -Wl,-soname,libvtd.so.2.10 -Wl,--version-script=vtd-xml.map \
-o libvtd.so.2.10 libvtd.o arrayList.o fastIntBuffer.o fastLongBuffer.o \
contextBuffer.o vtdNav.o vtdGen.o autoPilot.o XMLChar.o XMLModifier.o intHash.o \
 bookMark.o indexHandler.o transcoder.o elementFragmentNs.o

Symbol visibility was tuned with the linker option -Wl,--version-script=vtd-xml.map, where the map file is:

{
    global:
        the_exception_context;
        toString;
        getText;
        getCurrentIndex;
        toNormalizedString;
        toElement;
        toElement2;
        createVTDGen;
        setDoc;    
        parse;
        getNav;
        freeVTDGen;
        freeVTDNav;
        getTokenCount;
    local:
        *;  
};

Global ( "exported" ) symbols are under the global: section, while the catchall * under local says all other symbols are only known locally.

All object modules come from the VTD-XML distribution, with the exception of libvtd.o: this custom object was needed to address issues with the exception-handling library cexcept.h. libvtd.c is only two lines of code:

#include "customTypes.h"
struct exception_context the_exception_context[ 1 ];

In the compilation phase I had to adjust CFLAGS to produce position-independent code ( gcc -fPIC option ), which is required for building shared objects.

The readelf tool was useful for checking symbol visibility:

readelf --syms libvtd.so.2.10

Symbol table '.dynsym' contains 35 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
   ...
   280: 000000000000d010   117 FUNC    LOCAL  DEFAULT   12 writeIndex
   281: 000000000003c5d0   154 FUNC    LOCAL  DEFAULT   12 setCursorPosition
   282: 000000000003c1f0    56 FUNC    LOCAL  DEFAULT   12 resetIntHash
   ...
   331: 0000000000004f50  3545 FUNC    GLOBAL DEFAULT   12 toElement
   332: 00000000000071e0   224 FUNC    GLOBAL DEFAULT   12 getText
   333: 000000000000d420   114 FUNC    GLOBAL DEFAULT   12 freeVTDGen
   ...
   339: 000000000000b600   731 FUNC    GLOBAL DEFAULT   12 toElement2
   340: 000000000000e650   120 FUNC    GLOBAL DEFAULT   12 getNav
   341: 0000000000025750 70567 FUNC    GLOBAL DEFAULT   12 parse
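
The same check can also be made at run time, which is closer to what the dynamic loader does when Perl loads the wrapper. The following is only a minimal sketch, not part of my build: symbol_is_exported is a hypothetical helper name, and it relies only on the standard dlopen/dlsym/dlerror calls ( on older glibc versions you need to link with -ldl ).

```c
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper, not part of VTD-XML.
 * Returns  1 if `symbol` can be resolved through `library`
 *            (the lookup also covers the library's dependencies),
 *          0 if the library loads but the symbol is not visible,
 *         -1 if the library itself cannot be loaded.
 * Passing NULL as `library` searches the running program and the
 * shared objects it was linked against. */
int symbol_is_exported(const char *library, const char *symbol)
{
    void *handle = dlopen(library, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL)
        return -1;

    dlerror();                      /* clear any stale error state */
    (void)dlsym(handle, symbol);    /* the address itself is not needed */
    int found = (dlerror() == NULL);

    dlclose(handle);
    return found;
}
```

With the map file above, I would expect symbol_is_exported("./libvtd.so.2.10", "getText") to return 1 and the same call with "writeIndex" to return 0, since writeIndex falls under the local: catchall.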

wrapper_API.so consists of several functions that use the VTD-XML API and its custom types internally, but accept and return only standard C types and/or structs. The wrapper came straight from a former standalone C program.
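
To give an idea of the shape of such a wrapper, here is a minimal sketch of the opaque-handle pattern. It is not my actual wrapper: every name in it ( xml_doc, xml_doc_open and so on ) is hypothetical, and the struct body only holds a copy of the buffer as a stand-in for the VTD-XML objects, so that the listing compiles without the VTD-XML headers.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* ---- public side: what the Inline::C code would see ----
 * Only standard C types and an opaque handle; no VTD-XML types,
 * so no VTD-XML header has to be included next to the Perl headers. */
typedef struct xml_doc xml_doc;          /* layout hidden from callers */

xml_doc    *xml_doc_open(const char *buf, size_t len);
const char *xml_doc_text(const xml_doc *doc);
size_t      xml_doc_length(const xml_doc *doc);
void        xml_doc_close(xml_doc *doc);

/* ---- private side: the only place that would include VTD-XML ----
 * Assumption: a real wrapper would keep the parser and navigator
 * objects here; this sketch keeps a plain copy of the input instead. */
struct xml_doc {
    char  *copy;
    size_t len;
};

xml_doc *xml_doc_open(const char *buf, size_t len)
{
    if (buf == NULL)
        return NULL;
    xml_doc *doc = malloc(sizeof *doc);
    if (doc == NULL)
        return NULL;
    doc->copy = malloc(len + 1);
    if (doc->copy == NULL) {
        free(doc);
        return NULL;
    }
    memcpy(doc->copy, buf, len);
    doc->copy[len] = '\0';               /* hand back a C string */
    doc->len = len;
    return doc;
}

const char *xml_doc_text(const xml_doc *doc)
{
    assert(doc != NULL);                 /* caller must pass a valid handle */
    return doc->copy;
}

size_t xml_doc_length(const xml_doc *doc)
{
    assert(doc != NULL);
    return doc->len;
}

void xml_doc_close(xml_doc *doc)
{
    if (doc == NULL)
        return;
    free(doc->copy);
    free(doc);
}
```

The point of the design is that the conflicting declarations never meet: the Perl side only needs the four prototypes, while the headers that declare bind() for VTD-XML are included only when compiling the wrapper itself.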

Marco De Lellis