
I want to pipe the output from bzip2 and use it as input to fill a TDB database using tdbloader2 from apache-jena-3.9.0.

I already found "Generating TDB Dataset from archive containing N-TRIPLES files", but the solution proposed there did not work for me.

bzip2 -dc test.ttl.bz2 | tdbloader2 --loc=/pathto/TDBdatabase_test -- -

produces

20:08:01 INFO -- TDB Bulk Loader Start
20:08:01 INFO Data Load Phase
20:08:01 INFO Got 1 data files to load
20:08:01 INFO Data file 1: /home/user/-
File does not exist: /home/user/-
20:08:01 ERROR Failed during data phase

I got similar results with the following (inspired by https://unix.stackexchange.com/questions/16990/using-data-read-from-a-pipe-instead-than-from-a-file-in-command-options):

bzip2 -dc test.ttl.bz2 | tdbloader2 --loc=/pathto/TDBdatabase_test /dev/stdin 
20:34:45 INFO -- TDB Bulk Loader Start
20:34:45 INFO Data Load Phase
20:34:45 INFO Got 1 data files to load
20:34:45 INFO Data file 1: /proc/16256/fd/pipe:[92062]
File does not exist: /proc/16256/fd/pipe:[92062]
20:34:45 ERROR Failed during data phase

and

bzip2 -dc test.ttl.bz2 | tdbloader2 --loc=/pathto/TDBdatabase_test /dev/fd/0 
20:34:52 INFO -- TDB Bulk Loader Start
20:34:52 INFO Data Load Phase
20:34:52 INFO Got 1 data files to load
20:34:52 INFO Data file 1: /proc/16312/fd/pipe:[97432]
File does not exist: /proc/16312/fd/pipe:[97432]
20:34:52 ERROR Failed during data phase   

Unpacking the bz2 file manually and then loading it works fine:

bzip2 -d test.ttl.bz2
tdbloader2 --loc=/pathto/TDBdatabase_test test.ttl

It would be great if someone could point me in the right direction.

markus
    Your question is better suited to [Super User](http://superuser.com/tour). [Stack Overflow is a question and answer site for professional and enthusiast programmers](http://stackoverflow.com/tour). – Cyrus Nov 25 '18 at 15:12

1 Answer

tdbloader2 accepts bz2 compressed files on the command line:

tdbloader2 --loc=/pathto/TDBdatabase_test test.ttl.bz2

It doesn't accept input from a pipe, and even if it did, it would not know that the syntax is Turtle, because it determines the syntax from the file extension.
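
To illustrate why the file name matters, here is a minimal shell sketch of extension-based syntax detection. It is not Jena's actual code: the function name `guess_syntax` and the extension table are assumptions made for illustration only. It strips one trailing compression suffix and then looks at the remaining extension, which is why `test.ttl.bz2` can be recognised while `-` or `/dev/stdin` cannot.

```shell
#!/bin/sh
# Hypothetical helper (not part of Jena): guess an RDF syntax from a file name.
guess_syntax() {
  name=${1##*/}                 # drop any leading directory part
  case "$name" in               # strip one compression suffix, if present
    *.bz2) name=${name%.bz2} ;;
    *.gz)  name=${name%.gz} ;;
  esac
  case "$name" in               # map the remaining extension to a syntax name
    *.ttl) echo "Turtle" ;;
    *.nt)  echo "N-Triples" ;;
    *.nq)  echo "N-Quads" ;;
    *.rdf) echo "RDF/XML" ;;
    *)     echo "unknown" ;;
  esac
}

guess_syntax test.ttl.bz2   # prints: Turtle
guess_syntax /dev/stdin     # prints: unknown
guess_syntax -              # prints: unknown
```

A name such as `-` or `/dev/stdin` carries no extension, so a loader working this way has nothing from which to infer the syntax.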

AndyS
  • Thank you, that works. I didn't think of it because feeding zipped files didn't work when using Fuseki. – markus Nov 26 '18 at 03:29
  • bz2 and gz are single-file compression tools; zip is an archive format where the individual entries have separate names (cf. tar). The names matter because the syntax is determined from the file name. zip files could be supported by unpacking and reading each entry, but the base URI is going to be weird. (Yes, gz can hold several files, but it isn't an archive as such.) – AndyS Nov 26 '18 at 09:40