EPUB loading with MLCP

Question

MarkLogic does not handle EPUB out of the box, and neither do CPF and MLCP.

An EPUB is a zip archive containing mainly XHTML, XML and images. I can rename it to .zip and load it with MLCP, but renaming is clumsy: the new extension shows up in the URIs unless I also add a replace step to the URI creation, and so on.

Also, the .opf file contains useful information. It is XML, but it gets loaded as binary. I can add .opf to the MIME types, but that does not work in combination with loading from an archive with MLCP: the file still ends up as binary.

I'd hate to add an extra layer that prepares the data before it is loaded into MarkLogic, and I would like to keep as much of the information readable and indexable as I can.

Is there a better way than renaming, unpacking and MIME-typing to load EPUB files into MarkLogic?

score 3 · Answer 1 · answered Sep 13 '16 at 07:44

I think I'd personally use an MLCP transform. Since you know the input is zipped data, you can safely call `xdmp:zip-manifest` yourself inside the transform. The transform can emit multiple `map:map` objects, one with a uri and value for each part in the EPUB zip. You can pass options to `xdmp:zip-get` to read a particular file in a specific format.
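A minimal sketch of such a transform, under some assumptions: the EPUB is imported as a binary document (for example with `-document_type binary`), the mlcp version in use accepts `map:map*` as the return type of a transform, and the module namespace, the `epub:format` helper and the extension-to-format mapping are all hypothetical names chosen for illustration.

```xquery
xquery version "1.0-ml";

(: Hypothetical module namespace; pick your own. :)
module namespace epub = "http://example.com/mlcp/epub";

(: Namespace of the manifest returned by xdmp:zip-manifest. :)
declare namespace zip = "xdmp:zip";

(: Assumed mapping from file name to document format; adjust as needed. :)
declare function epub:format($name as xs:string) as xs:string
{
  if (fn:matches($name, "\.(opf|ncx|xml|xhtml|html)$", "i")) then "xml"
  else if (fn:matches($name, "\.(css|txt)$", "i") or $name eq "mimetype") then "text"
  else "binary"
};

(: MLCP transform entry point: keeps the EPUB itself and adds one
   uri/value map per file inside the archive. :)
declare function epub:transform(
  $content as map:map,
  $context as map:map
) as map:map*
{
  let $uri := map:get($content, "uri")
  (: The incoming value is a document node wrapping the binary EPUB. :)
  let $zip := map:get($content, "value")/binary()
  return (
    $content,
    for $part in xdmp:zip-manifest($zip)/zip:part
    let $name := fn:string($part)
    (: Skip directory entries, which end with a slash. :)
    where fn:not(fn:ends-with($name, "/"))
    return
      map:new((
        map:entry("uri", fn:concat($uri, "/", $name)),
        map:entry("value",
          xdmp:zip-get($zip, $name,
            <options xmlns="xdmp:zip-get">
              <format>{epub:format($name)}</format>
            </options>))
      ))
  )
};
```

The module would be installed in the modules database and referenced through mlcp's `-transform_module`, `-transform_namespace` and `-transform_function` options. Two things are worth verifying before relying on it: that mlcp accepts binary values in the returned maps, and that the XHTML parts parse as XML (files using entities such as `&nbsp;` without a resolvable DTD may need different handling).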

HTH!

grtjn

I'm playing around with this and it works great, but I don't use maps because I'm not sure a map can hold a binary (JPG) from the EPUB file. – Thijs Sep 15 '16 at 13:19

score 1 · Answer 2 · answered Sep 14 '16 at 04:19

Do you want to load the .epub file into the database as a single document, or do you want mlcp to unpack it for you and insert the contained XHTML, XML, and image files into the database as individual documents?

If the latter, you might be able to achieve it (without renaming your source file) by using the -input_compression_codec option. See this topic in the documentation:

http://docs.marklogic.com/guide/mlcp/import#id_13251

kcoleman

Interesting! Thanks. I will probably go with the `transform` approach grtjn mentioned; this way I also load the EPUB file as-is and all the work is done server-side. – Thijs Sep 14 '16 at 08:09
