
I have used pandoc with the option --self-contained to create HTML documents where images are embedded in the HTML code as base64.

The image is included in the IMG tag like this (I have replaced the long string of base64 characters with a placeholder): <IMG src="data:image/png;base64,<<base64-coded characters here>>" width="672">

Now I'd like to extract such images, i.e. do the reverse: decode the base64 data into ordinary PNG or JPEG files saved on disk, and replace the embedded data in the HTML with references to those files.

I was hoping to use pandoc for this conversion, but I could not find an option for it, nor have I found any other software that does it. Ideally, the solution should be a shell command or script that can easily be included in a longer toolchain.

torkildl
  • I'd suggest finding a different workflow. base64 is really quite inefficient for larger files, and as you discovered, not many tools handle it when doing document conversions. – mb21 Aug 03 '20 at 11:46

1 Answer


You can use pandoc with the --extract-media option. The images will be written to the supplied directory and the base64 URLs will be replaced with references to those files.

E.g.

pandoc --from=html YOUR_FILE.html --extract-media=images
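
For a toolchain step where pandoc is not available, the same idea can be approximated with a few lines of Python. This is only a minimal sketch using the standard library, not what pandoc does internally: the function name `extract_images`, the `image<N>` file naming, and the extension mapping are assumptions made for illustration, and the regular expression only handles quoted `src` attributes with `data:image/...;base64,` payloads.

```python
import base64
import re
from pathlib import Path

# Matches src="data:image/<subtype>;base64,<payload>" with single or double quotes.
DATA_URI = re.compile(
    r'src=(["\'])data:image/(?P<type>[a-z0-9.+-]+);base64,(?P<data>[a-z0-9+/=\s]+?)\1',
    re.IGNORECASE,
)

# Illustrative mapping from MIME subtype to file extension (an assumption).
EXTENSIONS = {"jpeg": "jpg", "svg+xml": "svg"}


def extract_images(html, media_dir="images"):
    """Write embedded base64 images to media_dir and return the rewritten HTML."""
    out = Path(media_dir)
    out.mkdir(parents=True, exist_ok=True)
    count = 0

    def replace(match):
        nonlocal count
        count += 1
        subtype = match.group("type").lower()
        ext = EXTENSIONS.get(subtype, subtype)
        target = out / f"image{count}.{ext}"
        # Decode the payload and save it as an ordinary image file.
        target.write_bytes(base64.b64decode(match.group("data")))
        quote = match.group(1)
        # Point the src attribute at the file instead of the inline data.
        return f"src={quote}{target.as_posix()}{quote}"

    return DATA_URI.sub(replace, html)
```

To use it, read the HTML file into a string, pass it to `extract_images`, and write the returned string back out, for example `Path("out.html").write_text(extract_images(Path("YOUR_FILE.html").read_text()))`. For anything beyond simple cases, the pandoc command above is the more robust choice, since it parses the HTML properly instead of relying on a regular expression.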
tarleb