I have an image containing IPTC data and use the following command to extract the IPTC data as text:

convert image.jpg IPTCTEXT:iptc.txt

The problem is that this seems to be using entities for "special characters":

2#120#Caption="Beschreibung f&#195;&#188;r den Import aus IPTC"

Actually it should be "für" here. But instead of getting the correct entity &#252; for the "ü" character, I get two entities (probably each byte of the UTF-8 encoded character was converted to an entity separately), and I cannot parse these two entities correctly.

Is there any way to get the correct entity, or to disable the entities completely so that UTF-8 characters are returned?

Edit: I tried parsing the entities using StringEscapeUtils.unescapeXml in Java, but I get two characters ("Ã¼") instead of "ü" because the two entities are unescaped separately.
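
A possible workaround on the Java side, offered as a minimal sketch rather than a confirmed fix: if the IPTCTEXT output really contains one decimal entity per UTF-8 byte (for example `&#195;&#188;` for "ü"), the entities can be collected back into a byte array and decoded as UTF-8 in one step instead of being unescaped one by one. The class name `IptcEntityDecoder` and the method `decode` are hypothetical names chosen for illustration; named entities such as `&quot;` are not handled here.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: assumes every "&#NNN;" with NNN <= 255 stands for
// one raw byte of a UTF-8 encoded string, as the output in the question suggests.
class IptcEntityDecoder {

    private static final Pattern BYTE_ENTITY = Pattern.compile("&#(\\d{1,3});");

    static String decode(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        Matcher m = BYTE_ENTITY.matcher(text);
        int last = 0;
        while (m.find()) {
            // Copy the literal text in front of the entity unchanged.
            writeLiteral(bytes, text.substring(last, m.start()));
            int value = Integer.parseInt(m.group(1));
            if (value <= 0xFF) {
                // Treat the entity as a single raw byte.
                bytes.write(value);
            } else {
                // Not a byte value: leave the entity text as it is.
                writeLiteral(bytes, m.group());
            }
            last = m.end();
        }
        writeLiteral(bytes, text.substring(last));
        // Decode all collected bytes together, so that multi-byte
        // sequences such as 195 188 become a single character.
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }

    private static void writeLiteral(ByteArrayOutputStream out, String s) {
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        out.write(b, 0, b.length);
    }

    public static void main(String[] args) {
        String line = "2#120#Caption=\"Beschreibung f&#195;&#188;r den Import aus IPTC\"";
        System.out.println(decode(line));
    }
}
```

Decoding the bytes together is the important part: unescaping each entity on its own maps 195 and 188 to two separate Latin-1 characters, which is exactly the "Ã¼" result described above.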

Edit2: Example image here: http://fs1.directupload.net/images/150615/5eiv6wwf.jpg

Mark Setchell
Werzi2001

2 Answers

1

IMHO, the most reliable metadata package is exiv2 (http://exiv2.org/; packaged for all Linux distros and available for Windows; I am not sure about Mac binaries).

See http://paste.fedoraproject.org/232538/34459066/ for results. I am afraid ImageMagick is not that great for metadata purposes.

mcepl
  • The easiest way to get `exiv2` on a Mac is with `homebrew`. Install `homebrew` by copying and pasting the one-liner on http://brew.sh and then run `brew install exiv2`. Job done! – Mark Setchell Jun 16 '15 at 12:58
0

I am not sure why you are seeing something different from me. I am running ImageMagick 6.9.1-4 on a Mac under OS X.

If I do this:

identify -format "%[IPTC:2:120]" http://fs1.directupload.net/images/150615/5eiv6wwf.jpg

I get this:

Beschreibung für den Import aus EXIF

And if I hex dump that, I get this:

[image: hex dump of the output]

I think it may be related to your Terminal's locale settings - although I don't know why it still happens when you redirect to a file. Have you tried things like:

LC_CTYPE=C identify -format "%[IPTC:2:120]" http://fs1.directupload.net/images/150615/5eiv6wwf.jpg | od -xc
Mark Setchell
  • Using the -format parameter I also get the correct result. The problem is that when retrieving all IPTC data using IPTCTEXT, the result is incorrect (at least it seems incorrect to me). – Werzi2001 Jun 16 '15 at 09:52