
I have a JPG image with XMP metadata inside.
I'd like to read this data, but how?

$content = file_get_contents($fileName);
var_dump($content);

displays the real number of bytes, 553700, but

$len = strlen($content);
var_dump($len);

displays 373821

So I can't simply do

$xmpStart = strpos($content, '<x:xmpmeta');

because I get the wrong offset. So the question is: how do I find and read a string from a binary file in PHP? (I have the mb_string option ON in php.ini.)

UPD1:

I have some binary file. How can I check in PHP whether this file contains several given strings?

Lari13
  • Ah, it's clearer now. Essentially, it shouldn't matter what kind of data is being used. Can you try whether `strlen($content, "iso-8859-1")` gives the correct value? – Pekka Sep 30 '11 at 12:25
  • `$pos = strpos($content, ' – Lari13 Sep 30 '11 at 19:28

3 Answers


getID3 is a PHP package that claims to be able to read XMP metadata.

Pekka
  • Ok, thanks. It can help, but if I have to read own strings from binary files? How can it be done? – Lari13 Sep 30 '11 at 08:43
  • @Lari you'd have to build your own JPG parser. While surely very interesting, it's likely to be a huge task. – Pekka Sep 30 '11 at 08:45
  • Let's forget JPG. :) I have some binary file. How can I check in PHP, this file contains several strings or not? – Lari13 Sep 30 '11 at 09:32

Essentially, it doesn't matter what kind of data you are reading - strlen() et al. should always work.
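As a rough illustration of that point, and of the "does this binary file contain several strings" case from the question, here is a minimal sketch. It assumes strpos() has not been replaced by a multi-byte variant on the server; `containsAll()` and the sample bytes are made up for the example and are not part of any library.

```php
<?php
// Hypothetical helper: returns true only if every needle occurs somewhere in $data.
// strpos() compares raw bytes, so NUL bytes and invalid UTF-8 in $data are fine.
function containsAll(string $data, array $needles): bool
{
    foreach ($needles as $needle) {
        // Strict comparison: an offset of 0 is a valid match, only false means "not found".
        if (strpos($data, $needle) === false) {
            return false;
        }
    }
    return true;
}

// Synthetic binary blob standing in for file_get_contents($fileName).
$blob = "\x00\x01\xFFheader\x00\xC3\x28payload\x00";

$hasBoth    = containsAll($blob, ['header', 'payload']);
$hasMissing = containsAll($blob, ['header', 'missing']);

var_dump($hasBoth, $hasMissing);
```

The strict `=== false` check matters because a match at offset 0 would otherwise be treated as "not found".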

What I think is happening here is that on your server, strlen() is internally overridden by mb_strlen() and the internal character encoding is set to UTF-8.

UTF-8 is a multi-byte encoding, so some of the bytes in your (wildly arbitrary) byte stream get interpreted as multi-byte characters - resulting in a shortened length of 373821 instead of 553700.
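The effect can be reproduced on a tiny string. This sketch assumes the mbstring extension is loaded and calls mb_strlen() directly instead of relying on any overloading; the sample bytes are invented for the demonstration.

```php
<?php
// Four bytes: 'A', the two-byte UTF-8 sequence for "é" (0xC3 0xA9), then 'B'.
$bytes = "A\xC3\xA9B";

// Interpreted as UTF-8, the two middle bytes collapse into one character.
$asUtf8 = mb_strlen($bytes, 'UTF-8');

// '8bit' tells mbstring to treat every byte as one unit.
$asBytes = mb_strlen($bytes, '8bit');

echo "UTF-8 characters: $asUtf8\n";
echo "Raw bytes: $asBytes\n";
```

The same mismatch, scaled up to a whole JPG, would explain a reported length smaller than the file size.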

I can't think of a better workaround than always explicitly specifying a single-byte encoding like iso-8859-1:

 $pos = strpos($content, '<x:xmpmeta', 0, 'iso-8859-1');

This forces strpos() (or rather, mb_strpos()) to count every single byte in the data.

This will always work; I do not know whether there is a more elegant way to force the use of a single-byte encoding.
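One possible variant, offered as a sketch rather than a definitive answer: call the mb_* functions explicitly with the '8bit' encoding, so the result does not depend on whether strpos() is overloaded. `extractXmpPacket()` is a hypothetical helper, the input below is a synthetic stand-in for a real JPG, and the code assumes the XMP packet is stored as one contiguous block.

```php
<?php
// Hypothetical helper: returns the first XMP packet found in $data, or null.
function extractXmpPacket(string $data): ?string
{
    $startTag = '<x:xmpmeta';
    $endTag   = '</x:xmpmeta>';

    // '8bit' makes every offset a byte offset, whatever the internal encoding is.
    $start = mb_strpos($data, $startTag, 0, '8bit');
    if ($start === false) {
        return null;
    }

    // Look for the closing tag only after the opening one.
    $end = mb_strpos($data, $endTag, $start, '8bit');
    if ($end === false) {
        return null;
    }

    $length = $end + mb_strlen($endTag, '8bit') - $start;

    return mb_substr($data, $start, $length, '8bit');
}

// Synthetic data: some binary bytes, an XMP-like packet, then more binary bytes.
$packet = '<x:xmpmeta xmlns:x="adobe:ns:meta/"><rdf:RDF/></x:xmpmeta>';
$fake   = "\xFF\xD8\xFF\xE1\xC3\xA9\x00" . $packet . "\xFF\xD9";

$xmp = extractXmpPacket($fake);
var_dump($xmp);
```

With a real file, `$fake` would be replaced by `file_get_contents($fileName)`, and the returned string could then be handed to an XML parser.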

Pekka

The exif_read_data() PHP function could help with reading the XMP metadata.

More info here: http://php.net/manual/en/function.exif-read-data.php

Sheitan