0

I have a question about IPTC metadata. Is it possible to search images that aren't in a database by their IPTC metadata (keywords) and display the matches, and if so, how would I go about it? I just need a basic idea.

I know there is the iptcparse() function for PHP.

I have already written a function to grab the image name, location, and extension for all images within a galleries folder and all subdirectories by .jpg extension.

I need to figure out how to extract the metadata without storing it in a database, how to search through it and pick out the images whose IPTC keywords match the search tag, and how to display them. Once I have the final (post-search) results in an array, I know I can echo an image tag such as `<img src="$filelocation">` for each entry.

Basically, I am not sure whether I need to store all my images in a MySQL database, and extract the keywords and store them there as well, before I can actually search and display the results. Also, if you could point me to any gallery that can already do this, that would help too.

Thanks for any help regarding this issue.

fragmentedreality

2 Answers

3

It is not clear what in particular is giving you problems, but perhaps this will give you some ideas:

<?php
# Images we're searching
$images = array('/path/to/image.jpg', 'another-image.jpg');

# IPTC field names mapped to required values (names from exiv2, see below)
$query = array('Byline' => 'Some Author');

# Perform the search
$result = select_jpgs_by_iptc_fields($images, $query);

# Display the results
foreach ($result as $path) {
    echo '<img src="', htmlspecialchars($path), '">';
}

function select_jpgs_by_iptc_fields($jpgs, $query) {
    $matches = array();
    foreach ($jpgs as $path) {
        $iptc = get_jpg_iptc_metadata($path);
        # Skip images that carry no readable IPTC data
        if (!is_array($iptc))
            continue;
        foreach ($query as $name => $values) {
            if (!is_array($values))
                $values = array($values);
            # Reject the image if the field is absent or any wanted value is missing
            if (!isset($iptc[$name]) || count(array_diff($values, $iptc[$name])) > 0)
                continue 2;
        }
        $matches[] = $path;
    }
    return $matches;
}

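function get_jpg_iptc_metadata($path) {
    # getimagesize() fills $info with the JPEG APPn segments; APP13 holds the IPTC block
    $size = getimagesize($path, $info);
    if (isset($info['APP13'])) {
        $iptc = iptcparse($info['APP13']);
        # iptcparse() returns false when the block cannot be parsed
        if ($iptc !== false)
            return human_readable_iptc($iptc);
    }
    return null;
}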

function human_readable_iptc($iptc) {
# From the exiv2 sources
static $iptc_codes_to_names =
array(    
// IPTC.Envelope-->
"1#000" => 'ModelVersion',
"1#005" => 'Destination',
"1#020" => 'FileFormat',
"1#022" => 'FileVersion',
"1#030" => 'ServiceId',
"1#040" => 'EnvelopeNumber',
"1#050" => 'ProductId',
"1#060" => 'EnvelopePriority',
"1#070" => 'DateSent',
"1#080" => 'TimeSent',
"1#090" => 'CharacterSet',
"1#100" => 'UNO',
"1#120" => 'ARMId',
"1#122" => 'ARMVersion',
// <-- IPTC.Envelope
// IPTC.Application2 -->
"2#000" => 'RecordVersion',
"2#003" => 'ObjectType',
"2#004" => 'ObjectAttribute',
"2#005" => 'ObjectName',
"2#007" => 'EditStatus',
"2#008" => 'EditorialUpdate',
"2#010" => 'Urgency',
"2#012" => 'Subject',
"2#015" => 'Category',
"2#020" => 'SuppCategory',
"2#022" => 'FixtureId',
"2#025" => 'Keywords',
"2#026" => 'LocationCode',
"2#027" => 'LocationName',
"2#030" => 'ReleaseDate',
"2#035" => 'ReleaseTime',
"2#037" => 'ExpirationDate',
"2#038" => 'ExpirationTime',
"2#040" => 'SpecialInstructions',
"2#042" => 'ActionAdvised',
"2#045" => 'ReferenceService',
"2#047" => 'ReferenceDate',
"2#050" => 'ReferenceNumber',
"2#055" => 'DateCreated',
"2#060" => 'TimeCreated',
"2#062" => 'DigitizationDate',
"2#063" => 'DigitizationTime',
"2#065" => 'Program',
"2#070" => 'ProgramVersion',
"2#075" => 'ObjectCycle',
"2#080" => 'Byline',
"2#085" => 'BylineTitle',
"2#090" => 'City',
"2#092" => 'SubLocation',
"2#095" => 'ProvinceState',
"2#100" => 'CountryCode',
"2#101" => 'CountryName',
"2#103" => 'TransmissionReference',
"2#105" => 'Headline',
"2#110" => 'Credit',
"2#115" => 'Source',
"2#116" => 'Copyright',
"2#118" => 'Contact',
"2#120" => 'Caption',
"2#122" => 'Writer',
"2#125" => 'RasterizedCaption',
"2#130" => 'ImageType',
"2#131" => 'ImageOrientation',
"2#135" => 'Language',
"2#150" => 'AudioType',
"2#151" => 'AudioRate',
"2#152" => 'AudioResolution',
"2#153" => 'AudioDuration',
"2#154" => 'AudioOutcue',
"2#200" => 'PreviewFormat',
"2#201" => 'PreviewVersion',
"2#202" => 'Preview',
// <--IPTC.Application2
      );
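   $human_readable = array();
   foreach ($iptc as $code => $field_value) {
       # Fall back to the raw code for fields that are not in the table
       $name = isset($iptc_codes_to_names[$code]) ? $iptc_codes_to_names[$code] : $code;
       $human_readable[$name] = $field_value;
   }
   return $human_readable;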
}
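
Since the question is specifically about keywords, a keyword test on top of the array returned by `human_readable_iptc()` might look like the sketch below. The helper name `iptc_has_keywords` and the sample metadata are made up for illustration; the sketch assumes that keywords should match regardless of case and surrounding whitespace, and that ASCII lowercasing via `strtolower()` is good enough.

```php
<?php
/**
 * Hypothetical helper: true when every wanted keyword appears in the
 * image's Keywords field, ignoring case and surrounding whitespace.
 * $iptc is a name => values array as built by human_readable_iptc().
 */
function iptc_has_keywords(array $iptc, array $wanted) {
    // No Keywords field (or an empty one) can never match
    if (empty($iptc['Keywords'])) {
        return false;
    }
    $normalize = function ($s) { return strtolower(trim($s)); };
    $have = array_map($normalize, $iptc['Keywords']);
    $want = array_map($normalize, $wanted);
    // Match only if no wanted keyword is missing from the image
    return count(array_diff($want, $have)) === 0;
}

// Illustrative metadata only, not read from a real file
$sample = array(
    'Keywords' => array('Beach', 'Sunset '),
    'Byline'   => array('Some Author'),
);
var_dump(iptc_has_keywords($sample, array('sunset')));         // bool(true)
var_dump(iptc_has_keywords($sample, array('sunset', 'snow'))); // bool(false)
```

Keywords is a repeatable field, so `iptcparse()` reports it as an array with one entry per keyword, which is why the helper compares two lists rather than two strings.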
fragmentedreality
Inshallah
  • You are overwriting the first entries in the `$iptc_codes_to_names` with the later ones with the same number. Why don't you use string as indexes for that array? – fragmentedreality May 04 '12 at 14:05
  • You are right. I copied it from the exiv2 C source code and probably made an error doing the conversion from C to PHP code. I'm using ints because... I probably thought a code is supposed to be numeric. It's pretty much the same if I use strings of digits as keys. (Leading zeros in the incoming codes like #0123 wouldn't work with string keys, but that's not the reason why I use ints, otherwise I would have commented it). – Inshallah May 04 '12 at 20:14
  • Still your code results in `$iptc_codes_to_names[0] == 'RecordVersion'` (as the value 'ModelVersion' gets overwritten). The IPTC.Envelope-fields that happen to have the same number as the IPTC.Application2-fields get lost. I already suggested an edit making this change, but it has not been accepted yet. For me the indexes work in this form: `$iptc_codes_to_names['1#000'] = 'ModelVersion'; $iptc_codes_to_names['2#000'] = 'RecordVersion'`. – fragmentedreality May 07 '12 at 07:07
  • Alright, I think I understand. With my substr() on the $code that strips out everything before the '#' I'm actually losing some information. – Inshallah May 07 '12 at 14:31
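
The collision discussed in these comments can be reproduced in a few lines. This is a hypothetical reconstruction of the lossy approach (stripping everything up to the `#` and casting the rest to an int), not the answer's original code, which is not shown here.

```php
<?php
// Keys in the "record#dataset" form that iptcparse() reports
$codes = array(
    '1#000' => 'ModelVersion',   // IPTC.Envelope
    '2#000' => 'RecordVersion',  // IPTC.Application2
);

// Lossy variant: keep only the dataset number after the '#'
$by_int = array();
foreach ($codes as $code => $name) {
    $dataset = (int) substr($code, strpos($code, '#') + 1);
    $by_int[$dataset] = $name; // the second entry overwrites the first
}

var_dump(count($codes));  // int(2)
var_dump(count($by_int)); // int(1)
var_dump($by_int[0]);     // string(13) "RecordVersion"
```

Keeping the full `record#dataset` string as the key avoids the overwrite, because the record number is what tells the two fields apart.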
0

If you haven't already extracted the IPTC data from your images, then each time someone runs a search you'll have to:

  • loop over every image
  • for each image, extract the IPTC data
  • see if the IPTC data for the current image matches

If you have more than a handful of images, this will be really bad for performance, I'd say.


So, in my opinion, it would be far better to:

  • add a couple of fields in your database
  • extract the relevant IPTC data when the image is uploaded / stored
  • store the IPTC data in those DB fields
  • search in those DB fields
    • Or use some search engine like Lucene or Sphinx -- but that is another problem.
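
The steps above could be sketched roughly as follows. Everything named here is an assumption made for illustration: the `image_keywords` table, the helpers `index_image` and `find_images`, and the sample paths. The sketch uses PDO with an in-memory SQLite database only so that it runs on its own; with MySQL the DSN and the schema details would need adapting. It also uses a separate one-row-per-keyword table rather than extra columns on an existing images table, which is one possible layout among several.

```php
<?php
// Assumption: PDO with the SQLite driver, in-memory so the sketch is self-contained
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE image_keywords (path TEXT NOT NULL, keyword TEXT NOT NULL)');
$db->exec('CREATE INDEX idx_image_keywords_keyword ON image_keywords (keyword)');

/** Hypothetical helper: store one row per keyword when an image is uploaded. */
function index_image(PDO $db, $path, array $keywords) {
    $stmt = $db->prepare('INSERT INTO image_keywords (path, keyword) VALUES (?, ?)');
    foreach ($keywords as $keyword) {
        // Normalize once at write time so lookups can use a plain equality test
        $stmt->execute(array($path, strtolower(trim($keyword))));
    }
}

/** Hypothetical helper: paths of all images tagged with the given keyword. */
function find_images(PDO $db, $keyword) {
    $stmt = $db->prepare(
        'SELECT DISTINCT path FROM image_keywords WHERE keyword = ? ORDER BY path'
    );
    $stmt->execute(array(strtolower(trim($keyword))));
    return $stmt->fetchAll(PDO::FETCH_COLUMN);
}

// Illustrative data; in practice the keywords would come from the
// '2#025' entry of iptcparse() at upload time
index_image($db, 'galleries/a.jpg', array('Beach', 'Sunset'));
index_image($db, 'galleries/b.jpg', array('Sunset'));

print_r(find_images($db, 'sunset'));
```

With this layout a search touches only the index on `keyword` instead of opening every image file.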

It'll mean a bit more work for you right now: you have more code to write...

... but it also means your website will have a better chance of surviving once there are many images and many users running searches.

Pascal MARTIN