I have developed an image searching tool to index a directory of all images, store the data and then retrieve the results based on a search.
This is working but due to the masses of data (approx 2.7 million rows of metadata for 150,000 images), it is taking a long time to return results (sometime >5 mins). I believe this is due to querying the second set of data in a foreach loop. I am looking to combine my sql into a single statement to better optimise (unless there is a another way!).
So far I am returning all the image_id's whereby the metadata matches the search term, and then using the returned image_id's to get all the metadata in a metadata table for each image. Here is the basic structure of the db:
image_id filename thumb
============= ============ =================
1 image_XYZ.jpg image_XYZ_thumb.jpg
2 emoticon.png emoticon_thumb.jpg
3 runner_bean.jpg runner_bean_thumb.jpg
meta_id meta_key meta_value image_id
======= ======== ========== ========
1 filetype jpg 1
2 keyword runner 1
3 height 600 1
4 filetype png 2
5 filesize 5198413 2
6 description smiley face 2
7 filetype jpg 3
8 filesize 12485 3
9 description runnerbean 3
10 keyword runner 3
11 keyword vegetable 3
1.
So far my code starts by getting all the images where the search term "runner" matches (So images 1 and 3)
SELECT image_id
FROM metadata
WHERE (meta_value LIKE '%runner%')
AND meta_key IN ('keyword', 'filename', 'description')
GROUP BY image_id
2.
Then looping through the returned result set and get the metadata for each:
$search_results = $imagesearch->search_images(); //object array from above sql
foreach ($search_results as $image) {
$id = $image->image_id;
$get_metadata = $imageget->get_metadata($id)
}
3.
SELECT *
FROM metadata
WHERE image_id = $id
With the metadata I am then extracting the rows where the meta_key equals the data I want (e.g. height ) to retrieve the value etc.
So my final array would look something like this:
Array
(
[1] => array(
[image_id] => 1
[filename] => image_XYZ.jpg
[thumb] => image_XYZ_thumb.jpg
[filetype] => jpg
[keyword] => runner
[height] => 600
),
[2] => array(
[image_id] => 3
[filename] => runner_bean.jpg
[thumb] => runner_bean_thumb.jpg
[filetype] => jpg
[filesize] => 12485
[description] => runnerbean
[keyword] => runner
[keyword] => vegetable
),
)
My effort so far was to use a subquery to first get the images that matchup the search term, and then get the metadata for those images:
SELECT *
FROM metadata m
LEFT JOIN image i ON i.image_id = m.image_id
WHERE i.image_id IN (
SELECT image_id
FROM metadata
WHERE $search
AND meta_key IN ('keyword', 'filename', 'description')
GROUP BY image_id
)
GROUP BY m.image_id
My group by is collapsing the results by the image id so I only get the first row for each image (filetype in above example). Is there a way to get a final array of images that match the search term and return the metadata for each in an more optimised way?
Any help would be appreciated