
I am trying to fetch the most relevant image from a URL. I want the image that is closest to the title text of the page. To put it differently: I want to score each image based on its distance from the title text, and then fetch the image with the highest score.

The title 'text' could be in a heading element

<h1>title text</h1>, <h2>title text</h2>, etc.

Or it may match the alt attribute of an

<img alt='title text'> tag.

Or it may be in any other element, such as

<p>, <span>, <div>, etc.

For example, let's say the title of the page is as follows:

<title>White Gold Round Diamond Wedding Band: Jewelry: Amazon.com</title>

And in the body of the page we have something like:

<h1>White Gold Round Diamond Wedding Band</h1>

Let's say the element closest to the above tag is inside a div, as follows:

<div class='abc'>
    <img src='efg' />
</div>

Then the above image should get the highest score.

If, instead, an img's alt attribute matches the title, then that image should get the highest score.
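To make the scoring idea concrete, here is a minimal sketch in Python using only the standard library. It is not a definitive implementation: it approximates "distance" as the number of start tags between an image and the nearest text run that repeats part of the <title>, which is a simplification of true DOM distance. The names TitleImageScorer, score_images and MIN_MATCH_LEN are hypothetical, and the score values (1.0 for an alt match, 1 / (2 + distance) otherwise) are arbitrary choices for illustration.

```python
from html.parser import HTMLParser

MIN_MATCH_LEN = 8  # hypothetical threshold: ignore very short text runs


class TitleImageScorer(HTMLParser):
    """Records the <title> text, text runs and <img> tags in document order."""

    def __init__(self):
        super().__init__()
        self.position = 0      # running count of start tags seen so far
        self.in_title = False
        self.title = ""
        self.text_runs = []    # (position, text) outside <title>
        self.images = []       # (position, src, alt)

    def handle_starttag(self, tag, attrs):
        self.position += 1
        if tag == "title":
            self.in_title = True
        elif tag == "img":
            a = dict(attrs)
            self.images.append((self.position, a.get("src") or "", a.get("alt") or ""))

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self.in_title:
            self.title += text
        else:
            self.text_runs.append((self.position, text))


def score_images(html):
    """Return {src: score}; higher means closer to the title text."""
    parser = TitleImageScorer()
    parser.feed(html)
    parser.close()
    title = parser.title.lower()

    def matches_title(text):
        return len(text) >= MIN_MATCH_LEN and text.lower() in title

    # Positions of text runs (e.g. an <h1>) that repeat part of the title.
    anchors = [pos for pos, text in parser.text_runs if matches_title(text)]

    scores = {}
    for pos, src, alt in parser.images:
        if matches_title(alt):
            scores[src] = 1.0  # alt text matching the title always wins
        elif anchors:
            distance = min(abs(pos - a) for a in anchors)
            scores[src] = 1.0 / (2 + distance)  # always below 1.0
        else:
            scores[src] = 0.0
    return scores


SAMPLE = """
<html><head><title>White Gold Round Diamond Wedding Band: Jewelry: Amazon.com</title></head>
<body>
<img src='logo.png' alt='Site logo'>
<p>Some navigation text</p><p>More</p><p>links</p>
<h1>White Gold Round Diamond Wedding Band</h1>
<div class='abc'><img src='efg' /></div>
</body></html>
"""

if __name__ == "__main__":
    scores = score_images(SAMPLE)
    print(max(scores, key=scores.get), scores)
```

Counting tags in document order is cheap but crude: an image in a sibling column can be "close" in the markup and still far away on the rendered page, so a tree-based or layout-based distance may work better on real product pages.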

Thanks in advance.

Ankit Khatri
  • -1 What have you tried? Or are you just posting requirements? – hakre Oct 17 '12 at 10:54
  • What is the question? How to measure string distance? (Google 'edit distance' and 'Levenshtein distance'.) How to calculate edit distance in PHP? How to measure the distance of images from a given heading? – C. M. Sperberg-McQueen Oct 17 '12 at 16:07
  • @C.M.Sperberg-McQueen The closest is 'How to measure the distance of images from a given heading in php?' – Ankit Khatri Oct 17 '12 at 18:17

1 Answer


I don't think this is a good solution. Instead, you can try reading the page's og:image meta tag, if it is set.
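As a rough illustration of the og:image lookup, here is a minimal Python sketch that uses only the standard library. The names OpenGraphImageParser and find_og_image are hypothetical; the sketch assumes the page declares the image as <meta property="og:image" content="...">, and it returns the first such tag it finds.

```python
from html.parser import HTMLParser


class OpenGraphImageParser(HTMLParser):
    """Remembers the content of the first <meta property="og:image"> tag."""

    def __init__(self):
        super().__init__()
        self.og_image = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.og_image is not None:
            return
        a = dict(attrs)
        if (a.get("property") or "").lower() == "og:image" and a.get("content"):
            self.og_image = a["content"]


def find_og_image(html):
    """Return the og:image URL, or None if the page does not set one."""
    parser = OpenGraphImageParser()
    parser.feed(html)
    parser.close()
    return parser.og_image


if __name__ == "__main__":
    page = '<head><meta property="og:image" content="http://example.com/ring.jpg"></head>'
    print(find_og_image(page))
```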

Another solution is to get all the images with XPath and keep only those that meet a size requirement, e.g. bigger than 150 x 150 px and with a width/height ratio within a limited range, for example from 0.5 to 2. If more than one image remains, you can let the user choose one with a simple image slider, just like in Facebook's share popups.
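The size and ratio filter could look something like the Python sketch below. It assumes the image dimensions are already known (for example from width/height attributes, or after downloading each file); collecting the images from the page is left out. The function name filter_candidates and its parameters are hypothetical, with defaults taken from the numbers above.

```python
def filter_candidates(images, min_side=150, min_ratio=0.5, max_ratio=2.0):
    """Keep images bigger than min_side x min_side with a sane aspect ratio.

    `images` is an iterable of (src, width, height) tuples in pixels.
    """
    kept = []
    for src, width, height in images:
        if width <= min_side or height <= min_side:
            continue  # too small, likely an icon or spacer
        ratio = width / height  # height > min_side here, so never zero
        if min_ratio <= ratio <= max_ratio:
            kept.append(src)
    return kept


if __name__ == "__main__":
    found = [
        ("product.jpg", 500, 500),
        ("icon.png", 32, 32),
        ("banner.jpg", 1000, 200),
        ("tall.jpg", 200, 400),
    ]
    print(filter_candidates(found))
```

Thin banners and small icons drop out, and whatever is left can be shown to the user as the list of choices.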

You can also use something like the Embed.ly API; it works very accurately if you want to get product images.

If you are working mainly with Amazon and/or eBay offers, you can try Amazon's Product Advertising API and eBay's Finding API for best results. You just have to extract the offer ID from the given URL and send an API request to get the details for that offer, including images in various sizes.
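For the "extract the offer ID from the URL" step, a minimal Python sketch for Amazon might look like this. It rests on an assumption that is not from any official specification: that the product URL carries a 10-character ID (ASIN) after /dp/ or /gp/product/, as commonly seen in Amazon links. The names ASIN_PATTERN and extract_asin are hypothetical, and the API request itself is not shown.

```python
import re
from urllib.parse import urlparse

# Assumed URL shapes: .../dp/<ASIN>/... or .../gp/product/<ASIN>/...
ASIN_PATTERN = re.compile(r"/(?:dp|gp/product)/([A-Z0-9]{10})(?:/|$)")


def extract_asin(url):
    """Return the 10-character product ID from an Amazon-style URL, or None."""
    match = ASIN_PATTERN.search(urlparse(url).path)
    return match.group(1) if match else None


if __name__ == "__main__":
    # B000EXAMPL is a made-up placeholder ID, not a real product.
    print(extract_asin("http://www.amazon.com/White-Gold-Band/dp/B000EXAMPL/ref=sr_1_1"))
```

If the function returns None, the URL is probably not a product page of that shape, and one of the other approaches has to take over.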

Finally, the best solution could be to combine all of these methods and use them together.

enenen
  • This is a fallback solution to the first two cases you mentioned. I have already written code to fetch the Open Graph og:image and to fetch the largest image. But if even that doesn't work, I felt this could be my last resort – Ankit Khatri Oct 17 '12 at 11:10
  • Embed.ly is good but it has a limit of 10,000 URLs for the free plan :P – Ankit Khatri Oct 17 '12 at 11:13
  • It's not just eBay or Amazon; it could be any product webstore, and hence any product URL of such a store. – Ankit Khatri Oct 17 '12 at 11:54