I use the Simple HTML DOM library in a custom Drupal module for a task in my project.

The task is simply to imitate what Facebook does when we paste an article URL: FB scrapes the URL and returns part of the article as a description, plus an image.

My question is: what algorithm is used to pick the first part of the article from among many <p> tags, and to pick the right picture from all the images on the page?

I know that FB uses OG (Open Graph) tags, but I need to develop an algorithm that picks this info when the OG tags are not there.

Thank you guys for your support and have a nice day.

Regards.

Ahmed

1 Answer

I think for the image it comes down to the image's dimensions: it probably takes the first image larger than, say, 100x100 pixels or so.
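
A rough sketch of that image heuristic, written in Python with only the standard library so it is easy to try out and port to a PHP module. It is a guess at the approach, not Facebook's actual algorithm. The names `ImageCollector` and `pick_image` and the `MIN_SIDE` threshold are made up for illustration, and it assumes the page declares sizes in `width`/`height` attributes; images without them would have to be downloaded and measured.

```python
from html.parser import HTMLParser

MIN_SIDE = 100  # assumed threshold, from the "100x100 or so" guess


class ImageCollector(HTMLParser):
    """Collects (src, width, height) for each <img> in document order."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        a = dict(attrs)
        try:
            # Missing attributes count as 0; values like "50%" are skipped.
            width = int(a.get("width") or 0)
            height = int(a.get("height") or 0)
        except ValueError:
            return
        if a.get("src"):
            self.images.append((a["src"], width, height))


def pick_image(html, min_side=MIN_SIDE):
    """Return the src of the first image at least min_side on both sides."""
    collector = ImageCollector()
    collector.feed(html)
    for src, width, height in collector.images:
        if width >= min_side and height >= min_side:
            return src
    return None
```

Small icons and tracking pixels are skipped because they fall under the threshold; if nothing qualifies, the function returns `None` and the caller can fall back to a default image.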

With the text it might be something similar: strip the inline HTML tags, get the text of the first block element (or maybe just paragraphs), and there you go.
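
The text part could be sketched like this, again in standard-library Python and again only as one possible reading of the idea. The names `FirstParagraph` and `first_paragraph` and the `max_len` cutoff of 200 characters are assumptions for illustration. Inline tags such as `<b>` or `<a>` are dropped simply by keeping only the text nodes found inside a `<p>`.

```python
from html.parser import HTMLParser


class FirstParagraph(HTMLParser):
    """Captures the text of the first non-empty <p>, ignoring inline tags."""

    def __init__(self):
        super().__init__()
        self.in_p = False
        self.chunks = []
        self.result = None

    def _finish(self):
        # Join the text nodes and collapse runs of whitespace.
        text = " ".join("".join(self.chunks).split())
        self.in_p = False
        self.chunks = []
        if text and self.result is None:
            self.result = text

    def handle_starttag(self, tag, attrs):
        if tag == "p" and self.result is None:
            if self.in_p:  # previous <p> was never closed
                self._finish()
            if self.result is None:
                self.in_p = True

    def handle_endtag(self, tag):
        if tag == "p" and self.in_p:
            self._finish()

    def handle_data(self, data):
        if self.in_p:
            self.chunks.append(data)


def first_paragraph(html, max_len=200):
    """Return the first non-empty paragraph, truncated to max_len characters."""
    parser = FirstParagraph()
    parser.feed(html)
    parser.close()
    if parser.in_p:  # document ended inside an unclosed <p>
        parser._finish()
    text = parser.result
    if text is not None and len(text) > max_len:
        text = text[:max_len].rstrip() + "..."
    return text
```

Empty or whitespace-only paragraphs are skipped, so a leading spacer `<p>` does not end up as the description.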

yunzen