I'm working on a Java project that reads an RSS feed from a URL, parses the date, title and description, and converts them into a JSON file to be shown on a TV screen as an HTML5 page.

So the steps are: the admin posts the source of the RSS feed, like:

http://g1.globo.com/dynamo/pr/parana/rss2.xml
http://noticias.r7.com/economia/feed.xml
http://feeds.feedburner.com/Rss-Presidencia-Agenda?fmt=xml
http://rss.cnn.com/rss/edition_world.rss

...or any other.

With the Java ROME framework, I convert the content of the URL into a JSON file and send it to an HTML5/JavaScript page (via a socket, which is beside the point) that shows it on a styled page.

I can get the date, title and description because they are stored in standard tags, but I want to get the image of each news item as well.

The problem is that the location of the image depends on the feed source: sometimes it is in the "image/url" tag (as in the first example link), sometimes it is embedded as HTML in the "content" or "description" tag, and sometimes it is somewhere else entirely. Sometimes the image is just a thumbnail, or just a bullet.
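To make the cases above concrete, here is a minimal sketch that checks them in order for a single feed element. It is not ROME code: it uses the JDK's DOM parser and a regular expression so that it stays dependency-free, and the class name `FeedImageFinder`, its methods, and the `example.com` URL are all hypothetical. It assumes the element handed in (an item, or the channel) may carry an `image/url` child, or HTML in `description` or `content:encoded`.

```java
import java.io.StringReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class FeedImageFinder {
    // Captures the quoted src attribute of the first <img> tag in an HTML fragment.
    private static final Pattern IMG_SRC = Pattern.compile(
            "<img\\b[^>]*?\\ssrc\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Tries each known location in turn; returns null when none holds an image. */
    public static String findImage(Element element) {
        // Case 1: an <image><url>...</url></image> child.
        Element image = child(element, "image");
        if (image != null) {
            Element url = child(image, "url");
            if (url != null && !url.getTextContent().trim().isEmpty()) {
                return url.getTextContent().trim();
            }
        }
        // Cases 2 and 3: HTML embedded in description or content:encoded.
        for (String tag : new String[] {"description", "content:encoded"}) {
            Element e = child(element, tag);
            if (e != null) {
                Matcher m = IMG_SRC.matcher(e.getTextContent());
                if (m.find()) {
                    return m.group(2);
                }
            }
        }
        return null;
    }

    // Returns the first direct child element with the given tag name, or null.
    private static Element child(Element parent, String name) {
        NodeList nodes = parent.getChildNodes();
        for (int i = 0; i < nodes.getLength(); i++) {
            Node n = nodes.item(i);
            if (n instanceof Element && n.getNodeName().equals(name)) {
                return (Element) n;
            }
        }
        return null;
    }

    /** Parses an XML string and returns its root element. */
    public static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        String item = "<item><description>"
                + "&lt;img src=\"http://example.com/a.jpg\"&gt; Some text"
                + "</description></item>";
        System.out.println(findImage(parse(item)));
    }
}
```

The regular expression only handles quoted `src` attributes and will miss unusual markup, so a real HTML parser would be a safer choice for production feeds.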

So I'm not able to show the image of each news item, as required. Is there a standardized way to do that, regardless of the source of the news?

aseolin
  • For that goal, I have used the jsoup library to parse the HTML fragment. I try the most frequent case first, then the second, and so on. As a default case, I search the tag – Aubin Feb 23 '17 at 11:45
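
The "most likely case first, default case last" idea from the comment can be sketched for a single HTML fragment as follows. This is only an illustration under stated assumptions: it uses JDK regular expressions instead of jsoup, the class `MainImagePicker` and the 100-pixel threshold are hypothetical, and it assumes thumbnails and bullets declare a small `width` or `height` attribute, which many feeds do not.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MainImagePicker {
    // Matches a whole <img ...> tag.
    private static final Pattern IMG_TAG =
            Pattern.compile("<img\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    // Hypothetical threshold: declared dimensions below this count as icons or bullets.
    private static final int MIN_SIZE = 100;

    /**
     * Returns the src of the first image that is not declared small.
     * Falls back to the first image of any size, then to null.
     */
    public static String pick(String html) {
        if (html == null) {
            return null;
        }
        String fallback = null;
        Matcher m = IMG_TAG.matcher(html);
        while (m.find()) {
            String tag = m.group();
            String src = attr(tag, "src");
            if (src == null) {
                continue;
            }
            if (fallback == null) {
                fallback = src; // default case: any image at all
            }
            if (!isSmall(attr(tag, "width")) && !isSmall(attr(tag, "height"))) {
                return src; // preferred case: no small dimension declared
            }
        }
        return fallback;
    }

    // Reads a quoted attribute value from a single tag, or null if absent.
    private static String attr(String tag, String name) {
        Matcher m = Pattern.compile("\\s" + name + "\\s*=\\s*([\"'])(.*?)\\1",
                Pattern.CASE_INSENSITIVE).matcher(tag);
        return m.find() ? m.group(2) : null;
    }

    // True only when the value is a plain number below the threshold.
    private static boolean isSmall(String value) {
        if (value == null || !value.matches("\\d{1,6}")) {
            return false;
        }
        return Integer.parseInt(value) < MIN_SIZE;
    }

    public static void main(String[] args) {
        String html = "<img src='bullet.gif' width='10' height='10'>"
                + "<img src=\"http://example.com/big.jpg\" width=\"640\">";
        System.out.println(pick(html));
    }
}
```

When no image passes the size check, the first image found is still returned, so the caller always gets something to show if the fragment contains any image at all.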

0 Answers