I'm working on a Java project that reads an RSS feed from a URL, parses the date, title and description, and converts them into a JSON file that is shown on a TV screen as an HTML5 page.
The steps are: the admin posts the URL of the RSS feed, for example:
http://g1.globo.com/dynamo/pr/parana/rss2.xml
http://noticias.r7.com/economia/feed.xml
http://feeds.feedburner.com/Rss-Presidencia-Agenda?fmt=xml
http://rss.cnn.com/rss/edition_world.rss
...or any other.
Using the Java ROME framework, I convert the content of the URL into a JSON file and send it to an HTML5/JavaScript page (via a socket, which isn't the point here) that displays it on a styled page.
I can get the date, title and description because they live in standard tags, but I also want to get the image of each news item.
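For reference, the item-to-JSON step can be sketched with only the JDK. This swaps in the built-in DOM parser instead of ROME, and the class and method names are hypothetical. It assumes RSS 2.0 "item" elements and normalizes "pubDate" with DateTimeFormatter.RFC_1123_DATE_TIME, which covers the common "Sat, 07 Sep 2002 00:00:01 GMT" form but not named zones such as "EST"; those dates are passed through unchanged.

```java
import java.io.StringReader;
import java.time.DateTimeException;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class RssToJson {

    // Text of the first descendant element with this tag name, or "" if absent.
    static String text(Element item, String tag) {
        NodeList nodes = item.getElementsByTagName(tag);
        return nodes.getLength() > 0 ? nodes.item(0).getTextContent().trim() : "";
    }

    // RSS 2.0 pubDate is RFC 822 style; RFC_1123_DATE_TIME handles the common form.
    // Anything it cannot parse is returned unchanged.
    static String isoDate(String pubDate) {
        try {
            return OffsetDateTime.parse(pubDate, DateTimeFormatter.RFC_1123_DATE_TIME)
                    .format(DateTimeFormatter.ISO_OFFSET_DATE_TIME);
        } catch (DateTimeException e) {
            return pubDate;
        }
    }

    // Minimal JSON string escaping: quotes, backslashes and control characters.
    static String jsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    // Converts every <item> of an RSS 2.0 document into a JSON array of objects.
    static String toJson(String rssXml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        Document doc = f.newDocumentBuilder().parse(new InputSource(new StringReader(rssXml)));
        NodeList items = doc.getElementsByTagName("item");
        List<String> out = new ArrayList<>();
        for (int i = 0; i < items.getLength(); i++) {
            Element item = (Element) items.item(i);
            out.add("{\"title\":" + jsonString(text(item, "title"))
                    + ",\"date\":" + jsonString(isoDate(text(item, "pubDate")))
                    + ",\"description\":" + jsonString(text(item, "description")) + "}");
        }
        return "[" + String.join(",", out) + "]";
    }

    public static void main(String[] args) throws Exception {
        String xml = "<rss version=\"2.0\"><channel><item>"
                + "<title>Hello \"world\"</title>"
                + "<pubDate>Sat, 07 Sep 2002 00:00:01 GMT</pubDate>"
                + "<description>First item</description>"
                + "</item></channel></rss>";
        System.out.println(toJson(xml));
    }
}
```

Building the JSON by hand keeps the sketch dependency-free; in a real project a JSON library would be the safer choice.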
The problem is that the image's location depends on the feed: sometimes it is in the "image/url" tag (as in the first example link), sometimes it is inside the "content" or "description" tag as HTML, and sometimes it is somewhere else entirely. Sometimes the image is only a thumbnail, or just a bullet.
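Because no single tag is guaranteed, one approach is a fallback chain over the known locations. The sketch below is one such chain, assuming the JDK's namespace-aware DOM parser rather than ROME. It checks the Media RSS elements (media:content, media:thumbnail) and an image-typed enclosure first, then the first "img" in content:encoded, then in description. The order, the hypothetical 100-pixel MIN_SIZE cut-off used to skip bullets, and all names are illustrative choices. Size filtering only works when width/height attributes are declared, the regex scan is a heuristic rather than a real HTML parser, and a feed-specific tag such as the "image/url" case would need its own check.

```java
import java.io.StringReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class FeedImageFinder {
    static final String MRSS = "http://search.yahoo.com/mrss/";
    static final String CONTENT = "http://purl.org/rss/1.0/modules/content/";
    // Hypothetical cut-off: images declared smaller than this are treated as bullets/icons.
    static final int MIN_SIZE = 100;
    static final Pattern IMG = Pattern.compile("<img\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    // True only if the value is a plain pixel number below MIN_SIZE; missing or "100%" is unknown.
    static boolean isBelow(String v) {
        return v != null && v.matches("\\d{1,6}") && Integer.parseInt(v) < MIN_SIZE;
    }

    // Reads a quoted HTML attribute; the lookbehind keeps "src" from matching "data-src".
    static String attr(String tag, String name) {
        Matcher m = Pattern.compile("(?<![\\w-])" + name + "\\s*=\\s*[\"']([^\"']*)[\"']",
                Pattern.CASE_INSENSITIVE).matcher(tag);
        return m.find() ? m.group(1).replace("&amp;", "&") : null;
    }

    // First <img src> in an HTML fragment that is not declared as tiny.
    static String firstImageInHtml(String html) {
        Matcher m = IMG.matcher(html);
        while (m.find()) {
            String tag = m.group();
            String src = attr(tag, "src");
            if (src != null && !src.isEmpty()
                    && !isBelow(attr(tag, "width")) && !isBelow(attr(tag, "height"))) {
                return src;
            }
        }
        return null;
    }

    // First element with a non-empty url attribute (and an image type, if required).
    static String fromElements(NodeList nodes, boolean requireImageType) {
        for (int i = 0; i < nodes.getLength(); i++) {
            Element e = (Element) nodes.item(i);
            String url = e.getAttribute("url");
            boolean isImage = !requireImageType
                    || "image".equals(e.getAttribute("medium"))
                    || e.getAttribute("type").startsWith("image/");
            if (!url.isEmpty() && isImage
                    && !isBelow(e.getAttribute("width")) && !isBelow(e.getAttribute("height"))) {
                return url;
            }
        }
        return null;
    }

    // Scans the text of each element (CDATA or escaped HTML) for a usable <img>.
    static String fromHtmlElements(NodeList nodes) {
        for (int i = 0; i < nodes.getLength(); i++) {
            String src = firstImageInHtml(nodes.item(i).getTextContent());
            if (src != null) return src;
        }
        return null;
    }

    // Tries each known location in order; returns null when nothing suitable is found.
    static String findImage(Element item) {
        String url = fromElements(item.getElementsByTagNameNS(MRSS, "content"), true);
        if (url == null) url = fromElements(item.getElementsByTagNameNS(MRSS, "thumbnail"), false);
        if (url == null) url = fromElements(item.getElementsByTagName("enclosure"), true);
        if (url == null) url = fromHtmlElements(item.getElementsByTagNameNS(CONTENT, "encoded"));
        if (url == null) url = fromHtmlElements(item.getElementsByTagName("description"));
        return url;
    }

    static String findImageInFirstItem(String rssXml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true); // required for getElementsByTagNameNS
        Document doc = f.newDocumentBuilder().parse(new InputSource(new StringReader(rssXml)));
        NodeList items = doc.getElementsByTagName("item");
        return items.getLength() == 0 ? null : findImage((Element) items.item(0));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<rss version=\"2.0\"><channel><item><title>A</title>"
                + "<description><![CDATA[<img src=\"bullet.gif\" width=\"8\">"
                + "<p>Text</p><img src=\"photo.jpg\" width=\"640\">]]></description>"
                + "</item></channel></rss>";
        System.out.println(findImageInFirstItem(xml)); // photo.jpg
    }
}
```

When every step returns null, the page can fall back to a default or source-specific image instead of an empty slot.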
So I can't show the news image, as required. Is there a standardized way to get it, regardless of the feed source?
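One widely used fallback, not mentioned above, is the Open Graph "og:image" meta tag on the article page that an item's "link" points to. Many news sites set it for social-media previews, though not all do, and it costs one extra HTTP request per item. The sketch below assumes Java 11+ for java.net.http.HttpClient; the class name and helpers are hypothetical, and the regex scan of "meta" tags is a heuristic, not a full HTML parser.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OpenGraphImage {
    static final Pattern META = Pattern.compile("<meta\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    // Reads a quoted HTML attribute; attribute order inside the tag does not matter.
    static String attr(String tag, String name) {
        Matcher m = Pattern.compile("(?<![\\w-])" + name + "\\s*=\\s*[\"']([^\"']*)[\"']",
                Pattern.CASE_INSENSITIVE).matcher(tag);
        return m.find() ? m.group(1).replace("&amp;", "&") : null;
    }

    // Returns the og:image URL declared in the page, or null if there is none.
    static String ogImage(String html) {
        Matcher m = META.matcher(html);
        while (m.find()) {
            String tag = m.group();
            // Some sites use name= instead of the spec's property= attribute.
            if ("og:image".equalsIgnoreCase(attr(tag, "property"))
                    || "og:image".equalsIgnoreCase(attr(tag, "name"))) {
                String content = attr(tag, "content");
                if (content != null && !content.isEmpty()) return content;
            }
        }
        return null;
    }

    // Downloads the article page, following redirects.
    static String fetch(String url) throws Exception {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) throws Exception {
        if (args.length > 0) {
            // e.g. pass an item's <link> URL on the command line
            System.out.println(ogImage(fetch(args[0])));
        } else {
            String html = "<head><meta property=\"og:image\" content=\"https://example.com/photo.jpg\"></head>";
            System.out.println(ogImage(html)); // https://example.com/photo.jpg
        }
    }
}
```

Since this needs network access per item, it fits best as the last step after the feed-based checks, ideally with caching.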