Integrate web image from rss feed to Ruby

Question

This may be unclear, but I'll do my best. I'm using Dashing, the Sinatra-based dashboard framework, with the RSS widget. I can't get the small image that appears before each RSS item:

<description>
&lt;img style='vertical-align:middle' src='http://pitre-web.tpg.ch/images?ligne=D' title='Perturbation Line D' alt='Perturbation Line D' /&gt;
&lt;br/&gt;&lt;br/&gt;21:03 - THEME - Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
</description>

I know the code looks a bit strange, but on the web page everything before 21:03 is ignored. How can I integrate the small logo into the page, or at least get the line number (it's a bus line; here it's D) so I can show it as plain text in my widget? In case it helps, I'm using Nokogiri to fetch the XML from the RSS feed. What could I put here to fetch this piece of information?

summary = clean_html( news_item.xpath('description').text )

Thanks in advance :)

Do you want to find the `<description>` tag, or get at its contents? — the Tin Man, Nov 14 '14 at 22:15

the Tin Man · Accepted Answer · 2014-11-21T18:01:36.423

The content of the <description> tag is HTML-encoded, so it needs to be decoded back to HTML, then reparsed:

require 'nokogiri'

doc = Nokogiri::XML::DocumentFragment.parse(<<EOT)
<description>
&lt;img style='vertical-align:middle' src='http://pitre-web.tpg.ch/images?ligne=D' title='Perturbation Line D' alt='Perturbation Line D' /&gt;
&lt;br/&gt;&lt;br/&gt;21:03 - THEME - Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
</description>
EOT

This is how to locate the tag:

description_node = doc.at('description')

To access its content use:

description_text = doc.at('description').text 
# => "\n<img style='vertical-align:middle' src='http://pitre-web.tpg.ch/images?ligne=D' title='Perturbation Line D' alt='Perturbation Line D' />\n<br/><br/>21:03 - THEME - Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.\n"

To do something with that content:

description_doc = Nokogiri::HTML::DocumentFragment.parse(description_text)
description_doc.at('img')['src'] # => "http://pitre-web.tpg.ch/images?ligne=D"

The real XML doesn't match what was given in the question. Here's a better example showing what is being encountered:

<?xml version='1.0' encoding='UTF-8'?>
<rss>
  <channel>
    <title />
    <description />
    <item>
      <description>
&lt;img style='vertical-align:middle' src='http://pitre-web.tpg.ch/images?ligne=2' title='Perturbation Ligne 2' alt='Perturbation Ligne 2' /&gt;
      &lt;br/&gt;&lt;br/&gt;18:47 - Surcharge de trafic - Retard de 8 minutes entre Marbriers et Gen&amp;egrave;ve-Plage.
      </description>
    </item>
    <item>
      <description>
&lt;img style='vertical-align:middle' src='http://pitre-web.tpg.ch/images?ligne=19' title='Perturbation Ligne 19' alt='Perturbation Ligne 19' /&gt;
      &lt;br/&gt;&lt;br/&gt;18:43 - Cimeti&amp;egrave;re Saint-Georges - direction Vernier-Village - Incident &amp;agrave; bord du v&amp;eacute;hicule - Immobilisation du v&amp;eacute;hicule
      </description>
    </item>
    </channel>
</rss>

Based on that, here's code that works to extract the URLs:

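require 'nokogiri'

# Here 'xml' is assumed to be a local copy of the feed shown above.
doc = Nokogiri::XML(File.read('xml'))
# Search only the <description> tags inside <item>; the channel-level one is empty.
img_srces = doc.search('item description').map { |description|
  # Each description holds HTML-encoded markup, so reparse its text as HTML.
  desc_doc = Nokogiri::HTML(description.text)
  desc_doc.at('img')['src']
}
img_srces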
# => ["http://pitre-web.tpg.ch/images?ligne=2",
#     "http://pitre-web.tpg.ch/images?ligne=19"]

Thanks for the answer, but my compiler doesn't accept the ['src']. I want to do a news_headlines.push({}) with only the alt of the image, so I can show it as text on my website. The error is: undefined method `[]' for nil:NilClass — ddgav, Nov 15 '14 at 16:08
What compiler? If you get a nil, then your XML example doesn't match your working XML, since the code example I gave worked to get the value from the `src` parameter in the example XML. — the Tin Man, Nov 15 '14 at 18:53
If you want to try it yourself, the feed is this one: http://www.tpg.ch/perturbation/xml Thanks — ddgav, Nov 21 '14 at 17:36
The XML sample you gave us doesn't match the real one; there is an empty `<description>` tag in the document before the ones you want. That's why it's REALLY important to give us accurate input data. — the Tin Man, Nov 21 '14 at 17:57
