
It may be unclear, but I'll do my best. I'm currently using Dashing, the Sinatra-based dashboard designer, with the RSS widget. The problem is that I can't get the little image that appears before each RSS item:

<description>
&lt;img style='vertical-align:middle' src='http://pitre-web.tpg.ch/images?ligne=D' title='Perturbation Line D' alt='Perturbation Line D' /&gt;
&lt;br/&gt;&lt;br/&gt;21:03 - THEME - Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
</description>

I know the markup looks a bit strange, but on the web page everything before "21:03" is ignored. How can I show the small logo on the page, or at least get the line number (it's a bus line; here it's D) so I can display it as plain text in my widget? I don't know if it helps, but I am using Nokogiri to fetch the XML from the RSS feed. What could I put here to fetch this piece of information?

summary = clean_html( news_item.xpath('description').text )

Thanks in advance :)

ddgav

1 Answer


The content of the <description> tag is HTML-encoded, so it needs to be decoded back to HTML, then reparsed:

require 'nokogiri'

doc = Nokogiri::XML::DocumentFragment.parse(<<EOT)
<description>
&lt;img style='vertical-align:middle' src='http://pitre-web.tpg.ch/images?ligne=D' title='Perturbation Line D' alt='Perturbation Line D' /&gt;
&lt;br/&gt;&lt;br/&gt;21:03 - THEME - Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
</description>
EOT

This is how to locate the tag:

description_node = doc.at('description')

To access its content use:

description_text = doc.at('description').text 
# => "\n<img style='vertical-align:middle' src='http://pitre-web.tpg.ch/images?ligne=D' title='Perturbation Line D' alt='Perturbation Line D' />\n<br/><br/>21:03 - THEME - Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.\n"

To do something with that content:

description_doc = Nokogiri::HTML::DocumentFragment.parse(description_text)
description_doc.at('img')['src'] # => "http://pitre-web.tpg.ch/images?ligne=D"

The real XML doesn't match what was given in the question. Here's a better example showing what is being encountered:

<?xml version='1.0' encoding='UTF-8'?>
<rss>
  <channel>
    <title />
    <description />
    <item>
      <description>
&lt;img style='vertical-align:middle' src='http://pitre-web.tpg.ch/images?ligne=2' title='Perturbation Ligne 2' alt='Perturbation Ligne 2' /&gt;
      &lt;br/&gt;&lt;br/&gt;18:47 - Surcharge de trafic - Retard de 8 minutes entre Marbriers et Gen&amp;egrave;ve-Plage.
      </description>
    </item>
    <item>
      <description>
&lt;img style='vertical-align:middle' src='http://pitre-web.tpg.ch/images?ligne=19' title='Perturbation Ligne 19' alt='Perturbation Ligne 19' /&gt;
      &lt;br/&gt;&lt;br/&gt;18:43 - Cimeti&amp;egrave;re Saint-Georges - direction Vernier-Village - Incident &amp;agrave; bord du v&amp;eacute;hicule - Immobilisation du v&amp;eacute;hicule
      </description>
    </item>
    </channel>
</rss>

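Based on that, here's code that extracts the image URLs:

require 'nokogiri'

# Read a saved copy of the feed from a local file named 'xml'
doc = Nokogiri::XML(File.read('xml'))

# 'item description' skips the empty <description /> directly under <channel>
img_srces = doc.search('item description').map do |description|
  # .text decodes the entities; re-parse the result as HTML
  desc_doc = Nokogiri::HTML(description.text)
  desc_doc.at('img')['src']
end
img_srces
# => ["http://pitre-web.tpg.ch/images?ligne=2",
#     "http://pitre-web.tpg.ch/images?ligne=19"]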
the Tin Man
  • Thanks for the answer, but my compiler doesn't accept the `['src']`... I want to do a `news_headlines.push({})` with only the alt text of the image, so I can show it as text on my website. Error: undefined method `[]' for nil:NilClass – ddgav Nov 15 '14 at 16:08
  • What compiler? If you get a nil, then your XML example doesn't match your working XML, since the code example I gave worked to get the value from the `src` parameter in the example XML. – the Tin Man Nov 15 '14 at 18:53
  • If you want, you can try it yourself; the feed is this one: http://www.tpg.ch/perturbation/xml Thanks – ddgav Nov 21 '14 at 17:36
  • The XML sample you gave us doesn't match the real one; there is an empty `<description />` tag in the document before the ones you want. That's why it's REALLY important to give us accurate input data. – the Tin Man Nov 21 '14 at 17:57