Questions tagged [hpricot]

Hpricot is a Ruby library for parsing HTML. Until the release of Nokogiri, a competing HTML/XML parser with CSS selector support, Hpricot was the de facto HTML parser for the Ruby community.

163 questions
4 votes, 3 answers

Ruby for romance? How to update a script from itself

My wife enjoys it when I use my geek abilities to be "romantic," so I had an idea for a Ruby script to install on her Mac that would send her quotes and little notes from me throughout the day. I already figured out that I'll be using GeekTool to run…
James P. Wright (reputation 8,991; badges 23, 79, 142)
4 votes, 3 answers

Hpricot: get all text from a document

I have just started learning Ruby. Very cool language, liking it a lot. I am using the very handy Hpricot HTML parser. What I am looking to do is grab all the text from the page, excluding the HTML tags. Example:
RailsSon (reputation 19,897; badges 31, 82, 105)
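For the text-extraction question above, a real parser (Hpricot or Nokogiri) is the robust route. As a stdlib-only illustration of what "all the text, excluding the HTML tags" amounts to, here is a naive sketch that swaps in regular expressions instead of a parser. The helper name visible_text and the small entity table are assumptions for illustration, not Hpricot's API, and markup such as a ">" inside an attribute value will break it.

```ruby
# Naive, hypothetical helper: strips tags with regular expressions instead of
# a real HTML parser such as Hpricot or Nokogiri, so edge cases will break it.
ENTITIES = {
  "&amp;" => "&", "&lt;" => "<", "&gt;" => ">",
  "&quot;" => "\"", "&#39;" => "'", "&nbsp;" => " "
}.freeze

def visible_text(html)
  # Drop <script> and <style> elements together with their contents.
  text = html.gsub(%r{<(script|style)\b[^>]*>.*?</\1\s*>}mi, " ")
  # Drop HTML comments, then every remaining tag.
  text = text.gsub(/<!--.*?-->/m, " ").gsub(/<[^>]*>/, " ")
  # Decode a handful of common entities (after tag removal, so "&lt;b&gt;"
  # stays text) and collapse runs of whitespace into single spaces.
  text.gsub(/&(?:amp|lt|gt|quot|#39|nbsp);/, ENTITIES).split.join(" ")
end

puts visible_text("<p>Hello <b>world</b> &amp; more</p><script>var x = 1;</script>")
# prints: Hello world & more
```

Entities are decoded only after tags are removed, so escaped markup in the page text is kept as literal text rather than being stripped.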
3 votes, 0 answers

Is there anything like Hpricot or Beautiful Soup for PHP?

Possible Duplicate: Robust, Mature HTML Parser for PHP. I am looking for a good way to parse and modify HTML documents server-side in PHP. Beautiful Soup and Hpricot look like very good tools, but they are not available for PHP. Are there any good…
Craig (reputation 7,471; badges 4, 29, 46)
3 votes, 1 answer

Hpricot / Nokogiri: parse an SVG / XML file to get the colors used

I need help finding all colors used in an SVG (XML) file. For example, I need the list of colors used in the image http://upload.wikimedia.org/wikipedia/commons/e/e9/Pepsi_logo_2008.svg. I was trying with the Hpricot / Nokogiri gems to do something…
max (reputation 347; badges 3, 14)
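For the SVG color question, one stdlib-only sketch is to scan fill, stroke and stop-color values (whether written as attributes or inside a style declaration) for hex colors. It assumes colors are written as #rgb or #rrggbb; named colors, rgb(...) values and colors applied through CSS classes or gradient references are not covered. The method name colors_in_svg is hypothetical, and the sample markup is made up, not taken from the linked logo.

```ruby
# Hypothetical helper: collects hex colours used as fill, stroke or
# stop-color values, whether written as attributes or inside style="...".
# Assumes #rgb / #rrggbb notation only; named colours and rgb() are ignored.
COLOR_RE = /(?:fill|stroke|stop-color)\s*[:=]\s*["']?\s*(#\h{6}\b|#\h{3}\b)/i

def colors_in_svg(svg)
  # scan returns one capture array per match; flatten to plain strings,
  # normalise case so "#FFF" and "#fff" count once, keep first-seen order.
  svg.scan(COLOR_RE).flatten.map(&:downcase).uniq
end

sample = <<~SVG
  <svg xmlns="http://www.w3.org/2000/svg">
    <path fill="#E32934" d="M0 0h10v10z"/>
    <path style="fill:#0065A4;stroke:#FFF" d="M0 0h5v5z"/>
    <path fill="none" stroke="#e32934" d="M0 0h2v2z"/>
  </svg>
SVG

p colors_in_svg(sample)
# prints: ["#e32934", "#0065a4", "#fff"]
```

Working on the raw text keeps the sketch dependency-free; with Nokogiri available, iterating over elements and reading their fill/stroke attributes and style strings would be the sturdier version of the same idea.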
3 votes, 1 answer

Disable error correction in Nokogiri

I'm working with a number of malformed HTML pages. At least, I presume they're malformed because when I parse them in Nokogiri and then execute to_html, elements don't appear correctly anymore. When I parse them with Hpricot, however, they display…
JellicleCat (reputation 28,480; badges 24, 109, 162)
3 votes, 1 answer

Programmatically remove images and videos from html

I'm working on Ruby on Rails 2.3.8 and I've got a website in which users type posts. Each of them has a short description that is shown in the main page. That description is automatically built from the original, but it's just truncated so it…
Brian Roisentul (reputation 4,590; badges 8, 53, 81)
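For the question about keeping images and videos out of the generated description, a rough stdlib sketch is to delete media elements before truncating. It uses regular expressions rather than Hpricot, so it is only reliable for simple, well-formed markup. The list of elements treated as media and the method name strip_media are assumptions.

```ruby
# Hypothetical helper: removes media elements from an HTML fragment.
# Paired elements (<video>, <audio>, <iframe>, <object>) are removed together
# with everything nested inside them; <img> and <embed> are void elements.
PAIRED_MEDIA = %r{<(video|audio|iframe|object)\b[^>]*>.*?</\1\s*>}mi
VOID_MEDIA   = %r{<(?:img|embed)\b[^>]*>}i

def strip_media(html)
  html.gsub(PAIRED_MEDIA, "").gsub(VOID_MEDIA, "")
end

post = %q{<p>Hi <img src="a.png"> there</p><video src="v.mp4"><source src="v.webm"></video>}
puts strip_media(post)
# prints: <p>Hi  there</p>
```

Paired elements are removed first so that nested children such as source tags disappear with their parent; the same element nested inside itself (a video inside a video) is not handled.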
3 votes, 2 answers

Issue with unclosed img tag

Data is presented in HTML format and submitted to a server that does some preprocessing. It operates on the "src" attribute of the "img" tag. After preprocessing and saving, none of the preprocessed "img" tags are self-closed. For example, if the "img" tag was…
AntonAL (reputation 16,692; badges 21, 80, 114)
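For the unclosed img question, if whatever consumes the saved markup needs XHTML-style void tags, one post-processing sketch is to rewrite every img tag into self-closed form after the parser has produced its output. This is a regex workaround, not an Hpricot option; the name self_close_imgs is hypothetical, and an attribute value containing ">" would break it.

```ruby
# Hypothetical helper: rewrites <img ...> and <img .../> to <img ... />,
# keeping the original tag-name case and attributes untouched.
IMG_TAG = %r{<(img)\b([^>]*?)\s*/?>}i

def self_close_imgs(html)
  html.gsub(IMG_TAG) do
    m = Regexp.last_match
    "<#{m[1]}#{m[2]} />"
  end
end

puts self_close_imgs('<p><img src="a.png"><img src="b.png"/></p>')
# prints: <p><img src="a.png" /><img src="b.png" /></p>
```

The lazy attribute group stops just before any trailing whitespace or slash, so tags that are already self-closed come out in the same normalized form instead of gaining a second slash.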
3 votes, 2 answers

Encoding problems with Hpricot

I am getting the following encoding error when trying to scrape web pages with Hpricot in Ruby 1.9: Encoding::CompatibilityError: incompatible character encodings: ASCII-8BIT and UTF-8. I can reproduce the error by doing the following: ska:~ sam$…
Sam (reputation 6,240; badges 4, 42, 53)
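The error in the encoding question appears when a string tagged ASCII-8BIT (raw bytes) that contains non-ASCII bytes is combined with a UTF-8 string that also contains non-ASCII characters; if either side were pure ASCII, Ruby would combine them without complaint, which is why the error can seem intermittent. Assuming the scraped bytes really are UTF-8, a minimal sketch of the usual fix is to relabel them with force_encoding before combining, checking valid_encoding? first. The name utf8_from_binary is hypothetical; if the page uses another charset, String#encode from that charset is the right tool instead.

```ruby
# Hypothetical helper: relabels raw bytes (ASCII-8BIT) as UTF-8 without
# converting them, and refuses input that is not actually valid UTF-8.
def utf8_from_binary(bytes)
  text = bytes.dup.force_encoding(Encoding::UTF_8)
  raise ArgumentError, "input is not valid UTF-8" unless text.valid_encoding?
  text
end

raw   = "caf\xC3\xA9".b   # bytes as a network read might return them
label = "Men\u00fa: "     # UTF-8 string with a non-ASCII character

begin
  label + raw             # both sides contain non-ASCII data, so this raises
rescue Encoding::CompatibilityError => e
  puts "#{e.class}: #{e.message}"
end

puts label + utf8_from_binary(raw)   # prints "Menú: café"
```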
3 votes, 3 answers

Escape colon in XPath search

I'm using Hpricot with Selenium. I have this HTML input element: And I'm trying to get its value with this XPath expression: source = Hpricot(@selenium.get_html_source) source.search("//input[@id='foo:bar']") but it is not…
Oscar C (reputation 51; badges 1, 7)
3 votes, 2 answers

How can I get Hpricot to play nice with HTML5?

I am using Hpricot to parse a theme file. I have noticed, however, that if I feed a valid HTML5 document into Hpricot(), it auto-closes HTML5 tags (like ), and messes with the DOCTYPE. Are there any extensions to Hpricot, or perhaps a flag…
Adam Singer (reputation 2,377; badges 3, 18, 18)
3 votes, 2 answers

Extracting values from a login-accessible web page post-JavaScript using Ruby

I have a stock trading website that is only accessible after logging into the site. After logging in, there is a stock value that I am trying to extract. That number is not readily available and takes a while to load as it is being updated from the…
walterfaye (reputation 819; badges 2, 9, 15)
3 votes, 1 answer

Parse XML with JRuby (Hpricot?) with tags like

I'm trying to consume some legacy XML with elements like this in JRuby: content I've been working with Hpricot, but Hpricot's HTML-oriented shortcuts are working against…
Ed Brannin (reputation 7,691; badges 2, 28, 32)
3 votes, 1 answer

Hpricot remove single element

I'm using Ruby's Hpricot gem to parse html. I'd like to remove a single node from the document for use elsewhere, but I can't find a way. I see that I can remove an entire list of elements, using an instance of Hpricot::Elements (x =…
JellicleCat (reputation 28,480; badges 24, 109, 162)
3 votes, 4 answers

iconv will be deprecated in the future, use String#encode instead

I am getting the following deprecation warnings with Ruby 1.9.3-p125 when I run RSpec, but there are no deprecation warnings with Ruby 1.9.2. /gems/ruby-1.9.3-p125@cs/gems/soap4r-1.5.8/lib/xsd/iconvcharset.rb:9:in `<top (required)>': iconv will be…
diya (reputation 6,938; badges 9, 39, 55)
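For code you control, the warning's own advice applies: replace Iconv calls with String#encode. The path in the question shows the warning being triggered from inside soap4r, so this concerns your own code rather than the gem's. A minimal sketch, assuming the input bytes are ISO-8859-1 (Latin-1); both method names are hypothetical.

```ruby
# Hypothetical helpers replacing Iconv.conv(to, from, str) with String#encode.

# Equivalent of Iconv.conv("UTF-8", "ISO-8859-1", bytes): the bytes are first
# labelled as Latin-1, then transcoded to UTF-8.
def latin1_to_utf8(bytes)
  bytes.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end

# Lossy conversion to ASCII: characters with no ASCII equivalent become "?".
# Passing replace: "" instead drops them, similar in spirit to Iconv's //IGNORE.
def to_ascii_lossy(text)
  text.encode(Encoding::US_ASCII, invalid: :replace, undef: :replace, replace: "?")
end

puts latin1_to_utf8("caf\xE9".b)   # prints "café"
puts to_ascii_lossy("caf\u00e9")   # prints "caf?"
```

Unlike Iconv, String#encode raises Encoding::UndefinedConversionError by default when a character cannot be represented, so the invalid:/undef: options are where any lossy behavior has to be requested explicitly.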
2 votes, 7 answers

Segmentation fault in hpricot

I'm using Hpricot to read HTML and I got a segmentation fault error. I googled, and some say to upgrade to the latest version of Ruby. I am using Rails 2.3.2 and Ruby 1.8.7. How can I resolve this error?
user85748 (reputation 1,213; badges 3, 14, 21)