Questions tagged [hpricot]

Hpricot is a Ruby library for parsing HTML. Until the release of Nokogiri, a competing HTML and CSS parser, Hpricot was the de facto HTML parser for the Ruby community.

163 questions
2
votes
3 answers

Hpricot CSS Class search

I am working on some code that scrapes a page for two CSS classes. I am simply using the Hpricot search method for this, like so: webpage.search("body").search("div.first_class | div.second_class") ...for each item found I create an object…
Pete
  • 1,472
  • 2
  • 15
  • 32
1
vote
2 answers

Hpricot: How to extract inner text without other html subelements

I'm working on a vim rspec plugin (https://github.com/skwp/vim-rspec) - and I am parsing some html from rspec. It looks like this: doc = %{
This is the heading text
Some puts output here
} I can get the…
Yan Pritzker
  • 151
  • 2
  • 5
1
vote
3 answers

Tbody tag in xpath produced by fire bug

I'm trying to extract some data from online HTML pages using the Ruby Hpricot library. I use the Firefox extension Firebug to get the XPath of a selected item. There's always an extra tbody tag present in the produced XPath expression. In some cases, I…
Terry Li
  • 16,870
  • 30
  • 89
  • 134
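Browsers add a tbody element to tables in the live DOM when the page source omits it, which is why an XPath copied from a browser tool may not match the raw HTML a parser sees. A minimal sketch of one workaround, assuming the raw page really has no tbody: strip the tbody steps from the copied expression before using it. The helper name `strip_tbody` and the sample path are hypothetical, not taken from the question.

```ruby
# Hypothetical helper: remove browser-inserted tbody steps from an XPath string.
# Only whole "/tbody" or "/tbody[n]" steps are removed; other steps are untouched.
def strip_tbody(xpath)
  xpath.gsub(%r{/tbody(\[\d+\])?(?=/|\z)}, "")
end

# Sample expression of the kind a browser tool might produce (made up for illustration).
firebug_xpath = "/html/body/table/tbody/tr[2]/td[1]"
puts strip_tbody(firebug_xpath)
```

If the page source does contain an explicit tbody, the original expression should be kept as is.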
1
vote
1 answer

How do I scrape a site, with multiple pages, and create one single html page with Ruby?

So what I would like to do is scrape this site: http://boxerbiography.blogspot.com/ and create one HTML page that I can either print or send to my Kindle. I am thinking of using Hpricot, but am not too sure how to proceed. How do I set it up so it…
marcamillion
  • 32,933
  • 55
  • 189
  • 380
1
vote
4 answers

Installing hpricot for JRuby

I'm trying to look at Cucumber for JRuby on Rails. One of the prerequisites is Webrat, which in turn has Hpricot as a prerequisite. I've installed the hpricot gem using: gem install hpricot --source http://code.whytheluckystiff.net --version 0.6.1…
Matthew Farwell
  • 60,889
  • 18
  • 128
  • 171
1
vote
0 answers

Hpricot and Rails Rendering

I finally managed to get Hpricot and Rails working together as below: parser_controller: def deck require 'hpricot' require 'open-uri' @doc = open("http://www.keo.co.za/") { |f| Hpricot(f) } end end deck…
Erin Walker
  • 739
  • 1
  • 11
  • 30
1
vote
2 answers

hpricot-invalid byte sequence in UTF-8

I have already done some searching, but none of the results solve this peculiar, unexpected problem. Just look at the code below: require 'open-uri' require 'hpricot' doc = Hpricot(open("http://www.baidu.com/")) #this web page's encoding is GB2312 I don't know…
castiel
  • 2,675
  • 5
  • 29
  • 38
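An "invalid byte sequence in UTF-8" error of this kind typically appears when bytes in another encoding are treated as UTF-8. A minimal sketch, assuming the page bytes really are GB2312 as the question's comment says: label the raw bytes with their actual encoding and transcode them to UTF-8 before handing them to a parser. The helper name `gb2312_to_utf8` is hypothetical, and the sample bytes are generated locally rather than fetched from any site.

```ruby
# Hypothetical helper: reinterpret raw bytes as GB2312 and transcode them to UTF-8.
# Bytes that cannot be converted are replaced instead of raising an error.
def gb2312_to_utf8(raw_bytes)
  raw_bytes.dup.force_encoding("GB2312")
           .encode("UTF-8", invalid: :replace, undef: :replace)
end

# Simulate the bytes of a GB2312 page: two Chinese characters, encoded as
# GB2312 and then stripped of their encoding label.
raw = "\u4E2D\u6587".encode("GB2312").force_encoding("ASCII-8BIT")

html = gb2312_to_utf8(raw)
puts html.encoding
puts html.valid_encoding?
```

The resulting string can then be passed to the parser in place of the raw response body.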
1
vote
1 answer

Strange symbols in web-page's source

I've got a problem: I'm trying to parse a web page that is in UTF-8 and contains Russian text, using Hpricot. The problem is that I get the Russian text with some strange symbols, and I get an error when I try to convert (iconv) from UTF-8 to windows-1251 or ASCII…
Andrey Eremin
  • 287
  • 1
  • 4
  • 14
1
vote
2 answers

How to remove a particular content inside a div using Hpricot

I have the following html structure

asdasdasdas

asdasdasdas

asdasdasdas

asdasdasdas

Content to be excluded
What I need is, when I search for…
Amal Kumar S
  • 15,555
  • 19
  • 56
  • 88
1
vote
2 answers

Hpricot: how to do conditional search using Hpricot in Ruby on Rails

I am parsing two different sites that have similar HTML tags. I need to use a common parser for both. My issue is that one site has the HTML structure div/ol/li/span/a and the other has div/ol/li/h3/a. My current parser code is doc =…
Amal Kumar S
  • 15,555
  • 19
  • 56
  • 88
1
vote
1 answer

Hpricot-style "container" method for Nokogiri? Select only certain node_types

I'm navigating a document using CSS selectors with Ruby, but I've found some css-selector bugs in Hpricot that are fixed in Nokogiri, and want to switch over. The one issue I'm having is figuring out how to get an array of all children that are…
Sina Iman
  • 48
  • 7
1
vote
2 answers

Hpricot XML text search

Hpricot + Ruby XML parsing and logical selection. Objective: Find all titles written by author Bob. My XML file: Book1 march 1…
Ura
  • 2,173
  • 3
  • 24
  • 41
1
vote
2 answers

Nokogiri equivalent of Hpricot's html method

Hpricot's html method spits out just the HTML in the document: > Hpricot('<p>a</p>').html => "<p>a</p>" By contrast, the closest I can come with Nokogiri is the inner_html method, which wraps its output in <html> and <body> tags: >…
Tom Lehman
  • 85,973
  • 71
  • 200
  • 272
1
vote
2 answers

Problem with loading the hpricot gem

I have a problem loading the hpricot gem. I'm using it in a rake task and put a require "hpricot" in it, but it fails to load with the error message: no such file to load -- hpricot. I can see it in my gem list, but I don't know why the rake task…
LeonS
  • 2,684
  • 2
  • 31
  • 36
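A "no such file to load" message is a LoadError, raised when require cannot find a library on the load path of the Ruby that is running. A minimal diagnostic sketch, assuming nothing about the asker's setup: wrap the require so the task reports the failure instead of aborting. The helper name `try_require` is hypothetical. On Ruby 1.8, RubyGems had to be required explicitly before gems could be loaded; on later versions that line has no effect.

```ruby
# Hypothetical helper: attempt to load a library and report whether it worked.
def try_require(name)
  require name
  true
rescue LoadError => e
  warn "could not load #{name}: #{e.message}"
  false
end

require "rubygems" # needed explicitly on Ruby 1.8; harmless on later versions

puts try_require("set")     # a standard library, expected to load
puts try_require("hpricot") # depends on whether the gem is visible to this Ruby
```

If the second call reports a failure inside rake but not in a plain script, the two are probably running under different Ruby or gem environments.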