I'm using Hpricot to read HTML and I got a segmentation fault. I googled it, and some people say to upgrade to the latest version of Ruby. I am using Rails 2.3.2 and Ruby 1.8.7. How can I resolve this error?
7 Answers
I was trying to parse HTML pages with many Unicode characters in them, and Hpricot kept crashing. Finally, I took the monkey patch from sanitize and put it in the environment.rb of my Rails application. There hasn't been a single crash since I added this patch:
This worked perfect! I know I should switch to Nokogiri (and plan to), but I needed this fix for an older project! – Aug 10 '10 at 18:05
how to use this patch? – Alexander_F Jun 03 '14 at 22:18
If you're free to choose your HTML parsing library, switch it. _why, the creator of Hpricot, recently posted that you are better off using Nokogiri instead of Hpricot nowadays.
You may also have a look at HTTParty.
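For anyone weighing the switch, here is a rough sketch of what a basic parse looks like in Nokogiri. The helper name `link_targets` and the sample markup are made up for illustration; `Nokogiri::HTML` and `search` are part of Nokogiri's API, and `search` takes CSS selectors much like Hpricot does. The `require` is guarded, so the helper simply returns nil when the gem is not installed.

```ruby
require 'rubygems'
begin
  require 'nokogiri'
rescue LoadError
  # Nokogiri is not installed; link_targets below will return nil.
end

# Hypothetical helper: collects the href of every <a> element in the HTML.
def link_targets(html)
  return nil unless defined?(Nokogiri)
  doc = Nokogiri::HTML(html)
  # Links without an href attribute yield nil, so compact drops them.
  doc.search('a').map { |a| a['href'] }.compact
end
```

Something like `link_targets(File.read('page.html'))` would be the usage, where `page.html` is a placeholder file name.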

And he also subsequently vanished from the Internet, so for the moment HPricot appears to be unmaintained. – molf Aug 26 '09 at 17:56
From memory, since I last used it about a year ago:
Hpricot stores attributes in a fixed-size buffer, and some frameworks generate outrageously long hashes in document attributes. There's some static field you can set before parsing that lets you set the size of this buffer.
I remember it being fairly prominent in the docs on the webpage, though of course it's gone now.
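A sketch of what that might look like, assuming the setting in question is `Hpricot.buffer_size`, which is how I recall it from the old README. The helper name `parse_with_larger_buffer` and the 262144 value are only illustrative, so check them against the version of the gem you have installed.

```ruby
require 'rubygems'
begin
  require 'hpricot'
rescue LoadError
  # Hpricot is not installed; parse_with_larger_buffer will return nil.
end

# Hypothetical helper: enlarges Hpricot's buffer, then parses the HTML.
# Assumption: Hpricot.buffer_size= is the setting described above.
def parse_with_larger_buffer(html, buffer_size = 262_144)
  return nil unless defined?(Hpricot)
  # The setting is global, so it must be assigned before parsing starts.
  Hpricot.buffer_size = buffer_size
  Hpricot(html)
end
```

Because the setting is global, in a Rails app it could also be assigned once at startup (for example in environment.rb) rather than on every parse.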

This appears to be an outstanding issue on the bug list. I have experienced it too. My theory is that it has to do with the HTML structure or a bad/corrupt character in the file, but I have not found where exactly.
Here are the links to the issues:

I'm having the same segfault issue, but sadly I can't consult the issues Dave cited above, even via Google cache. From what I've found by googling, the parse.rb segfaults have to do with encoded entities or alternate character sets (accented characters, perhaps).
The sanitize lib encountered the same issue and posted a monkeypatch here: http://github.com/rgrove/sanitize/blob/1e1dc9681de99e32dc166f591343dfa60fc1f648/lib/sanitize/monkeypatch/hpricot.rb

Well, based on your own question, I'd say "Upgrade to the latest version of Ruby". However, I've also had problems with hpricot segfaulting, which seemed to be related to my usage of threading.

But I am using almost the latest version of ruby already. Also, I am not doing any threading in my code :( – user85748 May 30 '09 at 22:18
My host is using 1.8.5. Even if I upgrade to 1.9.1 on my dev machine, I won't be able to deploy the code to production. – user85748 May 30 '09 at 22:28
For clarification, upgrading to 1.9 is probably not the answer. Hpricot works better on 1.8 than 1.9. Still some bugs that haven't been worked out in 1.9. – Chuck Jun 20 '09 at 03:13
I'm experiencing the same bug with ruby 1.9.2 - so upgrading is not the answer. – digitalfrost Nov 23 '10 at 13:14