Questions tagged [hpricot]

Hpricot is a Ruby library for parsing HTML. Until the release of Nokogiri, a competing HTML and CSS parser, Hpricot was the de facto HTML parser for the Ruby community.

163 questions

votes

3 answers

How to get all image, pdf and other files links from a web page?

I have to develop a Ruby on Rails application that fetches the links to all image, PDF, CGI, and other file types from a web page.

asked Jan 04 '12 at 06:06

Aniruddhsinh


votes

5 answers

hpricot with firebug's XPath

I'm trying to extract some info from a table-based website with hpricot. I get the XPath with Firebug. /html/body/div/table/tbody/tr/td/table/tbody/tr[2]/td/table/tbody/tr/td[2]/table/tbody/tr[3]/td/table[3]/tbody/tr This doesn't work…

ruby xpath firebug hpricot

asked Apr 09 '09 at 13:18

Ruby n00b

votes

1 answer

Hpricot search all the tags under one specific namespace

For example I have the following code: <io:content part="title" />

parsing xhtml jruby xml-namespaces hpricot

asked Aug 27 '11 at 13:48

arkxu

votes

1 answer

Ruby: clean up HTML, use Hpricot or just regex?

I'm looking to do some rudimentary cleansing of HTML. Basically want to create a whitelist of tags that are allowed and reject anything else. Is Hpricot worth it in this case? Does it have a feature that I've overlooked that will save me from…

html ruby hpricot

asked Apr 04 '11 at 21:46

randombits


votes

3 answers

What is the divisor notation used (for example) in Hpricot?

In the Hpricot docs (at https://github.com/hpricot/hpricot) there is a doc.search() method. The docs then go on to say "A shortcut is to use the divisor": (doc/"p.posted") It works, that's for sure, but I'm wondering, what notation is this? I have…

ruby hpricot

asked Jan 24 '11 at 13:17

kmc
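
On the divisor question above: in Ruby, `/` is an ordinary method that any class can define, so `doc/"p.posted"` is simply a method call with a string argument. The sketch below uses a hypothetical `TinyDoc` class with a toy `tag.class` selector; it is not Hpricot's actual implementation, only an illustration of the mechanism.

```ruby
# Hypothetical stand-in for a parsed document; NOT Hpricot's real class.
class TinyDoc
  def initialize(elements)
    @elements = elements # array of [tag, css_class] pairs
  end

  # Plain search method taking a toy "tag" or "tag.class" selector.
  def search(selector)
    tag, css_class = selector.split(".", 2)
    @elements.select do |el_tag, el_class|
      el_tag == tag && (css_class.nil? || el_class == css_class)
    end
  end

  # Defining `/` turns `doc / "p.posted"` into `doc.search("p.posted")`.
  def /(selector)
    search(selector)
  end
end

doc = TinyDoc.new([["p", "posted"], ["p", "draft"], ["div", "posted"]])
result = doc / "p.posted"
p result
```

Because `/` is just a method name here, the same call could also be written `doc./("p.posted")`.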

votes

3 answers

Ruby Hpricot RegEx replace <br>'s with <p>'s

Can someone please tell me how to convert this line of JavaScript to Ruby using Hpricot & RegEx? // Replace all doubled-up <BR> tags with <P> tags, and remove fonts. var pattern = new RegExp ("<br/?>[ \r\n\s]*<br/?>", "g"); …

javascript ruby regex hpricot

asked Aug 09 '10 at 00:49

dpigera
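
A rough Ruby counterpart to the JavaScript pattern in the question above, assuming the goal is to turn two consecutive `<br>` tags into a paragraph break. The input string and the `"</p><p>"` replacement are assumptions, since the excerpt is truncated, and the sketch uses plain `String#gsub` rather than Hpricot.

```ruby
# Assumed sample input; the question's actual markup is not shown.
html = "<p>First line<br/>\n  <br>Second line</p>"

# Two <br> or <br/> tags separated only by whitespace, case-insensitive.
double_br = /<br\s*\/?>\s*<br\s*\/?>/i

# String#gsub replaces every match, like the JavaScript "g" flag.
cleaned = html.gsub(double_br, "</p><p>")
puts cleaned
```

A single `<br>` is left alone; only doubled-up tags are rewritten.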


votes

2 answers

Can I use Hpricot to find the main article text of any/most websites?

I need a way of extracting the main text from any webpage that displays an article. Similar to the way that Readability can find the main text on any website that it's run on. I'm using Ruby on Rails, so I think Hpricot is my best bet. Is what I'm…

ruby screen-scraping hpricot

asked Jul 18 '10 at 11:23

ben


votes

1 answer

how to remove html element's style attribute using Hpricot?

like this:

Hello world just do it

I want to remove every element's "style" attribute. I want the result like this:

Hello world just do it

how to do…

html ruby hpricot

asked Jun 18 '10 at 02:31

www


votes

2 answers

How to get an element using inner text (Watir, Nokogiri, Hpricot)

I have been experimenting with Watir, Nokogiri and Hpricot. All of these use a top-down approach, which is my problem: they search for an element by its type. I want to find the element using its text without knowing element…

watir hpricot

asked Feb 13 '10 at 21:03

Hpriguy

votes

1 answer

Hpricot version not working

I'm trying to migrate my blog to Jekyll, following these instructions: http://jekyllrb.com/docs/migrations/ I've got all my posts in .xml format, but the command to convert them does not seem to be working: ruby -rubygems -e 'require…

ruby jekyll hpricot

asked Sep 22 '13 at 19:17

RobinLovelace


votes

2 answers

Why does Twitter API return a 400 error in production?

I have a Twitter app that works fantastic locally: it searches for keywords, then for each user it grabs their info using Hpricot to parse the XML, e.g. Hpricot(open("http://twitter.com/users/show/"+myuser+".xml")) Works fine locally but when I go…

ruby-on-rails twitter hpricot

asked Oct 14 '09 at 23:50

Fonziguy

votes

3 answers

Searching all elements before an h2 element in hpricot/nokogiri

I am attempting to parse a Wiktionary entry to retrieve all English definitions. I am able to retrieve all definitions; the problem is that some definitions are in other languages. What I would like to do is somehow retrieve only the HTML block…

ruby parsing nokogiri hpricot wiktionary

asked Sep 21 '09 at 00:46

Dave

votes

4 answers

ROR/Hpricot: parsing a site and searching/comparing strings with regex

I just started with Ruby On Rails, and want to create a simple web site crawler which: Goes through all the Sherdog fighters' profiles. Gets the Referees' names. Compares names with the old ones (both during the site parsing and from the…

ruby-on-rails ruby ruby-on-rails-3 hpricot

asked Oct 11 '12 at 02:21

Mikko Vedru

votes

2 answers

How do you know when to use an XML parser and when to use ActiveResource?

I tried using ActiveResource to parse a web service that was more like an HTML document and I kept getting a 404 error. Do I need to use an XML parser for this task instead of ActiveResource? My guess is that ActiveResource is only useful if you are…

ruby-on-rails web-services nokogiri hpricot activesupport

asked Aug 10 '09 at 13:05

chimp

votes

5 answers

Removing anything between XML tags and their content

I need to remove anything between XML tags, especially whitespace and newlines. For example removing whitespace and newlines from: \n to get: This is not meant for parsing XML by…

xml ruby regex hpricot

asked Jul 20 '09 at 19:02

rubiii
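
For the whitespace question above, one regex-only approach is to delete whitespace-only runs that sit between a closing `>` and the next `<`. This is a sketch on an assumed sample document; it does not parse XML, and it would also strip whitespace inside CDATA sections or whitespace-only text nodes that happen to be significant.

```ruby
# Assumed sample input; the question's actual XML is not shown.
xml = "<root>\n  <item>keep this text</item>\n  <item> padded </item>\n</root>"

# Drop runs of whitespace that lie between ">" and the following "<".
# Text that contains any non-whitespace character is left untouched.
compact = xml.gsub(/>\s+</, "><")
puts compact
```

Leading and trailing spaces inside `<item> padded </item>` survive, because that run is not whitespace-only.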

