Questions tagged [hpricot]

Hpricot is a Ruby library for parsing HTML. Until the release of Nokogiri, a competing HTML and CSS parser, Hpricot was the de facto HTML parser for the Ruby community.

163 questions
2
votes
3 answers

How to get all image, pdf and other files links from a web page?

I have to develop a Ruby on Rails application that fetches all links to image, PDF, CGI, and other file types from a web page.
Aniruddhsinh
  • 2,099
  • 3
  • 15
  • 19
2
votes
5 answers

hpricot with firebug's XPath

I'm trying to extract some info from a table-based website with Hpricot. I get the XPath with Firebug. /html/body/div/table/tbody/tr/td/table/tbody/tr[2]/td/table/tbody/tr/td[2]/table/tbody/tr[3]/td/table[3]/tbody/tr This doesn't work...…
Ruby n00b
2
votes
1 answer

Hpricot search all the tags under one specific namespace

For example I have the following code: <io:content part="title" />
arkxu
  • 31
  • 2
2
votes
1 answer

Ruby: clean up HTML, use Hpricot or just regex?

I'm looking to do some rudimentary cleansing of HTML. Basically want to create a whitelist of tags that are allowed and reject anything else. Is Hpricot worth it in this case? Does it have a feature that I've overlooked that will save me from…
randombits
  • 47,058
  • 76
  • 251
  • 433
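
For the whitelist question above, the regex route can be sketched with the standard library alone. This is a minimal sketch and not Hpricot's API: `ALLOWED_TAGS` and `strip_disallowed_tags` are hypothetical names, and the pattern assumes reasonably well-formed tags. It drops every tag that is not on the list (keeping its inner text) and discards attributes on the tags it keeps. For untrusted input, a real parser is likely the safer choice.

```ruby
# Hypothetical whitelist; adjust to the tags you actually want to allow.
ALLOWED_TAGS = %w[b i em strong p br].freeze

# Remove every tag not in ALLOWED_TAGS and strip attributes from the rest.
# Assumption: tags are simple enough for a regex to recognise.
def strip_disallowed_tags(html)
  html.gsub(%r{<(/?)([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>}) do
    closing = $1
    name = $2.downcase
    ALLOWED_TAGS.include?(name) ? "<#{closing}#{name}>" : ""
  end
end

puts strip_disallowed_tags('<p style="x">Hi <script>alert(1)</script><b>there</b></p>')
# prints <p>Hi alert(1)<b>there</b></p>
```

Note that the text inside a removed tag survives; whether that is acceptable depends on the cleanup rules you need.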
2
votes
3 answers

What is the divisor notation used (for example) in Hpricot?

In the Hpricot docs (at https://github.com/hpricot/hpricot) there is a doc.search() method. The docs then go on to say "A shortcut is to use the divisor": (doc/"p.posted") It works, that's for sure, but I'm wondering, what notation is this? I have…
kmc
  • 660
  • 13
  • 25
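
On the divisor question above: in Ruby, `/` is an ordinary method that any class may define, so `doc/"p.posted"` is simply a method call with a string argument. The sketch below illustrates the idea with a made-up `TinyDoc` class; it is not Hpricot's implementation, and its selector matching is deliberately simplistic.

```ruby
# Hypothetical stand-in for a parsed document. This is NOT Hpricot's real
# class; it only demonstrates that `/` is an ordinary, definable method.
class TinyDoc
  def initialize(elements)
    @elements = elements # array of hashes such as { tag: "p", css_class: "posted" }
  end

  # Match a "tag" or "tag.class" selector against the stored elements.
  def search(selector)
    tag, css_class = selector.split(".", 2)
    @elements.select do |el|
      el[:tag] == tag && (css_class.nil? || el[:css_class] == css_class)
    end
  end

  # Defining `/` lets callers write doc/"p.posted" instead of doc.search("p.posted").
  def /(selector)
    search(selector)
  end
end

doc = TinyDoc.new([
  { tag: "p", css_class: "posted", text: "first" },
  { tag: "p", css_class: nil, text: "second" },
  { tag: "div", css_class: "posted", text: "third" }
])

matches = doc/"p.posted"
puts matches.map { |el| el[:text] }.inspect
# prints ["first"]
```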
2
votes
3 answers

Ruby Hpricot RegEx replace <br>'s with <p>'s

Can someone please tell me how to convert this line of JavaScript to Ruby using Hpricot & RegEx? // Replace all doubled-up <BR> tags with <P> tags, and remove fonts. var pattern = new RegExp ("<br/?>[ \r\n\s]*<br/?>", "g"); …

dpigera
  • 3,339
  • 5
  • 39
  • 60
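
A rough Ruby counterpart to the JavaScript pattern in the question above might look like the sketch below. It rests on two assumptions: that the pattern targets doubled-up `<br>` tags, and that the replacement is a paragraph boundary (`</p><p>`), which the truncated excerpt does not show. `double_breaks_to_paragraphs` is a hypothetical name, and only `String#gsub` from the standard library is used, not Hpricot.

```ruby
# Replace two <br> tags separated only by whitespace with a paragraph boundary.
# Assumption: "</p><p>" is the intended replacement; the excerpt does not say.
def double_breaks_to_paragraphs(html)
  # \s already covers space, \r and \n, so [ \r\n\s]* reduces to \s*.
  html.gsub(%r{<br/?>\s*<br/?>}, "</p><p>")
end

puts double_breaks_to_paragraphs("one<br/>\n <br>two")
# prints one</p><p>two
```

`gsub` replaces every match, which corresponds to the `"g"` flag in the JavaScript version.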
2
votes
2 answers

Can I use Hpricot to find the main article text of any/most websites?

I need a way of extracting the main text from any webpage that displays an article. Similar to the way that Readability can find the main text on any website that it's run on. I'm using Ruby on Rails, so I think Hpricot is my best bet. Is what I'm…
ben
  • 29,229
  • 42
  • 124
  • 179
2
votes
1 answer

how to remove html element's style attribute using Hpricot?

like this:

Hello world just do it

I want to remove every element's "style" attribute. I want the result like this:

Hello world just do it

how to do…
www
  • 4,065
  • 7
  • 30
  • 27
2
votes
2 answers

How to get an element using inner text (Watir, Nokogiri, Hpricot)

I have been experimenting with Watir, Nokogiri and Hpricot. All of these use a top-down approach, which is my problem: they search for elements by element type. I want to find an element by its text without knowing the element…
Hpriguy
  • 21
  • 2
2
votes
1 answer

Hpricot version not working

I'm trying to migrate my blog to Jekyll, following these instructions: http://jekyllrb.com/docs/migrations/ I've got all my posts in .xml format, but the command to convert them does not seem to be working: ruby -rubygems -e 'require…
RobinLovelace
  • 4,799
  • 6
  • 29
  • 40
2
votes
2 answers

Why does Twitter API return a 400 error in production?

I have a Twitter app that works fantastic locally - it searches for keywords then for each user it grabs their info using Hpricot to parse the xml e.g. Hpricot(open("http://twitter.com/users/show/"+myuser+".xml")) Works fine locally but when I go…
Fonziguy
  • 21
  • 2
2
votes
3 answers

Searching all elements before an h2 element in hpricot/nokogiri

I am attempting to parse a Wiktionary entry to retrieve all English definitions. I am able to retrieve all definitions; the problem is that some definitions are in other languages. What I would like to do is somehow retrieve only the HTML block…
Dave
2
votes
4 answers

ROR/Hpricot: parsing a site and searching/comparing strings with regex

I just started with Ruby on Rails, and want to create a simple web site crawler which: goes through all the Sherdog fighters' profiles; gets the referees' names; compares names with the old ones (both during the site parsing and from the…
Mikko Vedru
  • 333
  • 2
  • 11
2
votes
2 answers

How do you know when to use an XML parser and when to use ActiveResource?

I tried using ActiveResource to parse a web service that was more like an HTML document and I kept getting a 404 error. Do I need to use an XML parser for this task instead of ActiveResource? My guess is that ActiveResource is only useful if you are…
2
votes
5 answers

Removing anything between XML tags and their content

I need to remove anything between XML tags, especially whitespace and newlines. For example removing whitespace and newlines from: \n to get: This is not meant for parsing XML by…
rubiii
  • 1,926
  • 2
  • 12
  • 8
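
Since the question above is explicitly not about parsing the XML, a plain `String#gsub` may be enough. The sketch below is a minimal illustration with a made-up sample string (the question's own example did not survive in the excerpt) and a hypothetical helper name, `squeeze_between_tags`. It removes whitespace-only runs between a closing `>` and the next `<`, and leaves text content alone.

```ruby
# Hypothetical helper: collapse whitespace-only runs between adjacent tags.
# Assumption: the input has no CDATA sections or significant inter-tag whitespace.
def squeeze_between_tags(xml)
  xml.gsub(/>\s+</, "><")
end

sample = "<a>\n  <b>keep this text</b>\n</a>"
puts squeeze_between_tags(sample)
# prints <a><b>keep this text</b></a>
```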