Questions tagged [nokogiri]

An HTML, XML, SAX and Reader parser for Ruby with the ability to search documents via XPath or CSS3 selectors… and much more

Nokogiri (鋸) is an HTML, XML, SAX and Reader parser for Ruby. Among Nokogiri’s many features is the ability to search documents via XPath or CSS3 selectors.

See the Nokogiri cheat-sheet for tips on using Nokogiri.

A digest of most of the methods documented at nokogiri.org. Reading the source can help, too.

From the Nokogiri readme:

XML is like violence - if it doesn’t solve your problems, you are not using enough of it.

3699 questions
1 vote, 3 answers

How can I strip HTML tags from a string in the model before I get to the view

Trying to determine how to strip the HTML tags from a string in Ruby. I need this to be done in the model before I get to the view, so using ActionView::Helpers::SanitizeHelper#strip_tags() won't work. I was looking into using Nokogiri, but can't…
Zack Herbert
1 vote, 3 answers

Is Nokogiri necessary for Rails?

I updated my Ruby from 1.9 to 2.2 and found that the Nokogiri gem doesn't support Ruby 2.2 on Windows. Nokogiri was not in my Gemfile, but when I run bundle install it is added automatically. Maybe some other gem depends on it? This is a very…
asdfkjasdfjk
1 vote, 2 answers

Selecting a specific table cell using CSS

I scraped the rankings table from atpworldtour.com and I'm trying to access the player names. An example of a row in the table looks like this: …
SoSimple
1 vote, 2 answers

Get the link name of href tag nokogiri

I am scraping some data whose hierarchy is /h2/a, but the a's href should contain http://www.thedomain.com. All links look something like this: thedomain.com/test and so on. Right now I get only the text, not the href of the link itself. For…
fscore
1 vote, 0 answers

SAX Parser - Handle escape characters

I use a SAX parser to parse an XML file in which one of the nodes is: http://click.linksynergy.com/link?id=aLzfEdguEI4&offerid=dsf5.67217798179 My characters method currently returns the string…
1 vote, 1 answer

Limit search scope of XPath in Nokogiri

I would like to find specific tags within a Node that is in a NodeSet, but when I use XPath, it returns results from the whole NodeSet. I'm trying to get something like: { "head1" => "Volume 1", "head2" => "Volume 2" } from this HTML:
Bart C
1 vote, 0 answers

Nokogiri response different

Does anyone have a problem with Nokogiri acting differently between two servers, staging and production? On staging, it grabs and returns the page properly using Nokogiri 1.4.2 and Mechanize 1.0.0. On production, it returns a much smaller set of…
Jerry Deng
1 vote, 0 answers

jQuery nextAll() traversing method equivalent in Nokogiri?

I wrote the following code with jQuery: $("#bar").nextAll(".foo").each(function(index){ console.log($(this)) }) And I'd like to transpose it to Nokogiri. I read the documentation for Nokogiri - jQuery but I cannot find how to write an…
Rowandish
1 vote, 2 answers

Modifying an XML document with Nokogiri

I am using Nokogiri to modify the content of an XML file: ... I need to add Default children to Types as…
Thang Le Sy
1 vote, 2 answers

How do I scrape when there are multiple 'p' tags?

I'm trying to scrape a website that has multiple 'p' tags which will always start with the words "Located in:...". None of the other 'p' tags start with these words. How do I get my scraper to extract only those particular tags? This is scraper.rb: …

hikmatyar
1 vote, 1 answer

Issue with a seemingly simple XML parse

I have an XML file:
46and2
1 vote, 1 answer

How to find element by attribute value?

I want to find the element where id = 6. How do I do it? I tried the following, but it didn't work: …
yak_ilnur
1 vote, 1 answer

Parse TextMate snippet with Nokogiri

A TextMate snippet (.tmSnippet) usually looks something like this, where some key/string pairs are optional and can appear in any position.
idleberg
1 vote, 1 answer

Get data attributes with Nokogiri

I'm scraping a site that has a number of divs with the same ".pane" class and same "data-pane" data attributes. input = doc.css('.pane[data-pane]') How do I filter or select from the above to get the div which has a "data-pane" attribute equal to a…
margo