Questions tagged [nokogiri]

An HTML, XML, SAX and Reader parser for Ruby with the ability to search documents via XPath or CSS3 selectors… and much more

Nokogiri (鋸) is an HTML, XML, SAX and Reader parser for Ruby. Among Nokogiri’s many features is the ability to search documents via XPath or CSS3 selectors.

See the Nokogiri cheat-sheet for tips on using Nokogiri.

A digest of most of the methods documented at nokogiri.org. Reading the source can help, too.

From the Nokogiri readme:

XML is like violence - if it doesn’t solve your problems, you are not using enough of it.

3699 questions
22
votes
5 answers

ERROR: While executing gem ... (TypeError) incompatible marshal file format (can't be read)

I encountered this issue when I run bundle install with Ruby version 2.4.4 and macOS Mojave: Fetching nokogiri 1.8.5 Installing nokogiri 1.8.5 with native extensions Gem::Ext::BuildError: ERROR: Failed to build gem native extension. ERROR: cannot…
Rémi JUHE
  • 385
  • 4
  • 12
22
votes
4 answers

Is it possible to 'unload' ('un-require') a Ruby library?

I'm looking to load a few libraries, have them do some work, and then do the opposite of require to avoid compatibility errors later. I don't want to have to dump to a file and restart a shell, as the objects created (such as data) could be…
Louis Maddox
  • 5,226
  • 5
  • 36
  • 66
21
votes
1 answer

Nokogiri leaving HTML entities untouched

I want Nokogiri to leave HTML entities untouched, but it seems to be converting the entities into the actual symbol. For example: Nokogiri::HTML.fragment('<p>&reg;</p>').to_s results in: "<p>®</p>" Nothing seems to return the original HTML back to…
Richard
  • 1,146
  • 1
  • 13
  • 24
21
votes
3 answers

How to convert Nokogiri Document object into JSON

I have some parsed Nokogiri::XML::Document objects that I want to print as JSON. I can go the route of making it a string, parsing it into a hash, with active-record or Crack and then Hash.to_json; but that is both ugly and depending on way too…
berkes
  • 26,996
  • 27
  • 115
  • 206
21
votes
3 answers

How do I get Nokogiri to add the right XML encoding?

I have created an XML doc with Nokogiri: Nokogiri::XML::Document. The header of my file is <?xml version="1.0"?> but I'd expect to have <?xml version="1.0" encoding="UTF-8"?>. Are there any options I could use so the encoding appears?
Luc
  • 16,604
  • 34
  • 121
  • 183
21
votes
2 answers

How to get node text without children?

I use Nokogiri to parse an HTML page with content like this:

Useful text
Useless text

When I call the method page.css('p.parent').text Nokogiri returns 'Useful text Useless text'. But I…
Denis Kreshikhin
  • 8,856
  • 9
  • 52
  • 84
20
votes
3 answers

Nokogiri to_xml without carriage returns

I'm currently using the Nokogiri::XML::Builder class to construct an XML document, then calling .to_xml on it. The resulting string always contains a bunch of spaces, linefeeds and carriage returns in between the nodes, and I can't for the life of…
Cameron
  • 641
  • 1
  • 7
  • 18
20
votes
1 answer

How to check during Nokogiri/Ruby parsing if element exists on page?

How can I check during parsing of an HTML page with Nokogiri (Ruby gem) if an element, in this case a div, exists on the page? On my test page, it does exist, so the pp yields the expected Nokogiri output. But the if statement does not work, the ==…
Chris
  • 235
  • 1
  • 3
  • 5
20
votes
1 answer

What's the difference between .at_css and .css in Nokogiri?

I can't find a clear, direct answer, but what's the difference between .at_css and .css in Nokogiri?
Ariel
  • 2,638
  • 4
  • 23
  • 27
20
votes
2 answers

How to make Nokogiri not convert &nbsp; to a space

I fetch one HTML fragment like "<li>市&nbsp;场&nbsp;价</li>" which contains "&nbsp;", but after calling to_s on the Nokogiri NodeSet, it becomes "<li>市 场 价</li>". I want to keep the original HTML fragment, and tried to set the :save_with option for the to_s method, but…
ywenbo
  • 3,051
  • 6
  • 31
  • 46
20
votes
2 answers

How to "gem install nokogiri -- --use-system-libraries" via Gemfile

There is a known error installing the latest version of Nokogiri. The workaround is to manually install using gem install nokogiri -- --use-system-libraries But how can this be done via the Gemfile?
s2t2
  • 2,462
  • 5
  • 37
  • 47
20
votes
12 answers

Failing to install Nokogiri gem

I'm working on a Rails app that allows for image attachments to each user account. I'm using paperclip and Amazon Web Services: gem 'paperclip' gem 'aws-sdk' When I run bundle install, I get this message: extconf failed, exit code 1 Gem files will…
Katie H
  • 2,283
  • 5
  • 30
  • 51
20
votes
3 answers

How do I create XML using Nokogiri::XML::Builder with a hyphen in the element name?

I am trying to build an XML document using Nokogiri. Some of the elements have hyphens in them. Here's an example: require "nokogiri" builder = Nokogiri::XML::Builder.new do |xml| xml.foo_bar "hello" end puts builder.to_xml Which produces:
Theozaurus
  • 955
  • 1
  • 8
  • 21
19
votes
2 answers

Changing href attributes with Nokogiri and Ruby on Rails

I have an HTML document with links, for example:
19
votes
3 answers

Get text directly inside a tag in Nokogiri

I have some HTML that looks like:
Hello (2009)
I already have all my HTML loaded into a variable called record. I need to parse out the year i.e. 2009 if it exists. How can I get the text inside the dt tag but not the…
Mridang Agarwalla
  • 43,201
  • 71
  • 221
  • 382