
Has anyone seen Nokogiri behave differently between two servers, staging and production?

On staging, it grabs and returns the page properly using Nokogiri 1.4.2 and Mechanize 1.0.0.

On production, with the same Nokogiri 1.4.2 and Mechanize 1.0.0, it returns a much smaller chunk of HTML that looks like a canned message.

I found this out by running the code in IRB.

Any clues would be helpful.

the Tin Man
Jerry Deng
  • It probably isn't Nokogiri, but the HTTP/socket library that you are using. What do you mean by a "canned message"? – Adrian Jul 23 '10 at 17:37
  • May not help, but I would double-check the libxml versions on both servers. In addition, are there any proxies, firewalls, etc. between production and the site you are trying to access? – Brian Jul 23 '10 at 18:12
  • There are no proxies or firewalls. It may be the libxml version. I am checking it now. I originally thought it was a "canned message", but now that I think about it, it's more like a much shorter version of the actual HTML. – Jerry Deng Jul 23 '10 at 21:23
  • It actually returns a cut-off version of the HTML to me. Maybe a packet limit on the production server's TCP filter? – Jerry Deng Jul 23 '10 at 21:44
  • I double-checked the returned HTML by running .to_html and it turns out to be a shortened version of the full page. Not sure why. – Jerry Deng Jul 26 '10 at 14:18
  • It's important to understand that Nokogiri doesn't "get" anything when you connect. Mechanize, OpenURI, or something similar is responsible for actually grabbing the content from the server, and it then passes that content to Nokogiri for parsing. Nokogiri only knows about strings or IO streams. In the second case it only does a `read` on them, so it's not actively handling the connection. Knowing this is the first step in determining where a problem lies: if one system isn't sending the same data as another, it's that system, not Nokogiri. – the Tin Man Jul 08 '14 at 22:07
  • "I doubled check the return html by running .to_html and it turns out to be a shortened version of the full page." Odds are good the HTML is malformed and Nokogiri is having to do fixups to make it valid. After parsing, check the `errors` method on the DOM object returned. – the Tin Man Aug 13 '14 at 22:52
  • Honestly, I don't even recall how I can reproduce the bug. It's been 4 years. Thanks though. – Jerry Deng Aug 14 '14 at 23:31
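
The two suggestions in the comments above (work out whether the fetch layer or the parser is at fault, then look at `errors` after parsing) could be sketched roughly as follows. This is only an illustration, not what the asker ran: `body_fingerprint` and `parse_report` are hypothetical helper names, and the sketch assumes the raw response body is already available as a string, for example from whatever Mechanize or OpenURI handed back.

```ruby
require "digest"

# Hypothetical helper: summarize the raw bytes the HTTP layer returned,
# before Nokogiri sees them. Run it on both servers and compare the results.
def body_fingerprint(raw_html)
  {
    bytes:  raw_html.bytesize,
    sha256: Digest::SHA256.hexdigest(raw_html)
  }
end

# Hypothetical helper: parse the same string and report what Nokogiri had to
# fix up. Nokogiri is required lazily so the fingerprint step works without it.
def parse_report(raw_html)
  require "nokogiri"
  doc = Nokogiri::HTML(raw_html)
  {
    # Parse errors collected while the parser repaired the markup.
    error_messages: doc.errors.map(&:message),
    # Size of the serialized document, to compare against the raw size.
    output_bytes:   doc.to_html.bytesize
  }
end
```

If the fingerprints already differ between staging and production, the difference was introduced before parsing, so the HTTP side is the place to look. If they match but `to_html` is much shorter on one machine, the parse errors and the libxml versions on each server are the next things to compare.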

0 Answers