I need to parse the form to get the value of `IW_SessionID_` from the HTML I get back, but I can't get it to work.

#!/usr/bin/ruby

require 'pp'
require 'nokogiri'
require 'mechanize'

r = '<HTML><HEAD><TITLE></TITLE><meta http-equiv=\"cache-control\" content=\"no-cache\">\r\n<meta http-equiv=\"pragma\" content=\"no-cache\">\r\n<NOSCRIPT><HTML><BODY>Your browser does not seem to support JavaScript. Please make sure it is supported and activated</BODY></HTML></NOSCRIPT>\r\n<SCRIPT>\r\nvar ie4 = (document.all)? true:false;\r\nvar ns6 = (document.getElementById)? true && !ie4:false;\r\nfunction Initialize() {\r\nvar lWidth;\r\nvar lHeight;\r\nif (ns6) {\r\n  lWidth = window.innerWidth - 30;\r\n  lHeight = window.innerHeight - 30;\r\n} else {\r\n   lWidth = document.body.clientWidth;\r\n   lHeight = document.body.clientHeight;\r\n   if (lWidth == 0) { lWidth = undefined;}\r\n   if (lHeight == 0) { lHeight = undefined;}\r\n}\r\ndocument.forms[0].elements[\"IW_width\"].value = lWidth;\r\ndocument.forms[0].elements[\"IW_height\"].value = lHeight;\r\ndocument.forms[0].submit();\r\n}</SCRIPT></HEAD><BODY onload=\"Initialize()\">\r\n<form method=post action=\"/bwtem\">\r\n<input type=hidden name=\"IW_width\">\r\n<input type=hidden name=\"IW_height\">\r\n<input type=hidden name=\"IW_SessionID_\" value=\"1wqzj1f0vec57r1apfqg51wzs88c\">\r\n<input type=hidden name=\"IW_TrackID_\" value=\"0\">\r\n</form></BODY></HTML>'

page = Nokogiri::HTML r
puts page.css('form[name="IW_SessionID_"]')

a = Mechanize.new
page2 = Mechanize::Page.new(nil,{'content-type'=>'text/html'},r,nil,a)

pp page2.form_with(:name => "IW_SessionID_")

The script just returns nil.

Can anyone figure out how to get the value of IW_SessionID_?

the Tin Man
Jasmine Lognnes

2 Answers

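You have to unescape your example HTML string: inside a single-quoted Ruby string, \" stays a literal backslash followed by a quote, so the attribute values are not what the selector expects. Then search for the input (not the form) whose name is IW_SessionID_.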

This sample code works for me:

#!/usr/bin/ruby

require 'pp'
require 'nokogiri'
require 'mechanize'

r = '<HTML><HEAD><TITLE></TITLE><meta http-equiv="cache-control" content="no-cache">\r\n<meta http-equiv="pragma" content="no-cache">\r\n<NOSCRIPT><HTML><BODY>Your browser does not seem to support JavaScript. Please make sure it is supported and activated</BODY></HTML></NOSCRIPT>\r\n<SCRIPT>\r\nvar ie4 = (document.all)? true:false;\r\nvar ns6 = (document.getElementById)? true && !ie4:false;\r\nfunction Initialize() {\r\nvar lWidth;\r\nvar lHeight;\r\nif (ns6) {\r\n  lWidth = window.innerWidth - 30;\r\n  lHeight = window.innerHeight - 30;\r\n} else {\r\n   lWidth = document.body.clientWidth;\r\n   lHeight = document.body.clientHeight;\r\n   if (lWidth == 0) { lWidth = undefined;}\r\n   if (lHeight == 0) { lHeight = undefined;}\r\n}\r\ndocument.forms[0].elements["IW_width"].value = lWidth;\r\ndocument.forms[0].elements["IW_height"].value = lHeight;\r\ndocument.forms[0].submit();\r\n}</SCRIPT></HEAD><BODY onload="Initialize()">\r\n<form method=post action="/bwtem">\r\n<input type=hidden name="IW_width">\r\n<input type=hidden name="IW_height">\r\n<input type=hidden name="IW_SessionID_" value="1wqzj1f0vec57r1apfqg51wzs88c">\r\n<input type=hidden name="IW_TrackID_" value="0">\r\n</form></BODY></HTML>'

page = Nokogiri::HTML r
input = page.css('input[name="IW_SessionID_"]').first
puts input[:value]
the Tin Man
Pioz

It's easy to do once you are familiar with the tools:

require 'nokogiri'

doc = Nokogiri::HTML(DATA.read)

doc.at('input[name="IW_SessionID_"]')['value']
# => "1wqzj1f0vec57r1apfqg51wzs88c"

__END__
<HTML>
  <BODY>
    <form method=post action="/bwtem">
      <input type=hidden name="IW_height">
      <input type=hidden name="IW_SessionID_" value="1wqzj1f0vec57r1apfqg51wzs88c">
      <input type=hidden name="IW_TrackID_" value="0">
    </form>
  </BODY>
</HTML>

Don't do things like:

page.css('form[name="IW_SessionID_"]')

That selector looks for a form whose name is IW_SessionID_, but the name belongs to an input inside the form. Also, css is used to search for multiple elements that match the selector. It's highly unlikely a form would have multiple hidden inputs with the same name, so at would be more sensible. css returns a NodeSet, which is like an Array of Nodes and, as a result, doesn't act like a Node:

require 'nokogiri'

doc = Nokogiri::HTML(<<EOT)
<html>
  <body>
    <p>foo</p>
    <p>bar</p>
  </body>
</html>
EOT

doc.search('p').class # => Nokogiri::XML::NodeSet
doc.at('p').class # => Nokogiri::XML::Element

text concatenates the text of every node in the NodeSet, resulting in a mess:

doc.search('p').text # => "foobar"

whereas using map(&:text) will iterate over the nodes returning their text:

doc.search('p').map(&:text) # => ["foo", "bar"]

Also note that css(...).first or search(...).first is the same as at or one of its at_* siblings:

doc.search('p').first.to_html # => "<p>foo</p>"
doc.at('p').to_html # => "<p>foo</p>"

So use at instead of search(...).first for clarity.

Finally, strip your HTML sample to the bare minimum necessary to demonstrate the problem you're asking about. Anything beyond that wastes space and our time as we're trying to understand the problem.

the Tin Man