
I am trying to communicate with a badly designed web server, but I still have to deal with it. When I submit my login form, the server embeds messages inside the redirect URI, and that makes the URI library raise an error.

The server redirects me to

/path/ConvolutedNameForMenuPage.menu?name=bmenu.P_MainMnu&msg=WELCOME+<b>Welcome,+Jonathan+Allard,+to+our+poorly+designed+Administrative+Systems!<%2Fb>Dec+07,+201102%3A27+PM

That's right: it's passing me raw HTML inside the redirect URI, which I am then supposed to request so that the server can echo it back to me. Sheesh, standards!

And now the URI library, visibly and passionately upset by such bad practice, exclaims:

URI::InvalidURIError: bad URI(is not URI?): /path/ConvolutedNameForMenuPage.menu?name=bmenu.P_MainMnu&msg=WELCOME+<b>Welcome,+Jonathan+Allard,+to+our+poorly+designed+Administrative+Systems!<%2Fb>Dec+07,+201102%3A27+PM
from /home/jon/.rbenv/versions/1.9.3-p0/lib/ruby/1.9.1/uri/generic.rb:1202:in `rescue in merge'
from /home/jon/.rbenv/versions/1.9.3-p0/lib/ruby/1.9.1/uri/generic.rb:1199:in `merge'
from /home/jon/.rbenv/versions/1.9.3-p0/lib/ruby/gems/1.9.1/gems/mechanize-2.0.1/lib/mechanize/page/meta_refresh.rb:32:in `parse'
from /home/jon/.rbenv/versions/1.9.3-p0/lib/ruby/gems/1.9.1/gems/mechanize-2.0.1/lib/mechanize/page/meta_refresh.rb:41:in `from_node'
from /home/jon/.rbenv/versions/1.9.3-p0/lib/ruby/gems/1.9.1/gems/mechanize-2.0.1/lib/mechanize/page.rb:282:in `block in meta_refresh'
from /home/jon/.rbenv/versions/1.9.3-p0/lib/ruby/gems/1.9.1/gems/nokogiri-1.5.0/lib/nokogiri/xml/node_set.rb:239:in `block in each'
from /home/jon/.rbenv/versions/1.9.3-p0/lib/ruby/gems/1.9.1/gems/nokogiri-1.5.0/lib/nokogiri/xml/node_set.rb:238:in `upto'
from /home/jon/.rbenv/versions/1.9.3-p0/lib/ruby/gems/1.9.1/gems/nokogiri-1.5.0/lib/nokogiri/xml/node_set.rb:238:in `each'

I feel your pain, URI lib.

Now, how do I catch this, parse the URI back correctly (or just drop it altogether), and carry on as if nothing had happened? Or is this a bug somewhere between URI and Mechanize?
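For the "just drop it" route, one workaround is to strip the offending parameter from the raw redirect target before it ever reaches URI. This is a minimal sketch under a few assumptions: that you can get hold of the redirect target as a plain string (getting it out of Mechanize is not shown), that the server does not actually need the `msg` parameter back, and that the target has no `#fragment`. `drop_params` is a hypothetical helper, not part of URI or Mechanize.

```ruby
require 'uri'

# Hypothetical helper: remove the named query parameters from a raw
# redirect target string, so the illegal characters never get parsed.
# Assumes the target has no "#fragment" part.
def drop_params(location, *names)
  path, query = location.split('?', 2)
  return location if query.nil?

  # Keep every "key=value" pair whose key is not in the drop list.
  kept = query.split('&').reject { |pair| names.include?(pair.split('=', 2).first) }
  kept.empty? ? path : "#{path}?#{kept.join('&')}"
end

location = '/path/ConvolutedNameForMenuPage.menu?name=bmenu.P_MainMnu' \
           '&msg=WELCOME+<b>Hi<%2Fb>'
base = URI.parse('https://example.com/path/login') # made-up base URI

puts base.merge(drop_params(location, 'msg'))
```

With the redirect from the question, only `?name=bmenu.P_MainMnu` survives, which merges cleanly; whether the server still serves the menu page without `msg` is something you would have to check.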

Jonathan Allard
  • Just put the request in a rescue block? Maybe you should show some code. – pguardiario Dec 07 '11 at 23:40
  • @pguardiario It doesn't get much more complicated than this. It's basically `uri.merge('<')` and I'm stuck in a redirect. Also, there's not much to rescue as the URI throws an error every time. – Jonathan Allard Dec 08 '11 at 00:50
  • So you're calling uri.merge('<') directly? Why are you doing that? If you show the code we won't have to guess what you're talking about. – pguardiario Dec 08 '11 at 01:13
  • @pguardiario No, the server redirects me to a path containing a "<" and it crashes URI – Jonathan Allard Dec 08 '11 at 01:53
  • Maybe Mechanize#redirect_ok = false will help, I really can't offer anything else without seeing some code. – pguardiario Dec 08 '11 at 02:13

1 Answer


After some digging in the code, I have found where the issue comes from.

As I explained in #177:

in /lib/mechanize/page/meta_refresh.rb:40

class Mechanize::Page::MetaRefresh
  # CONTENT_REGEXP is defined elsewhere in this class (not shown here)
  def self.parse content, base_uri
    return unless content =~ CONTENT_REGEXP

    delay, refresh_uri = $1, $3

    dest = base_uri
    dest += refresh_uri if refresh_uri     # Oops!

    return delay, dest
  end
end

The line marked "Oops!" raises URI::InvalidURIError whenever refresh_uri contains characters that are illegal in a URI (such as <). I'm not quite sure where the sanitizing should be done, though.
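One candidate spot for the sanitizing is right before the merge. The sketch below mirrors the shape of the excerpt as a standalone method; it is not Mechanize's actual code. `REFRESH_CONTENT` is a hypothetical, simplified stand-in for Mechanize's `CONTENT_REGEXP` (delay in group 1, optional quote in group 2, target in group 3), and `escape_illegal` and `parse_refresh` are hypothetical names. Every character outside RFC 3986's unreserved and reserved sets is percent-encoded byte by byte, while `%` is left alone so existing escapes such as `%2F` survive.

```ruby
require 'uri'

# Hypothetical, simplified stand-in for Mechanize's CONTENT_REGEXP:
# "<delay>" optionally followed by "; url=<target>" (target may be quoted).
REFRESH_CONTENT = /\A\s*(\d+(?:\.\d+)?)\s*(?:;\s*url\s*=\s*(['"]?)(.*?)\2)?\s*\z/i

# Percent-encode any character RFC 3986 does not allow literally,
# keeping '%' so that already-escaped sequences are not double-encoded.
def escape_illegal(str)
  str.gsub(%r{[^A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]}) do |ch|
    ch.bytes.map { |b| format('%%%02X', b) }.join
  end
end

# Same shape as the excerpt, but the refresh target is escaped
# before URI#+ (i.e. URI#merge) gets to see it.
def parse_refresh(content, base_uri)
  return unless content =~ REFRESH_CONTENT

  delay, refresh_uri = $1, $3

  dest = base_uri
  dest += escape_illegal(refresh_uri) if refresh_uri

  [delay, dest]
end

base = URI.parse('https://example.com/path/login') # made-up base URI
delay, dest = parse_refresh('0; url=/menu?msg=<b>Hi<%2Fb>', base)
puts dest # https://example.com/menu?msg=%3Cb%3EHi%3C%2Fb%3E
```

Escaping rather than dropping keeps the message recoverable: `URI.decode_www_form(dest.query)` gives back the decoded `msg` value.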

In case you wondered, the URI#merge call in my stack trace is hidden in the += operator on the "Oops!" line: URI#+ is an alias for URI#merge.
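A quick way to see that `+` and `merge` are the same operation (Ruby's standard library defines `URI::Generic#+` as an alias of `merge`); the base URI and target here are made-up examples:

```ruby
require 'uri'

base = URI.parse('https://example.com/path/login') # made-up base URI

via_plus  = base + '/path/menu?name=x'      # what `dest += refresh_uri` does
via_merge = base.merge('/path/menu?name=x') # the call named in the stack trace

puts via_plus == via_merge # true
puts via_plus              # https://example.com/path/menu?name=x
```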

Jonathan Allard