I have a script that logs in through a form. After logging in, there are a few redirects. The first one looks like this:

#<Mechanize::File:0x1f4ff23 @filename="MYL.html", @code="200", @response={"cache-control"=>"no-cache=\"set-cookie\"", "content-length"=>"114", "set-cookie"=>"JSESSIONID=GdJnPVnhtN91KZfQPc3QzM1NLCyWDsnyvpGg8LL0Knnz3RgqxLFs!1803804592!-2134626567; path=/; secure, COOKIE_TEST=Aslyn; secure", "x-powered-by"=>"Servlet/2.4 JSP/2.0"}, @body="\r\n<html>\r\n  <head>\r\n    <meta http-equiv=\"refresh\" content=\"0;URL=MYL?Select=OK&StateName=38\">\r\n  </head>\r\n</html>", @uri=#<URI::HTTPS:0x16e1eff URL:https://www.manageyourloans.com/MYL?StateName=global_CALMLandingPage&GUID=D1704621-1994-E076-460A-10B2B682B960>>

So when I call page.class here, I get:

Mechanize::File

How do I convert that to a Mechanize::Page?


@pguardiario

To explain better: the object shown in my original message is stored in page.

When I call page.class I get Mechanize::File.

So then I execute your code above:

agent = Mechanize.new
agent.post_connect_hooks << lambda {|http| http[:response].content_type = 'text/html'}

Then, when I call agent.get(page.uri.to_s), or even try any URL such as agent.get("https://www.manageyourloans.com/MYL"), I get an error: ArgumentError: wrong number of arguments (4 for 1)

I've even tried this:

agent = Mechanize.new { |a|
  a.post_connect_hooks << lambda { |_,_,response,_|
    if response.content_type.nil? || response.content_type.empty?
      response.content_type = 'text/html'
    end
  }
}

My question is: once I do this, how do I convert the previous page into a Mechanize::Page?

Brad Larson
user1198316

2 Answers

You can convert from a Mechanize::File to a Mechanize::Page by taking the body contained in the file object and passing that in as the body of a new page:

irb(main):001:0> require 'mechanize'
true
irb(main):002:0> file = Mechanize::File.new(URI.parse('http://foo.com'),nil,File.read('foo.html'))
#<Mechanize::File:0x100ef0190
    @full_path = false,
    attr_accessor :body = "<html><body>foo</body></html>\n",
    attr_accessor :code = nil,
    attr_accessor :filename = "index.html",
    attr_accessor :response = {},
    attr_accessor :uri = #<URI::HTTP:0x100ef02d0
        attr_accessor :fragment = nil,
        attr_accessor :host = "foo.com",
        attr_accessor :opaque = nil,
        attr_accessor :password = nil,
        attr_accessor :path = "",
        attr_accessor :port = 80,
        attr_accessor :query = nil,
        attr_accessor :registry = nil,
        attr_accessor :scheme = "http",
        attr_accessor :user = nil,
        attr_reader :parser = nil
    >
>

First, I created a fake Mechanize::File object just to have one for the example code to follow. You can see the content of the file it read in the :body.

Mechanize creates a Mechanize::File object when it can't figure out what the true content-type is.

irb(main):003:0> page = Mechanize::Page.new(URI.parse('http://foo.com'),nil,file.body)
#<Mechanize::Page:0x100ed5e30
    @full_path = false,
    @meta_content_type = nil,
    attr_accessor :body = "<html><body>foo</body></html>\n",
    attr_accessor :code = nil,
    attr_accessor :encoding = nil,
    attr_accessor :filename = "index.html",
    attr_accessor :mech = nil,
    attr_accessor :response = {
        "content-type" => "text/html"
    },
    attr_accessor :uri = #<URI::HTTP:0x100ed5ed0
        attr_accessor :fragment = nil,
        attr_accessor :host = "foo.com",
        attr_accessor :opaque = nil,
        attr_accessor :password = nil,
        attr_accessor :path = "",
        attr_accessor :port = 80,
        attr_accessor :query = nil,
        attr_accessor :registry = nil,
        attr_accessor :scheme = "http",
        attr_accessor :user = nil,
        attr_reader :parser = nil
    >,
    attr_reader :bases = nil,
    attr_reader :encodings = [
        [0] nil,
        [1] "US-ASCII"
    ],
    attr_reader :forms = nil,
    attr_reader :frames = nil,
    attr_reader :iframes = nil,
    attr_reader :labels = nil,
    attr_reader :labels_hash = nil,
    attr_reader :links = nil,
    attr_reader :meta_refresh = nil,
    attr_reader :parser = nil,
    attr_reader :title = nil
>
irb(main):004:0> page.class
Mechanize::Page < Mechanize::File

Just pass in the body of the file object and let Mechanize convert it to what you know it should be.

the Tin Man
  • I am working through this answer and I am using this: page = Mechanize::Page.new(URI.parse(page.uri.to_s),nil,page.body). I get an error: undefined method `[]' for nil:NilClass – user1198316 Apr 24 '12 at 12:11

I like @The Tin Man's answer but it might be simpler to force the content type of the response:

agent.post_connect_hooks << lambda {|http| http[:response].content_type = 'text/html'}
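
The ArgumentError (4 for 1) reported in the question suggests that the installed Mechanize calls each hook with four arguments rather than one. Here is a minimal sketch of a hook with that arity, assuming the order is agent, URI, response, body. The hand-built Net::HTTPOK object stands in for a real response, and all variable names are illustrative:

```ruby
require 'net/http'

# Stand-in for the response Mechanize would hand to the hook: a 200 response
# with no Content-Type header, built by hand (no network access).
response = Net::HTTPOK.new('1.1', '200', 'OK')

# Hook accepting four arguments. The order (agent, uri, response, body)
# is an assumption based on the error message in the question.
force_html = lambda do |_agent, _uri, resp, _body|
  resp.content_type = 'text/html' if resp.content_type.to_s.empty?
end

# One-argument hook, as in the snippet above.
one_arg_hook = lambda { |http| http[:response].content_type = 'text/html' }

# The four-argument hook fills in the missing header.
force_html.call(nil, nil, response, nil)
header_after = response.content_type

# Calling the one-argument lambda the same way raises ArgumentError,
# because lambdas enforce their arity strictly.
arity_error = begin
  one_arg_hook.call(nil, nil, response, nil)
  nil
rescue ArgumentError => e
  e
end

puts header_after
puts arity_error.class
```

With a real agent, the four-argument lambda would be appended to agent.post_connect_hooks before calling agent.get.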
pguardiario
  • When I do this in irb I get: undefined method `post_connect_hooks' for # – user1198316 Apr 24 '12 at 11:57
  • In my answer agent references a Mechanize object which you can instantiate with 'Mechanize.new' – pguardiario Apr 24 '12 at 12:01
  • I ran: agent = Mechanize.new; agent.post_connect_hooks << lambda {|http| http[:response].content_type = 'text/html'}. The documentation describes this as "A list of hooks to call after retrieving a response. Hooks are called with the agent and the response returned." So I would set it up after I have my Mechanize::File, correct? And if I then do agent.get(urlofpagehere), should that return the Mechanize::Page? – user1198316 Apr 24 '12 at 13:10
  • I'm not sure I understand, but if you do that you will get a Mechanize::Page instead of a Mechanize::File, yes. – pguardiario Apr 24 '12 at 14:03