How can I get fully loaded HTML through python-mechanize?

Question

Hi, I'm using Python mechanize to get data from web pages. I'm trying to get the imgurl values from the Google Image Search results page so that I can download the result images.

Here's my code. I fill in the search form with 'dog' and submit it (i.e. search for 'dog').

import mechanize
import cookielib  # Python 2 module (http.cookiejar in Python 3)

# Set up a browser with a cookie jar and browser-like request headers.
br = mechanize.Browser()
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)
br.set_handle_equiv(True)
br.set_handle_redirect(True)
br.set_handle_robots(False)
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)
br.addheaders = [
    ('User-agent', 'Mozilla/5.0 (x11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1'),
    ('Accept', '*/*'),
    ('Accept-Language', 'ko-KR'),
]

# Open Google Images, fill the search form with 'dog', and submit it.
br.open('http://www.google.com/imghp?hl=en')
br.select_form(nr=0)
br.form['q'] = 'dog'
response = br.submit()
searched_url = br.geturl()

# Save the HTML returned by the server to disk.
with open("1.html", "wb") as f:
    f.write(response.read())

When I view the page source in Chrome, it contains 'imgurl' entries. But when I read the data with Python mechanize, there are none. Also, 1.html (the file I write from Python) is much smaller than the HTML file downloaded from Chrome. How can I get exactly the same HTML as a web browser using Python?

Do I have to set the same request headers as a web browser? Thanks.

What you see in the "regular" page is loaded with JavaScript. Look at the AJAX requests sent out by the browser while browsing the page and you'll see how to get the images. — Blender, Dec 28 '13 at 08:10
I have the same problem, and using selenium is not an option for me. Can someone point out how to scrape the page source after the js does its magic?? — gixxer, Jul 19 '16 at 23:36
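
Following Blender's comment: mechanize does not execute JavaScript, so it only sees whatever HTML (or AJAX response) the server sends back. Below is a minimal sketch of how one might check a fetched response for image URLs. It assumes the URLs appear as URL-encoded imgurl= query parameters, which is an assumption about the page's format and not something the question confirms. The helper name extract_imgurls and the sample markup are hypothetical.

```python
import re

try:
    from urllib.parse import unquote  # Python 3
except ImportError:
    from urllib import unquote  # Python 2, as used in the question


# Assumption: image URLs show up as "imgurl=<url-encoded value>" inside links.
IMGURL_RE = re.compile(r'imgurl=([^&"\'\s]+)')


def extract_imgurls(html):
    """Return decoded imgurl values found in an HTML string, in order, without duplicates."""
    urls = []
    for raw in IMGURL_RE.findall(html):
        url = unquote(raw)
        if url not in urls:
            urls.append(url)
    return urls


# Hypothetical sample markup, not taken from a real response.
sample = '<a href="/imgres?imgurl=http%3A%2F%2Fexample.com%2Fdog.jpg&amp;imgrefurl=x">dog</a>'
found = extract_imgurls(sample)
```

Running the helper on the contents of 1.html would show whether the mechanize response carries any such entries. An empty list would be consistent with the images being loaded by JavaScript, in which case the request to replicate is the one the browser sends in the background, as seen in its network tools.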

0 Answers