Wait until a page is fully loaded, then read its content with urllib2/3

Question

I am opening a webpage with urllib2, reading its content, and then passing that content to BeautifulSoup for scraping.
But what I want is to first let the page load fully, or wait for a specific amount of time, and then read its content.

I have tried methods like time.sleep(sec), but they are not working: I get the same content whether I read it instantly or wait (sleep) for 10 or 15 seconds. When I enter the script line by line into the Python shell, however, I get a different result.

I am using urllib2 and Python 2.7. I also tried to find a solution, but everyone suggests using another module. Is this not possible with urllib2 or urllib3? Or do I have to use another module like requests?
Please suggest.
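
The behaviour described above can be reproduced without a real website. The following is a minimal sketch, not the asker's code. It assumes Python 3, where urllib2's `urlopen` machinery lives in `urllib.request`, and it serves a made-up page (`PAGE`) from a throwaway local server; the names `Handler`, `fetch` and `fetch_twice` are illustrative only. The page fills in a `div` from JavaScript after two seconds, yet fetching it twice with a pause in between returns byte-identical HTML with the `div` still empty, because the script is downloaded as text and never executed.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, build_opener  # Python 2.7: urllib2

# Hypothetical page: the div is only populated by JavaScript in a browser.
PAGE = b"""<html><body>
<div id="data"></div>
<script>
  setTimeout(function () {
    document.getElementById("data").textContent = "filled in by JavaScript";
  }, 2000);
</script>
</body></html>"""


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, *args):
        pass  # keep the demo quiet


# Ignore any system proxy so the request really goes to the local server.
opener = build_opener(ProxyHandler({}))


def fetch(url):
    with opener.open(url) as response:
        return response.read()


def fetch_twice(delay=0.2):
    server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: any free port
    thread = threading.Thread(target=server.serve_forever)
    thread.daemon = True
    thread.start()
    url = "http://127.0.0.1:%d/" % server.server_port
    try:
        first = fetch(url)
        time.sleep(delay)  # waiting on the client side changes nothing
        second = fetch(url)
    finally:
        server.shutdown()
        server.server_close()
    return first, second


first, second = fetch_twice()
print(first == second)                    # True
print(b'<div id="data"></div>' in first)  # True: the div is still empty
```

If a page really does return different HTML on a later request, that change happened on the server (for example a redirect or a finished background job), not because the client slept.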

Why would waiting make a difference? It returns the contents of the page; waiting won't make the page change. — Peter Wood, Jan 27 '16 at 10:27
I have seen the duplicate you are pointing to, but nothing there works, and it was also active 3 or 5 years ago. @PeterWood I also found that waiting for some time gives the page time to load fully. [http://stackoverflow.com/questions/31310321/python-urllib2-wait-for-page-to-load-to-scrape-data](http://stackoverflow.com/questions/31310321/python-urllib2-wait-for-page-to-load-to-scrape-data) — Sajjan Kumar, Jan 27 '16 at 10:51
That sleep gives the redirect time to occur. If the page is being modified in the browser using javascript, no amount of waiting will make that happen with `urllib`/`urllib2` as it doesn't process javascript. — Peter Wood, Jan 27 '16 at 10:56
See also this question: [Any Python alternatives to Selenium...?](http://stackoverflow.com/questions/2127181/any-python-alternatives-to-selenium-for-programmatically-logging-into-websites-t) — Peter Wood, Jan 27 '16 at 11:14
Duplicate: http://stackoverflow.com/questions/11460105/python-urllib2-wait-for-page-to-finish-loading-redirecting-before-scraping Look at the link I've posted above; your question is a duplicate. Anyway, you can't do that with urllib2/3, since those modules don't have a JS engine and only GET the data. — S. Kerdel, Jan 27 '16 at 10:30
@ProjectHardcore Yes, it might be a duplicate, and I saw all those posts before posting this, but I haven't found a solution. With the selenium and requests modules there was a different problem, so I thought urllib2 would be useful. — Sajjan Kumar, Jan 27 '16 at 10:44
Could you maybe explain what's not working with selenium? Do you maybe have an example? — S. Kerdel, Jan 27 '16 at 11:09
When I enter the code line by line in the Python shell it works, but when I create a .py file it throws an error " ". I searched for it and nothing worked for me (like IE settings or firewall unblocking). When I use the requests module it throws an error about an insecure connection, and when I try to fix this using certifi and urllib3 it throws the same error. I am trying to avoid this. — Sajjan Kumar, Jan 27 '16 at 11:30
Could you maybe post the selenium code? If it works in the CLI, it should work as a script too. Are you sure the IP:PORT parameters are correct? — S. Kerdel, Jan 27 '16 at 11:37
@ProjectHardcore here is the [pastebin](http://pastebin.com/jWxZ5hwv); also look at the explanation at the bottom of the code. — Sajjan Kumar, Jan 28 '16 at 07:02
@SajjanKumar See the comments on [this answer](http://stackoverflow.com/a/9902956/1084416). Is the Selenium server running? — Peter Wood, Feb 02 '16 at 07:43
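
Since the comments point towards a browser-driven approach, here is a hedged sketch of what an explicit wait looks like with Selenium's Python bindings. It is not the code from the pastebin. The function name `fetch_rendered_html`, the `element_id` parameter and the choice of Firefox are assumptions made for illustration, and a working browser driver (geckodriver here) is assumed to be installed. The idea is to wait for a specific element that the page's JavaScript creates, rather than sleeping for a fixed time.

```python
def fetch_rendered_html(url, element_id, timeout=10):
    """Return the page source once an element with `element_id` is in the DOM."""
    # Imported here so this module still loads where Selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()  # assumption: Firefox and geckodriver are available
    try:
        driver.get(url)
        # Poll until the element exists; raises TimeoutException after `timeout` seconds.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.ID, element_id))
        )
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even on timeout
```

The returned string is the HTML after the browser has run the page's scripts, so it can be passed to BeautifulSoup in the same way as the urllib2 response body.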
