
I am opening a webpage with urllib2, reading its content, and passing that content to BeautifulSoup for scraping.
However, I want the page to load fully (or to wait for a specific amount of time) before I read its content.

I have tried approaches such as `time.sleep(sec)`, but they do not work: I get the same content whether I read it instantly or wait (sleep) for 10 to 15 seconds. Yet when I enter the script line by line into the Python shell, I get a different result.

I am using urllib2 and Python 2.7. I have also searched for a solution, but everyone suggests using another module. Is this not possible with urllib2 or urllib3? Or do I have to use another module such as requests?
Please suggest.

Sajjan Kumar
  • Why would waiting make a difference? It returns the contents of the page, waiting won't make the page change. – Peter Wood Jan 27 '16 at 10:27
  • I have seen the duplicate you are pointing to, but nothing there works, and it was also active 3 or 5 years ago. @PeterWood I found that waiting for some time gives the page time to load fully. [http://stackoverflow.com/questions/31310321/python-urllib2-wait-for-page-to-load-to-scrape-data](http://stackoverflow.com/questions/31310321/python-urllib2-wait-for-page-to-load-to-scrape-data) – Sajjan Kumar Jan 27 '16 at 10:51
  • That sleep gives the redirect time to occur. If the page is being modified in the browser using javascript, no amount of waiting will make that happen with `urllib`/`urllib2` as it doesn't process javascript. – Peter Wood Jan 27 '16 at 10:56
  • Got it. Can you suggest any option other than Selenium? – Sajjan Kumar Jan 27 '16 at 11:01
  • See also this question: [Any Python alternatives to Selenium...?](http://stackoverflow.com/questions/2127181/any-python-alternatives-to-selenium-for-programmatically-logging-into-websites-t) – Peter Wood Jan 27 '16 at 11:14
  • Duplicate: http://stackoverflow.com/questions/11460105/python-urllib2-wait-for-page-to-finish-loading-redirecting-before-scraping Look at the link I've posted above; your question is a duplicate. Anyway, you can't do that with urllib2/3, since those modules don't have a JS engine; they only GET the data. – S. Kerdel Jan 27 '16 at 10:30
  • @ProjectHardcore Yes, it might be a duplicate, and I saw all those posts before posting this, but I haven't found a solution. With Selenium and the requests module there was a different problem, so I thought urllib2 would be useful. – Sajjan Kumar Jan 27 '16 at 10:44
  • Could you maybe explain what's not working with selenium? Do you maybe have an example? – S. Kerdel Jan 27 '16 at 11:09
  • When I enter the code line by line in the Python shell it works, but when I create a .py file it throws an error " ". I searched for it and nothing worked for me (such as changing IE settings or unblocking the firewall). When I use the requests module it throws an insecure-connection error, and when I try to fix this using certifi and urllib3 it throws the same error. I am trying to avoid this. – Sajjan Kumar Jan 27 '16 at 11:30
  • Could you maybe post the selenium code? If it works in the CLI, it should work as a script too. Are you sure the IP:PORT parameters are correct? – S. Kerdel Jan 27 '16 at 11:37
  • @ProjectHardcore Here is a [pastebin](http://pastebin.com/jWxZ5hwv); also look at the explanation at the bottom of the code. – Sajjan Kumar Jan 28 '16 at 07:02
  • @ProjectHardcore I am hoping for your suggestion. – Sajjan Kumar Feb 02 '16 at 07:04
  • @PeterWood Please review the code and see if you can help me. – Sajjan Kumar Feb 02 '16 at 07:05
  • @SajjanKumar See the comments on [this answer](http://stackoverflow.com/a/9902956/1084416). Is the Selenium server running? – Peter Wood Feb 02 '16 at 07:43
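
The point made in the comments (that `urllib`/`urllib2` only fetches the HTML the server sends and never runs JavaScript, so sleeping changes nothing) can be illustrated with a minimal sketch. It is not the asker's code. It assumes Python 3, where `urllib.request.urlopen` is the counterpart of `urllib2.urlopen`, and it serves a made-up page from a throwaway local server so that it runs without network access. The page content and the names `PAGE`, `DemoHandler`, `fetch`, and `fetch_twice` are hypothetical.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, build_opener

# Hypothetical page: a browser would fill the empty <div> after 2 seconds
# by running the script; a plain HTTP client never executes it.
PAGE = b"""<html><body>
<div id="data"></div>
<script>
  setTimeout(function () {
    document.getElementById("data").textContent = "loaded later";
  }, 2000);
</script>
</body></html>"""


class DemoHandler(BaseHTTPRequestHandler):
    """Serves the same static HTML for every GET request."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def fetch(url):
    """Return the raw HTML the server sends, decoded as text."""
    # An empty ProxyHandler bypasses any configured proxy for this local demo.
    opener = build_opener(ProxyHandler({}))
    with opener.open(url) as response:
        return response.read().decode("utf-8")


def fetch_twice(delay):
    """Fetch the demo page, sleep for `delay` seconds, then fetch it again."""
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = "http://127.0.0.1:%d/" % server.server_port
    try:
        first = fetch(url)
        time.sleep(delay)
        second = fetch(url)
    finally:
        server.shutdown()
        server.server_close()
    return first, second


if __name__ == "__main__":
    before, after = fetch_twice(3)
    print("same content after waiting:", before == after)
    print("div still empty:", '<div id="data"></div>' in after)
```

Under these assumptions, both fetches return the same markup and the `<div>` stays empty however long the script sleeps, because nothing on the client side runs the `<script>`. Getting the filled-in content would need something that executes JavaScript, such as the browser automation tools discussed in the comments.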

0 Answers