I am working on a Sina Weibo crawler that uses its API. To use the API, I have to open the OAuth2 authorization page and retrieve the code from the URL it redirects to.

This is what I currently do:

  1. Use my app_key and app_secret (both known).

  2. Get the URL of the OAuth2 authorization page.

  3. Manually copy and paste the code from the redirect URL.

This is my code:

# imports (APIClient is assumed to come from the sinaweibopy SDK's weibo module)
from weibo import APIClient
import webbrowser

# call the official SDK
client = APIClient(app_key=APP_KEY, app_secret=APP_SECRET, redirect_uri=CALLBACK_URL)

# get the URL of the authorization page
url = client.get_authorize_url()
print url

# open the page in a browser
webbrowser.open_new(url)

# after the page redirects, copy the code part of the URL manually
print "paste the string after 'code=' in the URL:"
code = raw_input()

My question: how can I get rid of the manual copy-and-paste step and obtain the code programmatically?

Reference: http://blog.csdn.net/liuxuejiang158blog/article/details/30042493

shin
  • Look into [`requests`](http://docs.python-requests.org/en/master/) module –  Jun 07 '17 at 10:17

1 Answer

To get the contents of a page using requests, you can do this:

import requests

url = "http://example.com"

r = requests.get(url)

print r.text

See the requests documentation for details. You can install it with pip into your virtualenv or Python distribution.
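Once the redirect URL is available as a string, however it was obtained, pulling out the code does not need manual parsing. The following is a minimal sketch using only the standard library; the helper name `extract_code` and the example URL are hypothetical, and it assumes the code arrives as a `code` query parameter, as the question describes.

```python
try:
    from urllib.parse import urlparse, parse_qs  # Python 3
except ImportError:
    from urlparse import urlparse, parse_qs  # Python 2


def extract_code(redirect_url):
    """Return the value of the `code` query parameter, or None if it is absent."""
    params = parse_qs(urlparse(redirect_url).query)
    values = params.get("code")
    return values[0] if values else None


# Hypothetical redirect URL, for illustration only.
print(extract_code("http://example.com/callback?code=abc123"))
```

With this, the whole URL can be pasted (or passed in by another tool) instead of cutting out the part after `code=` by hand.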

To write a crawler, you can also use Scrapy.

Finally, there is one thing I did not understand: if you have an official client, why do you need to parse the contents of a URL to get data? Doesn't the client give you the data through some nice, easy-to-use functions?

SRC
  • Ah, OK! I did not understand that from your original question. Did you try using [selenium](http://www.seleniumhq.org/)? It has a [python binding](https://selenium-python.readthedocs.io/) too. – SRC Jun 08 '17 at 08:07
  • I will try that and get back to you. Thank you again! – shin Jun 08 '17 at 10:06
  • You are welcome! I hope it helps. If you are not restricted to Python, you can look into [PhantomJS](http://phantomjs.org/) as well. – SRC Jun 08 '17 at 13:38
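The Selenium suggestion in the comments could look roughly like the sketch below. It is untested against Weibo and rests on several assumptions: Firefox with its driver is installed, the login and approval are still done by hand in the opened window, and the redirect URL carries a `code` query parameter. The function names `has_code` and `fetch_code_with_browser` are hypothetical.

```python
try:
    from urllib.parse import urlparse, parse_qs  # Python 3
except ImportError:
    from urlparse import urlparse, parse_qs  # Python 2


def has_code(url):
    """Return True once the URL carries a non-empty `code` query parameter."""
    return bool(parse_qs(urlparse(url).query).get("code"))


def fetch_code_with_browser(authorize_url, timeout=120):
    """Open the authorization page and wait for the redirect that carries `code`."""
    # Imported lazily so this module still loads where Selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()  # assumption: Firefox and geckodriver are available
    try:
        driver.get(authorize_url)
        # Log in and approve in the opened window; polling stops after the redirect.
        WebDriverWait(driver, timeout).until(lambda d: has_code(d.current_url))
        return parse_qs(urlparse(driver.current_url).query)["code"][0]
    finally:
        driver.quit()
```

In the question's script, `code = fetch_code_with_browser(url)` would then take the place of the `webbrowser.open_new(url)` and `raw_input()` lines.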