
I'm trying to scrape data from http://apps.cu-citizenaccess.org/

It looks like the site loads its data from JSON, so I used code similar to that recommended in "Scrapy, scrapping data inside a javascript".

My current code (using Python 3) is

jsonresponse = json.loads(response.body_as_unicode())
print(jsonresponse["val.restname"])

I was wondering whether this is an error in technique, or whether I should be doing something else entirely?

wwl
  • Wouldn't it be _much_ simpler to just make a web request to the JSON file at http://apps.cu-citizenaccess.org/restaurants/api/restaurants/?format=json&rest_closed=False ? – Benjamin Gruenbaum Apr 11 '15 at 23:21
  • I'll look into this. Sorry, I'm still very new to scraping. – wwl Apr 11 '15 at 23:28
  • Thank you so much. You saved me from groping in the dark. – wwl Apr 11 '15 at 23:30
  • I'm glad I could help, in general in the future it's best not to post links to actual sites but to create a minimal code sample. Consider posting an answer to your own question explaining how you solved the issue. Welcome to the site. – Benjamin Gruenbaum Apr 11 '15 at 23:31
  • Yes I will post a comprehensive summary. Do you mind letting me know how you found the link you posted? – wwl Apr 11 '15 at 23:47
  • Using the network tab in the chrome developer tools http://discover-devtools.codeschool.com/ – Benjamin Gruenbaum Apr 11 '15 at 23:49

1 Answer

The simplest way is to request the actual JSON directly at http://apps.cu-citizenaccess.org/restaurants/api/restaurants/?format=json&rest_closed=False

This can be located by using the network tab in Google Chrome's developer tools.

Initially the page may display only 20 entries, so you can add the parameter "limit=1000" to the URL. You can then add "offset=1000" as well to retrieve the remaining entries.
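As a rough sketch of the paging described above, the two requests could be built and fetched in Python 3 with only the standard library. The helper names (page_url, fetch_page, fetch_pages) are made up for illustration, and the shape of the returned JSON is not assumed here; each page is simply returned as parsed.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://apps.cu-citizenaccess.org/restaurants/api/restaurants/"


def page_url(limit, offset, base_url=BASE_URL):
    """Build the API URL for one page of results (hypothetical helper)."""
    params = {
        "format": "json",
        "rest_closed": "False",
        "limit": limit,
        "offset": offset,
    }
    return base_url + "?" + urlencode(params)


def fetch_page(url):
    """Download one page and parse the body as JSON."""
    with urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))


def fetch_pages(limit=1000, pages=2, fetch=fetch_page):
    """Fetch consecutive pages: offset 0, then offset=limit, and so on."""
    return [fetch(page_url(limit, n * limit)) for n in range(pages)]
```

Calling fetch_pages() would request offset=0 and then offset=1000, which matches the two pages mentioned above. The fetch argument is there only so the download step can be swapped out, for example when trying the code without network access.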

Then use a JSON to CSV converter to get both pages into CSV format if needed. Both CSV files can easily be merged with a program such as Excel.
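If you would rather skip the online converter and Excel, the conversion and merge can also be done in Python. This is only a sketch: it assumes you have already pulled a list of flat dictionaries (one per restaurant) out of each JSON page, and where that list sits inside the response is something you would need to check yourself. The function name write_csv is made up for illustration.

```python
import csv


def write_csv(records, out):
    """Write a list of flat dicts to `out` (an open text file) as CSV.

    Assumes each record is a flat dict; nested values would be
    written using their default string form.
    """
    # Collect column names in first-seen order across all records.
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)

    # Records missing a column get an empty cell.
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(records)
```

To merge both pages into one file, concatenate the two record lists before writing, and open the output file with newline="" as the csv module documentation recommends.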

wwl