
I am referring to the following Stack Overflow question: "Scrapy, scrapping data inside a javascript".

I am trying to replicate the answer to that question given by @Rho, to learn how to scrape data from a JavaScript-generated form. The form's payload seems to have changed since that question was posted, so I have modified it accordingly.

My code and output are as follows:

$ scrapy shell https://www.mcdonalds.com.sg/locate-us/

2015-07-07 12:09:28+0800 [scrapy] INFO: Scrapy 0.24.6 started (bot: scrapybot)
.....
2015-07-07 12:09:28+0800 [default] INFO: Spider opened
2015-07-07 12:09:32+0800 [default] DEBUG: Crawled (200) <GET https://www.mcdonalds.com.sg/locate-us/> (referer: None)
....
>>> url = 'https://www.mcdonalds.com.sg/wp-admin/admin-ajax.php'
>>> payload = {'action':'store_locator_locations'}
>>> head = {'X-Requested-With':'XMLHttpRequest'}
>>> from scrapy.http import FormRequest
>>> req=FormRequest(url,formdata=payload,headers=head)
>>> fetch(req)
2015-07-07 12:12:24+0800 [default] DEBUG: Crawled (404) <POST https://www.mcdonalds.com.sg/wp-admin/admin-ajax.php> (referer: None)

I expected a 200 response, but as you can see above, I am getting a 404 status code.

Tuhina Singh

1 Answer


This isn't a problem with the code per se. The original question and answer you referred to were from 2013, a lifetime ago on the internet.

Things have changed for McDonald's Singapore, and for WordPress too, it would seem. But not all that much.

What used to be

url = 'https://www.mcdonalds.com.sg/wp-admin/admin-ajax.php'

is now

url = 'https://www.mcdonalds.com.sg/wp/wp-admin/admin-ajax.php'

(I found this by opening Chrome's developer tools with F12 and looking at the Network tab.)
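In the question's Scrapy shell session, the only change needed is the `url` passed to `FormRequest`. To inspect that request outside Scrapy, here is a minimal sketch that rebuilds the same POST with Python's standard library. It sends a form-encoded `action` field plus the `X-Requested-With` header. It assumes the endpoint still lives at the path found in 2015, and `build_locator_post` is a hypothetical helper name, not part of any library.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Corrected endpoint from this answer (note the extra /wp/ path segment).
# Assumption: the site still serves it at this path.
AJAX_URL = "https://www.mcdonalds.com.sg/wp/wp-admin/admin-ajax.php"


def build_locator_post(action="store_locator_locations"):
    """Build (but do not send) the same form POST the question's FormRequest makes."""
    body = urlencode({"action": action}).encode("ascii")
    return Request(
        AJAX_URL,
        data=body,  # supplying a body makes urllib default to POST
        headers={
            "X-Requested-With": "XMLHttpRequest",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


req = build_locator_post()
# Sending it would be: urllib.request.urlopen(req)
```

Building the request without sending it lets you check the method, URL, and body offline before touching the network.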

In fact, you can issue a GET request to this URL and get JSON back:

GET https://www.mcdonalds.com.sg/wp/wp-admin/admin-ajax.php?action=store_locator_locations

[{
    "id": "417",
    "name": "McDonald\u2019s JCube",
    "address": "2 Jurong East Central 1<br\/>#01-09<br\/>JCube\r\n",
    "city": "Singapore",
    "lat": "1.33352",
    "long": "103.740277",
    "op_hours": "Mon-Fri: Opens at 0630<br>\r\nSat-Sun: Opens at 0700<br>\r\nSun-Thur: Closes at 2300 <br>\r\nFri\/Sat & PH Eve: Closes at 0000\r\n<br><br>\r\nDessert Kiosk: Daily 1100 - 2300",
    "phone": "66844228",
    "region": "west",
    "types": ["3"],
    "zip": "609731"
},
...
]
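Because a plain GET returns the whole list, a small standard-library sketch can do the full job. It builds the URL, downloads the list, and tidies each record. In the sample, addresses use `<br/>` tags as line breaks, and `lat`/`long` arrive as strings. The helper names `locator_url`, `fetch_locations`, and `clean_store` are hypothetical. The endpoint is assumed to be unchanged since 2015. The embedded sample is the record shown above, trimmed to the fields the helper uses.

```python
import json
import re
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint as reported in this answer (2015); assumption: it has not moved since.
AJAX_URL = "https://www.mcdonalds.com.sg/wp/wp-admin/admin-ajax.php"

# Matches <br>, <br/> and <br /> tags, which the payload uses as line breaks.
BR_TAG = re.compile(r"<br\s*/?>", re.IGNORECASE)


def locator_url(action="store_locator_locations"):
    """Build the GET URL carrying the WordPress `action` query parameter."""
    return AJAX_URL + "?" + urlencode({"action": action})


def fetch_locations(timeout=30):
    """Download and decode the JSON store list (performs a network call)."""
    with urlopen(locator_url(), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def clean_store(raw):
    """Turn one raw store dict into a tidier record (hypothetical helper)."""
    address = [part.strip() for part in BR_TAG.split(raw["address"]) if part.strip()]
    return {
        "id": int(raw["id"]),
        "name": raw["name"],
        "address": address,
        "lat": float(raw["lat"]),
        "long": float(raw["long"]),
        "zip": raw["zip"],
    }


# Offline check against the record shown above, trimmed to the fields used.
SAMPLE = r'''[{"id": "417", "name": "McDonald\u2019s JCube",
  "address": "2 Jurong East Central 1<br\/>#01-09<br\/>JCube\r\n",
  "lat": "1.33352", "long": "103.740277", "zip": "609731"}]'''

stores = [clean_store(s) for s in json.loads(SAMPLE)]
# Live use (network required): stores = [clean_store(s) for s in fetch_locations()]
```

Inside a Scrapy callback, the same `clean_store` could be applied to each item of `json.loads(response.body)` instead of calling `fetch_locations`.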
kaveman