When I fetch the page source of a URL from my local machine it works fine, but when I run the same piece of code on a Heroku machine the response is "Access Denied".

I have tried changing the headers (for example, adding a Referer or changing the User-Agent), but none of those solutions work.

LOCAL MACHINE

~/Development/repos/eater-list (master) $ python manage.py shell
>>> from accounts.zomato import *
>>> z = ZomatoAPI()
>>> response = z.page_source(url='https://www.zomato.com/ncr/the-immigrant-cafe-khan-market-new-delhi')
>>> response[0:50]
'<!DOCTYPE html>\n<html  lang="en"  prefix="og: http'
>>> response[0:100]
'<!DOCTYPE html>\n<html  lang="en"  prefix="og: http://ogp.me/ns#" >\n<head>\n    <meta charset="utf-8"

REMOTE MACHINE

~ $ python manage.py shell
Python 3.5.7 (default, Jul 17 2019, 15:27:27)
[GCC 7.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> from accounts.zomato import *
>>> z = ZomatoAPI()
>>> response = z.page_source(url='https://www.zomato.com/ncr/the-immigrant-cafe-khan-market-new-delhi')
>>> response
'<HTML><HEAD>\n<TITLE>Access Denied</TITLE>\n</HEAD><BODY>\n<H1>Access Denied</H1>\n \nYou don\'t have permission to access "http&#58;&#47;&#47;www&#46;zomato&#46;com&#47;ncr&#47;the&#45;immigrant&#45;cafe&#45;khan&#45;market&#45;new&#45;delhi" on this server.<P>\nReference&#32;&#35;18&#46;56273017&#46;1572225939&#46;46ec5af\n</BODY>\n</HTML>\n'
>>>

ZOMATO API CODE

The headers and the requests version are the same on both machines.

import requests


class ZomatoAPI:
    def __init__(self):
        # api_key is defined elsewhere in this module (not shown).
        self.user_key = api_key
        self.headers = {
            'Accept': 'application/json',
            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) '
                          'Chrome/77.0.3865.90 Safari/537.36',
            'user-key': self.user_key}

    def page_source(self, url):
        # Fetch the page and return its body decoded as UTF-8.
        session = requests.Session()
        response = session.get(url, headers=self.headers)
        return response.content.decode("utf-8")

I would appreciate any advice on this.

python

1 Answer

Check the HTTP status code of the response. It might be that Heroku's IP is simply banned by Zomato. This is more common than one might think: services like Cloudflare often put commonly used IPs on a block list.

The HTTP status code will give you more context about why the request is being refused.
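
A minimal sketch of that check, assuming the same `requests` library used in the question. The helper names `diagnose` and `check_url` are illustrative and not part of any library. Redirects are disabled so that a `301` shows up as the status code instead of being followed silently.

```python
def diagnose(status_code, body=""):
    """Map a status code (and optionally the body) to a rough hint."""
    if status_code in (301, 302, 307, 308):
        return "redirect"
    if status_code == 403:
        return "forbidden"
    if status_code == 429:
        return "rate-limited"
    if 200 <= status_code < 300:
        # A block page could in principle arrive with a success status,
        # so the body is checked as well.
        return "blocked-page" if "Access Denied" in body else "ok"
    return "other"


def check_url(url, headers=None):
    """Fetch url without following redirects and return (status, hint)."""
    # Imported here so diagnose() can be used without requests installed.
    import requests

    response = requests.get(
        url, headers=headers, allow_redirects=False, timeout=10
    )
    return response.status_code, diagnose(response.status_code, response.text)
```

Running `check_url(url, headers=z.headers)` on both machines and comparing the results should show whether the remote request is refused outright (`403`) or redirected.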

felipe
  • Going to upload an SSL certificate to the server and then check. – python Oct 28 '19 at 17:57
  • That might be a reason as well. If so, a non-SSL request to an SSL source will typically return the `HTTP` status code `301` (Moved Permanently). – felipe Oct 28 '19 at 18:02
  • I am getting a 403 status code even after adding the SSL certificate. – python Oct 29 '19 at 16:09
  • On second look, it doesn't seem like you are using their `api` properly. According to [the documentation](https://developers.zomato.com/documentation#!/restaurant/restaurant_0), your request should go to a `https://developers.zomato.com/api/v2.1/` link. Notice that if you remove the `user_key` from your `ZomatoAPI()` class and run it locally, it will still be able to pull the page. – felipe Oct 29 '19 at 16:35
  • My guess is they have blocked AWS IPs. I am going to try using a proxy through requests. – python Oct 29 '19 at 16:47
  • Let me know how it goes -- interested in your discovery. – felipe Oct 29 '19 at 17:11
  • Using a proxy worked :) I think they have a strong anti-scraping mechanism and have blocked AWS IPs. – python Oct 29 '19 at 17:17
  • Ah, nice! Glad you found the issue & it’s working now. – felipe Oct 29 '19 at 17:28
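
For reference, a rough sketch of routing a request through a proxy with `requests`. The thread does not say which proxy or configuration was actually used, so the `PROXY_URL` environment variable, the example proxy address, and the helper names below are all hypothetical. Only the `proxies` argument itself is standard `requests` behaviour.

```python
import os


def build_proxies(proxy_url):
    """Return the mapping requests expects for its `proxies` argument."""
    return {"http": proxy_url, "https": proxy_url}


def page_source_via_proxy(url, proxy_url, headers=None):
    """Fetch url through proxy_url and return the decoded body."""
    # Imported here so build_proxies() can be used without requests installed.
    import requests

    response = requests.get(
        url, headers=headers, proxies=build_proxies(proxy_url), timeout=10
    )
    # Raise on 4xx/5xx so a 403 is not mistaken for real page content.
    response.raise_for_status()
    return response.text


# PROXY_URL is a hypothetical environment variable, for example
# "http://user:password@proxy.example.com:8080".
PROXY_URL = os.environ.get("PROXY_URL")
```

With the class from the question, a call would look like `page_source_via_proxy(url, PROXY_URL, headers=z.headers)`. Keeping the proxy address in an environment variable keeps credentials out of the code, which fits how Heroku config vars are normally used.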