I am checking whether an Instagram page exists with urlopen('https://www.instagram.com/profile-name'). I get the profile page when it exists and a 404 error when it does not. That flow works perfectly.

But the Instagram request limit is reached quickly. It is per-IP, so I need to change my IP, and for this I tried Tor. It breaks as soon as I call urlopen() through the Tor connection: I get the Instagram login page regardless of whether the profile exists, so I cannot distinguish existing from non-existing profiles. What could be the reason for this behavior, and how can I fix it?

Here is the sample code; run it with python3. The USE_TOR constant switches Tor on or off. To install socks, run pip3 install requests requests[socks] and pip3 install pysocks in a terminal.

You need to install Tor before using it.

import urllib.request
from urllib.error import HTTPError
import socks
import socket

USE_TOR = True

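# Replacement for socket.create_connection that connects through the default
# SOCKS proxy; timeout and source_address are accepted but ignored.
def createConnection(address, timeout = None, source_address = None):
    sock = socks.socksocket()
    sock.connect(address)
    return sock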

def getIp():
    with urllib.request.urlopen("http://httpbin.org/ip") as page:
        return str(page.read()).replace('\n', '')

# Show the IP before Tor is enabled

print("Normal IP: " + getIp())

# Set up Tor

if USE_TOR:
    socks.set_default_proxy(socks.SOCKS5, "127.0.0.1", 9050)
    socket.socket = socks.socksocket
    socket.create_connection = createConnection
    print("Tor IP: " + getIp())

# Request page

try:
    page = urllib.request.urlopen('https://www.instagram.com/a')
    print("Profile exists")
except HTTPError as e:
    print("Profile does not exist. Http error " + str(e.code))

Terminal output:

USE_TOR = True

Normal IP: b'{\n  "origin": "my ip"\n}\n'
Tor IP: b'{\n  "origin": "158.174.122.199, 158.174.122.199"\n}\n'
Profile exists

USE_TOR = False

Normal IP: b'{\n  "origin": "my ip"\n}\n'
Profile does not exist. Http error 404

*"my ip" differs from the Tor one.
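
The sample above prints "Profile exists" for any response that is not an HTTP error, so a redirect to the login page is counted as an existing profile. One way to tell the three outcomes apart is to look at the URL that urlopen() ends up on after redirects. This is only a minimal sketch, not a confirmed fix: the /accounts/login path prefix is an assumption about where Instagram redirects, and classify and check_profile are hypothetical helper names.

```python
import urllib.request
from urllib.error import HTTPError
from urllib.parse import urlparse

# Assumption: Instagram's login page lives under this path prefix.
LOGIN_PATH_PREFIX = "/accounts/login"

def classify(status, final_url):
    """Map an HTTP status and the post-redirect URL to a verdict."""
    if status == 404:
        return "missing"
    if urlparse(final_url).path.startswith(LOGIN_PATH_PREFIX):
        return "login-wall"
    if status == 200:
        return "exists"
    return "unknown"

def check_profile(name):
    """Fetch a profile URL and classify the outcome (needs network access)."""
    url = "https://www.instagram.com/" + name
    try:
        with urllib.request.urlopen(url) as page:
            # geturl() reports the final URL after any redirects.
            return classify(page.status, page.geturl())
    except HTTPError as e:
        return classify(e.code, url)
```

With this, a "login-wall" result would mean the request was blocked, not that the profile exists, so it should be retried rather than recorded.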

Mr. Goldberg
  • please include your terminal output too – Amit Sep 04 '19 at 07:56
  • @Amit output added – Mr. Goldberg Sep 04 '19 at 08:07
  • @SaSha What package is `import socks`? When trying to test your code, I get a *ModuleNotFoundError*. – Lord Elrond Sep 04 '19 at 21:20
  • @CalebGoodman To install socks run in terminal: `pip3 install requests requests[socks]` and `pip3 install pysocks` – Mr. Goldberg Sep 06 '19 at 19:31
  • @SaSha I have all of those packages installed, but I still get the same error. Are you sure that `import socks` works on your end? I'm thinking it might be `from – Lord Elrond Sep 06 '19 at 21:03
  • @CalebGoodman 100% sure. This script runs for me; I copied and tested it just now. Maybe it requires some extra installation: https://stackoverflow.com/questions/14820453/how-do-i-install-socks-socksipy-on-ubuntu (can't remember). – Mr. Goldberg Sep 07 '19 at 02:35

1 Answer

Try loading the profile with instaloader. If you don't get an error, the profile exists. You can wrap the call in try/except.

#instagram.py
from instaloader import Instaloader
from instaloader import Profile

L = Instaloader()
profile = Profile.from_username(L.context, "amit")
# Output: <Profile amit (27235560)>

profile = Profile.from_username(L.context, "dasjkhkdhsjkahdjkashdadkajksdha")
# Raises an error, so you know the profile does not exist
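
The try/except idea could look roughly like the sketch below. It assumes that instaloader raises ProfileNotExistsException (from instaloader.exceptions) for an unknown username; profile_exists and instaloader_exists are hypothetical helper names. Any other exception, such as one caused by a login redirect or rate limiting, is deliberately left to propagate, because it says nothing about whether the profile exists.

```python
def profile_exists(username, fetch, not_found):
    """Return True if fetch(username) succeeds, False if it raises not_found."""
    try:
        fetch(username)
    except not_found:
        return False
    return True

def instaloader_exists(username):
    """Check a username via instaloader (needs instaloader and network access)."""
    # Imported lazily so the sketch still loads without instaloader installed.
    from instaloader import Instaloader, Profile
    from instaloader.exceptions import ProfileNotExistsException

    loader = Instaloader()
    return profile_exists(
        username,
        lambda name: Profile.from_username(loader.context, name),
        ProfileNotExistsException,
    )
```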

Amit
  • Thanks, but this does not help. This code is redirected to the Instagram login page regardless of whether the profile exists, so I cannot distinguish existing from non-existing profiles. – Mr. Goldberg Sep 04 '19 at 20:39
  • Check the edit. You can achieve that with instaloader . https://instaloader.github.io/as-module.html#profiles – Amit Sep 05 '19 at 11:39
  • That will not help: per https://instaloader.github.io/troubleshooting.html#too-many-requests this library also runs into 'too many requests', same as the urllib I am currently using. Instagram just sends me the login page when the limit is reached. The main question is how to get around the limit. It is per-IP, so I need to change IP. I know how to change IP using Tor or a proxy, BUT in that case Instagram instantly redirects me to the login page instead of the user page or a 404. My goal is to resolve exactly this Tor/proxy problem. – Mr. Goldberg Sep 06 '19 at 19:25
  • Also, `requests` library somehow started returning me real user pages, so I deleted info that `requests` does not work. – Mr. Goldberg Sep 06 '19 at 19:39
  • I use instaloader for a lot of projects and have normally never had any issue. I guess you need to send a large number of requests. If you want to change the proxy for each request, or rotate proxies, you can set up an HTTP proxy in environment variables. instaloader uses the "requests" library under the hood, so it will use the proxy from the environment variables. I guess you can try that approach. – Amit Sep 09 '19 at 04:50
  • It gives 500 requests/day or so. I'll try setting up the env variable. – Mr. Goldberg Sep 09 '19 at 19:33
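
The environment-variable approach from the last comments might be set up as in the sketch below. It assumes a local Tor SOCKS proxy on 127.0.0.1:9050, as in the question, and set_proxy_env is a hypothetical helper name. The socks5h scheme is understood by requests when requests[socks] is installed; urllib's own proxy handling does not speak SOCKS, so this would only apply to requests-based code such as instaloader. urllib.request.getproxies() is used here only to show what the environment now advertises.

```python
import os
import urllib.request

# Assumption: a local Tor SOCKS proxy on port 9050, as in the question.
PROXY_URL = "socks5h://127.0.0.1:9050"

def set_proxy_env(proxy_url):
    """Export the proxy for both schemes via the conventional variables."""
    for var in ("http_proxy", "https_proxy"):
        os.environ[var] = proxy_url

set_proxy_env(PROXY_URL)

# Show the proxies that environment-aware libraries would pick up.
print(urllib.request.getproxies())
```

Rotating would then mean calling set_proxy_env with a different proxy URL between batches of requests. Whether Instagram serves real profile pages through a given proxy is a separate question that this does not answer.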