
I want to scrape all the posts containing some #hashtag from Instagram.

I tried it from: https://www.instagram.com/explore/tags/perfume/?__a=1

But it only gives me some posts, not every post.

Apeal Tiwari
  • I believe they have an API, another way would be scrapy +Splash (with a different starting URL) – wishmaster Feb 08 '20 at 14:34
  • Does this answer your question? [How to get ALL Instagram POSTs by hashtag with the API (not only the posts of my own account)](https://stackoverflow.com/questions/43655098/how-to-get-all-instagram-posts-by-hashtag-with-the-api-not-only-the-posts-of-my). Take a look at this [answer](https://stackoverflow.com/a/48682863/3091398) – CodeIt Feb 08 '20 at 14:36

3 Answers


Look carefully at the JSON you receive.

Navigate to graphql -> hashtag -> edge_hashtag_to_media -> page_info -> end_cursor

That's the identifier you have to use to request the next batch of media, like this:

https://www.instagram.com/explore/tags/perfume/?__a=1&max_id=QVFDNWJDZnpGbElpdEV5Q19aaldYWUsxZnc1YUd0Z21yNUZsOWw4V2NxX05ZWnZjT2pRb3lrY29ocDJnM0VNallUWGZVeDIxVURnUzltdHpBR1A1a0VRNw==

You can iterate this process to get more media for the requested hashtag.

A naive example with requests (Python 3) to extract the first 10 batches:

import requests
from time import sleep

max_id = ''
all_posts = []  # Collected media edges across batches.

base_url = "https://www.instagram.com/explore/tags/perfume/?__a=1"
for _ in range(10):
    sleep(2)  # Be polite.

    # Append the cursor from the previous batch, if we have one.
    if max_id:
        url = base_url + f"&max_id={max_id}"
    else:
        url = base_url

    print(f"Requesting {url}")
    response = requests.get(url)
    data = response.json()
    try:
        media = data['graphql']['hashtag']['edge_hashtag_to_media']
        all_posts.extend(media['edges'])  # The posts returned in this batch.
        max_id = media['page_info']['end_cursor']
        print(f"New cursor is {max_id}")
    except KeyError:
        print("There's no next page!")
        break

As said in the comment, be polite. Instagram will block you if you send too many requests per second.

Manuel Fedele
  • you should pass header to request.get() to avoid getting 429 error. header looks like : headers = { "user-agent": "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Mobile Safari/537.36 Edg/87.0.664.57", "cookie":"sessionid=YOUR_SESSION_ID;" } – parvaneh shayegh May 10 '21 at 14:21
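
To illustrate that comment, here is a minimal sketch of passing those headers with requests; the sessionid value is a placeholder you would have to copy from a logged-in browser session:

import requests

# Placeholder values taken from the comment above; sessionid must come from a logged-in session.
headers = {
    "user-agent": "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/87.0.4280.88 Mobile Safari/537.36 Edg/87.0.664.57",
    "cookie": "sessionid=YOUR_SESSION_ID;",
}

response = requests.get("https://www.instagram.com/explore/tags/perfume/?__a=1", headers=headers)
print(response.status_code)  # 200 if the request was not rejected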

The ?__a=1 endpoint doesn't work anymore. When it did, it returned around 20 or 40 posts plus the next page cursor, meaning you had to make a sequence of calls until you had all posts or got rate-limited by the website.

Nowadays, there are services where people offer access to a variety of unofficial APIs for various social media: https://rapidapi.com/search/instagram
Some APIs offer a small number of calls for free (say, 50/month), and all have paid plans for larger volumes (say, 20k/day for n dollars).

As with the older methods, each response contains a limited number of posts, so you still have to keep loading the next page.
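
As a rough sketch of that pattern, assuming a hypothetical RapidAPI provider whose endpoint returns a posts list plus a next_cursor field (the real URL, parameter, and field names depend on the provider you pick):

import requests

API_URL = "https://example-provider.p.rapidapi.com/hashtag"  # hypothetical endpoint
HEADERS = {"X-RapidAPI-Key": "YOUR_API_KEY"}  # RapidAPI sends the key in this header

posts = []
cursor = None
while True:
    params = {"tag": "perfume"}
    if cursor:
        params["cursor"] = cursor  # hypothetical pagination parameter
    data = requests.get(API_URL, headers=HEADERS, params=params).json()
    posts.extend(data.get("posts", []))  # hypothetical field holding the posts
    cursor = data.get("next_cursor")     # hypothetical field holding the next page cursor
    if not cursor:
        break

print(f"Fetched {len(posts)} posts")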

brasofilo

You can use this library: https://github.com/postaddictme/instagram-php-scraper/blob/master/examples/getMediasByTag.php

The function requires a number of media as a parameter, so if you want to recover all the media for a hashtag you will have to read the value of graphql -> hashtag -> edge_hashtag_to_media -> count from the JSON feed https://www.instagram.com/explore/tags/perfume/?__a=1
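
If that ?__a=1 feed still responds for you, a quick Python sketch (following the same JSON path) to read that count:

import requests

data = requests.get("https://www.instagram.com/explore/tags/perfume/?__a=1").json()
count = data['graphql']['hashtag']['edge_hashtag_to_media']['count']
print(f"Total media for #perfume: {count}")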

Sapppz4