Error when scraping Instagram media, by adding at the end of URL (?__a=1)

Question

Sometimes, when I try to scrape Instagram media by appending ?__a=1 to the end of the URL, the request fails.

Example: https://www.instagram.com/p/CP-Kws6FoRS/?__a=1

The response returned is:

{
    "__ar": 1,
    "error": 1357004,
    "errorSummary": "Sorry, something went wrong",
    "errorDescription": "Please try closing and re-opening your browser window.",
    "payload": null,
    "hsrp": {
        "hblp": {
            "consistency": {
                "rev": 1005622141
            }
        }
    },
    "lid": "7104767527440109183"
}

Why is this response returned, and what should I do to fix it? Also, is there another way to get the video and photo URLs?

score 31 · Answer 1 · edited Jul 10 '22 at 13:52

I solved this problem by adding &__d=dis to the query string at the end of the URL, like so: https://www.instagram.com/p/CFr6G-whXxp/?__a=1&__d=dis
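
As a rough sketch of this fix in Python (standard library only), the first helper below appends both parameters to a post URL, and the second recognises the error payload shown in the question. The names `with_json_params` and `looks_like_error` are illustrative, not part of any Instagram API, and there is no guarantee that Instagram still honours these parameters.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_json_params(post_url: str, d_value: str = "dis") -> str:
    """Return post_url with __a=1 and __d=<d_value> in its query string."""
    parts = urlsplit(post_url)
    # Keep any parameters already present, then set the two we need.
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query["__a"] = "1"
    query["__d"] = d_value
    return urlunsplit(parts._replace(query=urlencode(query)))


def looks_like_error(payload: dict) -> bool:
    """True for the "Sorry, something went wrong" style payload from the question."""
    return "error" in payload and payload.get("payload") is None


print(with_json_params("https://www.instagram.com/p/CFr6G-whXxp/"))
# https://www.instagram.com/p/CFr6G-whXxp/?__a=1&__d=dis
```

Checking the decoded JSON with `looks_like_error` before reading fields such as `graphql` avoids a confusing `KeyError` when the error payload comes back instead.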

answered Jun 11 '22 at 07:12 by nasser · edited Jul 10 '22 at 13:52 by CertainPerformance

Adding onto this: From a quick test, it looks to me like `__d` just needs to be set to a non-empty value to get it working again. – LewsTherinTelescope Jun 22 '22 at 22:21
This answer stopped working for me. Maybe you have a new solution, @nasser? Thank you for your answer! – bartwader Aug 16 '22 at 14:50
Where did you come up with this link and these parameters? – ENSATE Jan 09 '23 at 00:39

score 5 · Answer 2 · answered Jun 09 '22 at 01:47

I believe I may have found a workaround using:

https://i.instagram.com/api/v1/users/web_profile_info/?username={username} to get the user's info and recent posts. data.user from the response is the same as graphql.user from https://i.instagram.com/{username}/?__a=1.
Extract the media id from <meta property="al:ios:url" content="instagram://media?id={media_id}"> in the HTML response of https://instagram.com/p/{post_shortcode}.
https://i.instagram.com/api/v1/media/{media_id}/info using the extracted media id to get the same response as https://instagram.com/p/{post_shortcode}/?__a=1.
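
The second step can be tried offline before wiring it into a scraper. The sketch below uses only the standard library to pull the media id out of the `al:ios:url` meta tag. The helper name `extract_media_id` and the sample id `1234567890` are made up for illustration, and it assumes the tag has the shape described above.

```python
from html.parser import HTMLParser
from typing import Optional

PREFIX = "instagram://media?id="


class _MediaIdParser(HTMLParser):
    """Collects the media id from the al:ios:url meta tag, if present."""

    def __init__(self) -> None:
        super().__init__()
        self.media_id: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "meta" and attr_map.get("property") == "al:ios:url":
            content = attr_map.get("content") or ""
            if content.startswith(PREFIX):
                self.media_id = content[len(PREFIX):]


def extract_media_id(html: str) -> Optional[str]:
    """Return the media id found in html, or None if the tag is missing."""
    parser = _MediaIdParser()
    parser.feed(html)
    return parser.media_id


# Hypothetical page fragment; a real post page contains much more markup.
sample = '<html><head><meta property="al:ios:url" content="instagram://media?id=1234567890"></head></html>'
print(extract_media_id(sample))  # 1234567890
```

Returning `None` when the tag is absent lets the caller skip a post (for example, when a login page is served instead) rather than crash.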

A couple of important points:

The user-agent used in the script is important. I found that the one Firefox generates when re-sending requests in the dev tools returns the "Sorry, something went wrong" error.
This solution uses cookies from your Firefox profile. You need to sign in to Instagram in Firefox before running this script. You can switch from Firefox to Chrome if you'd like:

cookiejar = browser_cookie3.chrome(domain_name='instagram.com')

Here's the full script. Let me know if this is helpful!

import os
from datetime import datetime, timedelta

import browser_cookie3
import bs4 as bs
import requests

# setup.
username = "<username>"
output_path = "C:\\some\\path"
headers = {
    "User-Agent": "Mozilla/5.0 (Linux; Android 9; GM1903 Build/PKQ1.190110.001; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/75.0.3770.143 Mobile Safari/537.36 Instagram 103.1.0.15.119 Android (28/9; 420dpi; 1080x2260; OnePlus; GM1903; OnePlus7; qcom; sv_SE; 164094539)"
}


def download_post_media(post: dict, media_list: list, number: int):
    # create the per-user output folder if it doesn't exist yet.
    output_filename = f"{output_path}/{username}"
    os.makedirs(output_filename, exist_ok=True)
    post_time = datetime.fromtimestamp(int(post["taken_at_timestamp"])) + timedelta(hours=5)
    output_filename += f"/{username}_{post_time.strftime('%Y%m%d%H%M%S')}_{post['shortcode']}_{number}"
    current_media_json = media_list[number - 1]
    if current_media_json['media_type'] == 1:
        # image.
        media_ext = ".jpg"
        media_url = current_media_json["image_versions2"]['candidates'][0]['url']
    elif current_media_json['media_type'] == 2:
        # video.
        media_ext = ".mp4"
        media_url = current_media_json["video_versions"][0]['url']
    else:
        # unknown media type: skip it instead of failing on an undefined extension.
        return
    output_filename += media_ext
    response = send_request_get_response(media_url)
    with open(output_filename, 'wb') as f:
        f.write(response.content)


def send_request_get_response(url):
    cookiejar = browser_cookie3.firefox(domain_name='instagram.com')
    return requests.get(url, cookies=cookiejar, headers=headers)


# use the /api/v1/users/web_profile_info/ api to get the user's information and its most recent posts.
profile_api_url = f"https://i.instagram.com/api/v1/users/web_profile_info/?username={username}"
profile_api_response = send_request_get_response(profile_api_url)
# data.user is the same as graphql.user from ?__a=1.
timeline_json = profile_api_response.json()["data"]["user"]["edge_owner_to_timeline_media"]
for post in timeline_json["edges"]:
    # get the HTML page of the post.
    post_response = send_request_get_response(f"https://instagram.com/p/{post['node']['shortcode']}")
    html = bs.BeautifulSoup(post_response.text, 'html.parser')
    # find the meta tag containing the link to the post's media.
    meta = html.find(attrs={"property": "al:ios:url"})
    media_id = meta.attrs['content'].replace("instagram://media?id=", "")
    # use the media id to get the same response as ?__a=1 for the post.
    media_api_url = f"https://i.instagram.com/api/v1/media/{media_id}/info"
    media_api_response = send_request_get_response(media_api_url)
    media_json = media_api_response.json()["items"][0]
    media = list()
    if 'carousel_media_count' in media_json:
        # multiple media post.
        for m in media_json['carousel_media']:
            media.append(m)
    else:
        # single media post.
        media.append(media_json)
    media_number = 0
    for m in media:
        media_number += 1
        download_post_media(post['node'], media, media_number)

More info about this API (`i.instagram.com`) can be found here: https://stackoverflow.com/questions/43452544/what-is-https-i-instagram-com-api-v1 — Flimtix, Jun 09 '22 at 07:49
`https://i.instagram.com/api/v1/media/{media_id}/info` - doesn't work for me. It returns `{"message":"useragent mismatch","status":"fail"}` — Amit Sharma, Jun 10 '22 at 20:20
I can confirm this works with the JavaScript fetch function. It returns the message above when accessed directly through the browser. — Amit Sharma, Jun 10 '22 at 21:54

score 2 · Answer 3 · answered Jun 11 '22 at 03:47

User-Agent:

Mozilla/5.0 (Linux; Android 9; GM1903 Build/PKQ1.190110.001; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/75.0.3770.143 Mobile Safari/537.36 Instagram 103.1.0.15.119 Android (28/9; 420dpi; 1080x2260; OnePlus; GM1903; OnePlus7; qcom; sv_SE; 164094539)

An alternative endpoint to /?__a=1 is the following, but you must send the User-Agent above to use it:
https://i.instagram.com/api/v1/users/web_profile_info/?username={username}

data.user in its response gives the same result as graphql.user from /?__a=1.
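
A minimal sketch of how this could look in Python, using only the standard library: it prepares the request with the User-Agent from this answer and reads `data.user` from a JSON body. The helper names are illustrative, the username `instagram` is only an example, and nothing is sent over the network here.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# User-Agent string quoted in this answer.
USER_AGENT = (
    "Mozilla/5.0 (Linux; Android 9; GM1903 Build/PKQ1.190110.001; wv) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/75.0.3770.143 "
    "Mobile Safari/537.36 Instagram 103.1.0.15.119 Android (28/9; 420dpi; "
    "1080x2260; OnePlus; GM1903; OnePlus7; qcom; sv_SE; 164094539)"
)


def build_profile_request(username: str) -> Request:
    """Prepare (but do not send) the web_profile_info request."""
    query = urlencode({"username": username})
    url = f"https://i.instagram.com/api/v1/users/web_profile_info/?{query}"
    return Request(url, headers={"User-Agent": USER_AGENT})


def user_from_response(body: str) -> dict:
    """Return data.user from the endpoint's JSON body."""
    return json.loads(body)["data"]["user"]


request = build_profile_request("instagram")
print(request.full_url)
```

To actually send it, pass `request` to `urllib.request.urlopen` and feed the decoded body to `user_from_response`. Other answers in this thread suggest that logged-in cookies may also be needed.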

score 0 · Answer 4 · gimi · edited Jun 02 '22 at 10:09

Instagram changed the method; use the new one:

GET https://i.instagram.com/api/v1/tags/web_info/?tag_name=${tags}

POST https://i.instagram.com/api/v1/tags/${tags}/sections/
body: 
{
include_persistent: 0
max_id: ${The last request contained this field}
next_media_ids[]: ${The last request contained this field}
next_media_ids[]: ${The last request contained this field}
page: ${The last request contained this field}
surface: grid
tab: recent
}
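
As a sketch of how the POST body above might be assembled in Python: this assumes the body is sent form-encoded, which the repeated `next_media_ids[]` key suggests but the answer does not state. The function name `build_sections_body` and the sample values are hypothetical. The real `max_id`, `next_media_ids` and `page` values have to be taken from the previous request, as noted above.

```python
from typing import List, Tuple
from urllib.parse import urlencode


def build_sections_body(max_id: str, next_media_ids: List[str], page: int) -> str:
    """Form-encode the pagination fields listed in this answer."""
    fields: List[Tuple[str, str]] = [
        ("include_persistent", "0"),
        ("max_id", max_id),
    ]
    # The key is repeated once per id, so a list of pairs is used instead of a dict.
    fields += [("next_media_ids[]", media_id) for media_id in next_media_ids]
    fields += [("page", str(page)), ("surface", "grid"), ("tab", "recent")]
    return urlencode(fields)


# Placeholder values, for illustration only.
print(build_sections_body("abc", ["111", "222"], 2))
# include_persistent=0&max_id=abc&next_media_ids%5B%5D=111&next_media_ids%5B%5D=222&page=2&surface=grid&tab=recent
```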

answered Jun 02 '22 at 08:53

How can I get the post (video or image) details using this method? – user3661581 Jun 02 '22 at 18:10

score 0 · Answer 5 · answered Jun 08 '22 at 20:40

I have the same problem. Until I find a definitive solution, I am using an alternative: I have built an offline API that converts links to media IDs. To use it, send a request like this: http://api-bot.ir/api/insta/media_id/?url=https://www.instagram.com/p/CP-Kws6FoRS/ Replace the link with any other post link and you will receive its media ID. Note that this works offline; I can also show you how to get the exact number of likes and other post information. Let me know if you need more help. Good luck.
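
Whichever service does the conversion, a script working from post links will usually need the shortcode on its own, for instance to build the `/p/{post_shortcode}` URLs used elsewhere in this thread. The sketch below shows one way to extract it with the standard library. The helper name `extract_shortcode` is illustrative, and it only handles links of the form `/p/<shortcode>/`.

```python
from typing import Optional
from urllib.parse import urlsplit


def extract_shortcode(post_url: str) -> Optional[str]:
    """Return the shortcode from a /p/<shortcode>/ post link, or None."""
    # Split the path into non-empty segments, ignoring query string and fragment.
    segments = [s for s in urlsplit(post_url).path.split("/") if s]
    if len(segments) >= 2 and segments[0] == "p":
        return segments[1]
    return None


print(extract_shortcode("https://www.instagram.com/p/CP-Kws6FoRS/?__a=1"))
# CP-Kws6FoRS
```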
