
I want to scrape images from midjourney.com. I had a perfectly working script that did this, but now my requests get blocked: I get a 403 (Forbidden) response. To validate my code, I copied the request for the main page out of my browser and converted it into a script that makes the same request.

My guess is that this is a new Content Security Policy to prevent loading the site outside of a browser. Does anyone have an idea how to get around this?

I would really appreciate some hints or ideas. Btw this is the test script:

import requests

cookies = {
    'imageSize': 'medium',
    'imageLayout_2': 'hover',
    'getImageAspect': '2',
    'fullWidth': 'false',
    'showHoverIcons': 'true',
    '_dd_s': 'rum=0&expire=1687926962555',
    # Windows cmd "^" escapes removed: "^%^7C" is really "%7C" (an encoded "|")
    '__Host-next-auth.csrf-token': 'c19e3aa92427d9ade40721425bf5affb4955e52f766d0c2a8ca064d3ffef6d9c%7Cda294da6acc19312b0399c98e0148de068f41999f76145e15717bdcd3ee9f5c9',
    '__Secure-next-auth.callback-url': 'https%3A%2F%2Fwww.midjourney.com',
}

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/114.0',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
    # 'Accept-Encoding': 'gzip, deflate, br',
    'Alt-Used': 'www.midjourney.com',
    'Connection': 'keep-alive',
    # 'Cookie': 'imageSize=medium; imageLayout_2=hover; getImageAspect=2; fullWidth=false; showHoverIcons=true; _dd_s=rum=0&expire=1687926962555; __Host-next-auth.csrf-token=c19e3aa92427d9ade40721425bf5affb4955e52f766d0c2a8ca064d3ffef6d9c%7Cda294da6acc19312b0399c98e0148de068f41999f76145e15717bdcd3ee9f5c9; __Secure-next-auth.callback-url=https%3A%2F%2Fwww.midjourney.com',
    'Upgrade-Insecure-Requests': '1',
    'Sec-Fetch-Dest': 'document',
    'Sec-Fetch-Mode': 'navigate',
    'Sec-Fetch-Site': 'none',
    'Sec-Fetch-User': '?1',
    # Requests doesn't support trailers
    # 'TE': 'trailers',
}

# callbackUrl is the URL-encoded path "/app/" (the "^" characters were cmd escapes)
response = requests.get('https://www.midjourney.com/home/?callbackUrl=%2Fapp%2F', cookies=cookies, headers=headers)

print(response.status_code)
print(response.text)

Load website with CSP

  • It's not likely to be caused by CSP as this is something implemented by the browser (just like CORS). Given that the returned status is a 403 (Forbidden) the chances are that there's a cookie or something that you're missing that is used to authenticate requests. – phuzi Aug 31 '23 at 12:46

1 Answer


CSP is a protection mechanism that only works inside browsers, not in Python scripts running outside of one. It's the browser's job to read the headers of the incoming response and block the content from running or being shown to the user. Since your script runs outside of the browser, the 403 response is most likely not caused by CSP.
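To see that a CSP header does not stop a script, here is a minimal sketch (an illustration only, nothing Midjourney-specific): a hypothetical local server returns a page with a strict Content-Security-Policy header, and a plain HTTP client still gets status 200, the full body including the script tag, and the policy as just another response header. It uses the standard library's urllib so it runs without extra packages; requests likewise does not enforce CSP. The handler and function names are made up for the example.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class CSPHandler(BaseHTTPRequestHandler):
    """Hypothetical local page that ships a very strict CSP header."""

    def do_GET(self):
        body = b"<html><script>alert(1)</script></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        # A browser would refuse to run the inline script under this policy.
        self.send_header("Content-Security-Policy", "default-src 'none'")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def fetch_with_csp():
    """Start the local server on a free port, fetch one page, shut down."""
    server = HTTPServer(("127.0.0.1", 0), CSPHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        url = f"http://127.0.0.1:{server.server_port}/"
        # Bypass any proxy settings so the request really hits the local server.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(url) as resp:
            return resp.status, resp.headers["Content-Security-Policy"], resp.read()
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    status, csp, body = fetch_with_csp()
    print(status, csp)      # the policy is just data to the client
    print(body.decode())    # the "blocked" script is delivered anyway
```

The only thing that can make a script see a 403 is the server deciding to send one, which is why missing or stale credentials are the more likely cause.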

I would advise getting fresh values for the cookies variable. Possibly some of the tokens in it have expired, hence the 403.
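One quick staleness check on the copied values: the _dd_s cookie in the script embeds an expire field. Assuming that field is a Unix timestamp in milliseconds (an assumption based on its format, not something the question states), the hypothetical helpers below decode it and compare it with the current time. Under that assumption the value in the question decodes to 28 June 2023, 04:36:02 UTC.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional
from urllib.parse import parse_qs

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def dd_s_expiry(value: str) -> datetime:
    """Decode the expire field of a '_dd_s' cookie value.

    Assumption: expire is Unix time in milliseconds. Integer milliseconds
    are added to the epoch to avoid float rounding.
    """
    expire_ms = int(parse_qs(value)["expire"][0])
    return EPOCH + timedelta(milliseconds=expire_ms)


def is_stale(value: str, now: Optional[datetime] = None) -> bool:
    """True if the decoded expiry is at or before `now` (default: current UTC time)."""
    if now is None:
        now = datetime.now(timezone.utc)
    return dd_s_expiry(value) <= now


if __name__ == "__main__":
    copied = "rum=0&expire=1687926962555"  # value from the question's cookies dict
    print(dd_s_expiry(copied).isoformat())  # prints 2023-06-28T04:36:02.555000+00:00
    print("stale" if is_stale(copied) else "still valid")
```

Most cookies keep their expiry in the Set-Cookie attributes (visible in the browser's storage inspector) rather than in the value, so for the others the practical route is re-copying them from a fresh, logged-in browser session.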

Maciej Dobosz