
I am currently scraping Instagram comments for a sentiment analysis project using an Instagram scraper. It is supposed to output a comments file but doesn't, so as a workaround I find the query URL in the log file and paste it into a browser.

An example URL: https://www.instagram.com/graphql/query/?query_hash=33ba35852cb50da46f5b5e889df7d159&variables={%22shortcode%22:%22CMex-IGn1G-%22,%22first%22:50,%22after%22:%22QVFCaERkTm84aWF3T1Exbmw5V0xhb05haVBEY2JaYmxhSTNGWVZ4M2RQWi0yVzVUSExlUlRYOUtsOVEtM0trRzBmSGxyYjdJV094a1hlYm1aLXZjdkVpZQ==%22}
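The `variables` parameter in a URL like this is URL-encoded JSON. A minimal sketch, using only the Python standard library, that splits such a URL into its `query_hash` and decoded variables; `shortcode` identifies the post, and `first` and `after` appear to be the page size and pagination cursor (an assumption based on the usual GraphQL convention, not confirmed here). The function name `decode_graphql_url` is hypothetical.

```python
import json
from typing import Any, Dict, Tuple
from urllib.parse import parse_qs, urlsplit

EXAMPLE_URL = (
    "https://www.instagram.com/graphql/query/?query_hash=33ba35852cb50da46f5b5e889df7d159"
    "&variables={%22shortcode%22:%22CMex-IGn1G-%22,%22first%22:50,%22after%22:"
    "%22QVFCaERkTm84aWF3T1Exbmw5V0xhb05haVBEY2JaYmxhSTNGWVZ4M2RQWi0yVzVUSExlUlRYOUtsOVEtM0trRzBmSGxyYjdJV094a1hlYm1aLXZjdkVpZQ==%22}"
)

def decode_graphql_url(url: str) -> Tuple[str, Dict[str, Any]]:
    """Return (query_hash, variables) from a GraphQL query URL (hypothetical helper)."""
    # parse_qs splits on '&', then on the first '=' of each pair, and
    # percent-decodes the values (so %22 becomes a double quote).
    query = parse_qs(urlsplit(url).query)
    query_hash = query["query_hash"][0]
    # The decoded 'variables' value is a JSON object.
    variables = json.loads(query["variables"][0])
    return query_hash, variables

query_hash, variables = decode_graphql_url(EXAMPLE_URL)
print(query_hash, variables["shortcode"], variables["first"])
```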

In Firefox I can view the JSON response and can download it in two ways:

  1. CTRL + A to select all and paste into a JSON file.
  2. Download webpage as a JSON file.

The issue is that neither method retains the emoji data. With the first, the emojis are not stored as Unicode escapes but as question marks (???). I assumed this was an encoding problem, so I tried pasting the raw response into files saved with a Unicode encoding. Instead, the emojis are then stored as literal emoji characters, not as Unicode escape sequences.
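One way to control how the emojis end up in the file, independent of the editor or save dialog, is to let a JSON library re-serialize the response. A minimal sketch, assuming the raw response body has already been copied into a string: Python's `json.dumps` with `ensure_ascii=True` writes every non-ASCII character as a `\uXXXX` escape (emoji outside the Basic Multilingual Plane become surrogate pairs such as `\ud83d\ude00`), while `ensure_ascii=False` keeps literal emoji, which then requires the file to be saved as UTF-8. The function names are hypothetical.

```python
import json

def normalize_response(raw_text: str, escape_unicode: bool = True) -> str:
    """Re-serialize a JSON response string.

    escape_unicode=True  -> non-ASCII characters (including emoji) become
                            \\uXXXX escapes, so the output is pure ASCII.
    escape_unicode=False -> emoji stay as literal characters; the output
                            must then be written as UTF-8.
    """
    data = json.loads(raw_text)
    return json.dumps(data, ensure_ascii=escape_unicode)

def save_response(raw_text: str, path: str, escape_unicode: bool = True) -> None:
    """Write the normalized response to path (hypothetical helper)."""
    text = normalize_response(raw_text, escape_unicode)
    # Always pass an explicit encoding instead of relying on the platform
    # default, which on Windows is often a legacy "ANSI" code page.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

# A response containing an escaped emoji (U+1F600) keeps its escapes:
sample = '{"text": "nice \\ud83d\\ude00"}'
print(normalize_response(sample))
```

With the default `escape_unicode=True` the saved file contains only ASCII bytes, so it looks the same in Notepad, Notepad++, or any other editor regardless of the encoding they assume.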

The second method either saves only the message {"message":"rate limited","status":"fail"} or saves the data in some other incorrect format.
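The `{"message":"rate limited","status":"fail"}` body means the request was refused, so there is no comment data to save. A minimal sketch of detecting that exact message and retrying with exponential backoff; the `fetch` callable (whatever actually performs the request, e.g. with your session cookies) and the 60-second base delay are assumptions, not known Instagram limits.

```python
import json
import time
from typing import Any, Callable, Dict

def is_rate_limited(payload: Any) -> bool:
    """True if the parsed JSON is the 'rate limited' failure message."""
    return (
        isinstance(payload, dict)
        and payload.get("status") == "fail"
        and payload.get("message") == "rate limited"
    )

def fetch_with_backoff(
    fetch: Callable[[], str],          # hypothetical: returns the response body text
    max_attempts: int = 5,
    base_delay: float = 60.0,          # assumed starting delay in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    for attempt in range(max_attempts):
        payload = json.loads(fetch())
        if not is_rate_limited(payload):
            return payload
        # Wait base_delay, 2*base_delay, 4*base_delay, ... between attempts,
        # but don't sleep after the final failed attempt.
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"still rate limited after {max_attempts} attempts")
```

Passing `sleep` as a parameter lets the retry logic be exercised with fake responses, without waiting or touching the network.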

However, a few months ago I scraped some pages and managed to save the comments with the emojis stored as Unicode escape sequences. This is frustrating because I know it can be done, but I can't remember how I did it; it must have been something basic, like the methods outlined above.

I am out of ideas and would greatly appreciate any help. Thank you.

  • "*The first loses the emojis as they are not stored in unicode, but rather as question marks `???`.*" - sounds like maybe you saved the file in an ANSI format instead of as UTF-8, or used a text editor that doesn't even support Unicode at all. – Remy Lebeau Mar 30 '21 at 01:10
  • @RemyLebeau I used both Notepad and Notepad++, which both correctly display the Unicode values (as \u2000, for example), which is strange. Would you recommend a different text editor? – Ronnie Lightweightbaby Coleman Mar 30 '21 at 08:00
  • Again, which format exactly did you save the file in? Make sure you choose UTF-8 specifically, not any kind of default encoding. The save dialog should let you pick the format. – Remy Lebeau Mar 30 '21 at 14:20
  • @RemyLebeau thanks for the response. I tried again, but they're still being stored like this: https://imgur.com/8KBqrVD. A bit of further info: the ones that do work are JSON files, but I have tried all of the above with both .txt and .json files. – Ronnie Lightweightbaby Coleman Mar 30 '21 at 14:54
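To tell which of these cases a saved file is actually in (valid UTF-8 or not, `\uXXXX` escapes or literal emoji), it can help to inspect the raw bytes rather than how an editor renders them. A minimal sketch; the function name and the returned keys are hypothetical.

```python
import re

# Matches a JSON-style escape: a literal backslash, 'u', and four hex digits.
ESCAPE = re.compile(r"\\u[0-9a-fA-F]{4}")

def describe_encoding(raw: bytes) -> dict:
    """Report how non-ASCII text (e.g. emoji) is stored in a file's bytes."""
    try:
        text = raw.decode("utf-8")
        valid_utf8 = True
    except UnicodeDecodeError:
        # latin-1 maps every byte to a character, so this never fails;
        # it is only used so the counts below can still be computed.
        text = raw.decode("latin-1")
        valid_utf8 = False
    return {
        "valid_utf8": valid_utf8,
        "has_bom": raw.startswith(b"\xef\xbb\xbf"),
        "json_escapes": len(ESCAPE.findall(text)),
        "non_ascii_chars": sum(1 for ch in text if ord(ch) > 127),
    }

# Usage: read the file in binary mode so no decoding happens on the way in.
# with open("comments.json", "rb") as f:
#     print(describe_encoding(f.read()))
```

A file with `json_escapes > 0` has the escaped form; one with `non_ascii_chars > 0` holds literal emoji; and one with neither, but with `???` where the emojis were, lost them when it was saved.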
