I am currently scraping Instagram comments for a sentiment analysis project using an Instagram scraper. It is supposed to output a comment file but does not, so as a workaround I find the query URL in the log file and paste it into a browser.
An example URL is:
https://www.instagram.com/graphql/query/?query_hash=33ba35852cb50da46f5b5e889df7d159&variables={%22shortcode%22:%22CMex-IGn1G-%22,%22first%22:50,%22after%22:%22QVFCaERkTm84aWF3T1Exbmw5V0xhb05haVBEY2JaYmxhSTNGWVZ4M2RQWi0yVzVUSExlUlRYOUtsOVEtM0trRzBmSGxyYjdJV094a1hlYm1aLXZjdkVpZQ==%22}
In Firefox I can view the JSON response, and I can save it in two ways:
- Press CTRL+A to select all, then copy and paste it into a JSON file.
- Download the webpage as a JSON file.
The issue is that neither method retains the emoji data. With the first, the emojis are lost: they are stored as question marks (???) rather than in Unicode. I assumed this was related to the encoding, so I tried pasting the raw response into Unicode-encoded files. In that case the emojis survive, but they appear as the emoji characters themselves rather than in the Unicode format.
The second method either saves a file containing only the message {"message":"rate limited","status":"fail"} or saves the response in another incorrect format.
A few months ago I scraped some pages and managed to save the comments with the emojis stored in the Unicode format. This is frustrating because I know it can be done, but I cannot remember how I did it; I would have tried something basic, like the methods outlined above.
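For reference, the two emoji representations can be sketched in Python as follows. This is a minimal illustration, assuming that "the Unicode format" means JSON `\uXXXX` escape sequences and that the response text has already been obtained somehow. The `sample_response` string and the helper names are hypothetical, not real Instagram data or part of any scraper.

```python
import json

# Hypothetical stand-in for a fragment of the JSON response (not real data).
# The emoji is written as a JSON \u escape (a UTF-16 surrogate pair).
sample_response = '{"text": "Love this \\ud83d\\ude0d"}'

# Parsing turns the escape sequence into a single emoji character in memory.
comment = json.loads(sample_response)


def to_escaped(obj):
    """Serialize with non-ASCII characters written as \\uXXXX escapes."""
    return json.dumps(obj, ensure_ascii=True)  # ensure_ascii=True is the default


def to_literal(obj):
    """Serialize with non-ASCII characters written as the characters themselves."""
    return json.dumps(obj, ensure_ascii=False)


def save_json(path, obj, escape=True):
    """Write obj to path as UTF-8 JSON, escaping non-ASCII only if escape is True."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(obj, f, ensure_ascii=escape)


escaped = to_escaped(comment)  # emoji kept as \ud83d\ude0d
literal = to_literal(comment)  # emoji kept as the character itself
```

Both strings decode to the same data, so which one ends up in the file depends only on how it is written. Question marks in place of emojis may indicate that the text passed through an encoding that cannot represent them at some point, though that is a guess rather than a confirmed cause.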
I am out of ideas and would greatly appreciate any help. Thank you.