I'm using the method from the post linked below to scrape Instagram profiles. Can I change the number of images I retrieve? In the JSON response I saw the 'has_next_page' parameter, but I'm not sure how to use it. Thanks in advance. Post link: What is the new instagram json endpoint?

Used code:

import json
import re
import requests
from bs4 import BeautifulSoup

r = requests.get('https://www.instagram.com/' + profile + '/')
soup = BeautifulSoup(r.content, 'html.parser')
scripts = soup.find_all('script', type="text/javascript",
                        string=re.compile('window._sharedData'))
# Strip the assignment prefix and the trailing semicolon, leaving plain JSON
stringified_json = scripts[0].get_text().replace('window._sharedData = ', '')[:-1]
data = json.loads(stringified_json)['entry_data']['ProfilePage'][0]
Josef Ginerman
Francesco

2 Answers

You can find the Instagram API here: https://www.instagram.com/developer/ The documentation is pretty neat, I think; you just have to register to get an access token.

Kata

The problem is this: your code scrapes data from the profile page, which means you only get the images that have already been loaded. That's why you can't simply set a larger number to get more images.
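The 'has_next_page' flag from the question reports exactly this situation: it says whether more images exist beyond the ones already loaded. Here is a minimal sketch for reading it from the `data` dict in the question. The key path (`graphql` > `user` > `edge_owner_to_timeline_media` > `page_info`) and the `end_cursor` field are assumptions about the JSON layout, which Instagram has changed over time, and `timeline_page_info` is a hypothetical helper name.

```python
def timeline_page_info(profile_page):
    """Return (loaded_count, has_next_page, end_cursor) for a profile page dict.

    ASSUMPTION: the key names below describe one layout of the embedded JSON;
    they are not taken from the question and may differ in the live response.
    """
    media = (profile_page.get('graphql', {})
                         .get('user', {})
                         .get('edge_owner_to_timeline_media', {}))
    page_info = media.get('page_info', {})
    edges = media.get('edges', [])  # the images already present in the page
    return (len(edges),
            page_info.get('has_next_page', False),
            page_info.get('end_cursor'))
```

Calling `timeline_page_info(data)` tells you how many images the page delivered and whether more exist. The flag alone does not fetch them, which is why the options below are needed.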

I'd recommend one of the following:

1. Use Instagram's API, which already provides methods for exactly what you seem to want to achieve (don't reinvent the wheel).

2. If you would rather do most of the work yourself (say, as an exercise), I'd recommend Selenium, a browser automation tool. Your code uses BeautifulSoup, which is great for retrieving data from HTML, but you need to do something more: scroll, so that more pictures get loaded. This way you can get as many pictures as you like.
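The scrolling approach in option 2 can be sketched roughly as follows. This is only an illustration: `scroll_to_load_more` and `demo` are hypothetical names, the scroll limit and pause length are arbitrary, and it assumes the page keeps loading images as you scroll without asking you to log in first.

```python
import time


def scroll_to_load_more(driver, max_scrolls=10, pause=2.0):
    """Scroll to the bottom repeatedly until the page height stops growing.

    `driver` is a Selenium WebDriver; only execute_script is used.
    Returns the number of scrolls that made the page grow.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    scrolls = 0
    for _ in range(max_scrolls):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the page time to fetch the next batch
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was loaded, so stop scrolling
        last_height = new_height
        scrolls += 1
    return scrolls


def demo(profile):
    """Hypothetical usage: load a profile, scroll, then parse the full page."""
    from selenium import webdriver
    from bs4 import BeautifulSoup

    driver = webdriver.Firefox()  # assumes Firefox and its driver are installed
    try:
        driver.get('https://www.instagram.com/' + profile + '/')
        scroll_to_load_more(driver)
        soup = BeautifulSoup(driver.page_source, 'html.parser')
        return [img.get('src') for img in soup.find_all('img')]
    finally:
        driver.quit()
```

Once the scrolling is done, `driver.page_source` holds the fully loaded HTML, so you can keep using BeautifulSoup for the parsing step.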

If you need an example, you can check out something similar I wrote for Twitter here

Josef Ginerman