
I am using Instaloader to scrape Instagram posts as part of a study project.

To avoid getting blocked by Instagram, I use the `sleep` function to pause for a random 1-20 seconds between rounds. This works well.
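A fresh random pause for each round can be sketched with only the standard library. Note that in the code below, `vent = randint(1,20)` is evaluated once, so every sleep in a run has the same length; drawing the delay inside the loop gives each round its own. `polite_pause` is a hypothetical helper name, and its `sleep` parameter exists only so the helper can be tested without actually waiting.

```python
import random
import time


def polite_pause(min_seconds=1, max_seconds=20, sleep=time.sleep):
    """Sleep for a random whole number of seconds in [min_seconds, max_seconds].

    A new delay is drawn on every call, so consecutive rounds get
    different pauses. `sleep` is injectable so tests can skip the wait.
    """
    delay = random.randint(min_seconds, max_seconds)  # inclusive on both ends
    sleep(delay)
    return delay


# Usage inside a scraping loop (download step omitted):
# for post in posts:
#     ...download the post...
#     polite_pause()
```

Calling `polite_pause()` where the loop currently calls `time.sleep(vent)` would give each download its own delay.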

I don't want to go through all the posts every time I scrape, so I want the loop to run only 5 times, which should give me 5 posts. But I can't manage to get it to do that.

I wrote the following function to try to scrape the profile and download its first 5 posts:

```python
# imports and creating the Instaloader instance
import time
from random import randint

import instaloader
from instaloader import Profile

L = instaloader.Instaloader()

# random sleep time, in seconds
vent = randint(1, 20)


# function: meant to download only the first few posts of a profile
def get2posts(profile_name):
    profile = Profile.from_username(L.context, profile_name)
    POSTS = profile.get_posts()

    for post in POSTS:
        for i in range(2):
            L.download_post(post, profile_name)
            time.sleep(vent)
        break

    print('scrape done')
```

This code gives me 5 copies of the same post, though, and I simply can't figure out how to get it to download the first 5 posts of an account.

The working function, which harvests all posts of a profile, is:

```python
# the original function (without range): downloads every post of the profile
def get_posts(profile_name):
    profile = Profile.from_username(L.context, profile_name)
    POSTS = profile.get_posts()

    for post in POSTS:
        L.download_post(post, profile_name)
        time.sleep(vent)  # pause between downloads

    print('I am done')
```

Hope you can help :)

Robert

1 Answer


The problem is that the inner `for` loop calls `download_post` twice (`range(2)`) on the same post, and then the outer loop breaks after that first post. If `POSTS` were a list, you could slice it to loop over only the first 5 items: `for post in POSTS[:5]:`. However, `profile.get_posts()` returns an iterator rather than a list, so slicing it directly will most likely raise a `TypeError`. A safer method is to count the posts as you go, which works for any iterable (not just lists), like so:

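```python
def get2posts(profile_name):
    profile = Profile.from_username(L.context, profile_name)
    POSTS = profile.get_posts()

    # enumerate yields (index, post) pairs; the index starts at 0
    for i, post in enumerate(POSTS):
        L.download_post(post, profile_name)
        if i == 4:  # the 5th post has just been downloaded
            break
        time.sleep(vent)  # skipped after the last post

    print('scrape done')
```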
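Another option that works on any iterable is `itertools.islice` from the standard library, which stops after a fixed number of items: `for post in islice(POSTS, 5):`. The sketch below shows the idea on a hypothetical endless generator standing in for `profile.get_posts()`; `first_n` and `fake_posts` are illustrative names, not part of Instaloader.

```python
from itertools import islice


def first_n(iterable, n):
    """Return a list of at most the first n items of any iterable.

    islice pulls items lazily, so a generator is advanced only n times.
    """
    return list(islice(iterable, n))


def fake_posts():
    # Hypothetical stand-in for profile.get_posts(): an endless generator.
    i = 0
    while True:
        yield f"post-{i}"
        i += 1


print(first_n(fake_posts(), 5))
# ['post-0', 'post-1', 'post-2', 'post-3', 'post-4']
```

Unlike the `enumerate` version, a plain `islice` loop with `time.sleep(vent)` at the end of its body would also sleep after the fifth post.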
micromoses
  • When `i == 4` it downloads 5 posts, though, i.e. `i + 1`. I'm not sure why that is. It's not a problem, I'm just curious :) – Emma Nitz Nov 17 '22 at 12:52
  • I wasn't sure: the question states five posts, which is why I found `range(2)` curious. In any case, the `i` generated by `enumerate` starts counting at 0 for the first item, 1 for the second, and so forth. So when it is 4, the 5th item has just been processed. The position of the check matters: while it is more common to put such checks at the start (or end) of the iteration, placing it here runs through 5 posts and saves you the `sleep` after the last one. Had the check been moved before `L.download_post`, it would have needed to be `if i == 5` to actually go through 5 items. – micromoses Nov 19 '22 at 16:05
  • Sorry for the late reply! Thank you so much for your help! It works perfectly now! – Emma Nitz Dec 15 '22 at 11:32
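The off-by-one question in the comments can be checked with plain Python. The two hypothetical functions below mirror the two check positions described above, with list appends standing in for `L.download_post` and a counter standing in for `time.sleep(vent)`; no Instaloader calls are made.

```python
def check_after(items, limit=5):
    """Process first, then stop once the 0-based index reaches limit - 1."""
    handled, pauses = [], 0
    for i, item in enumerate(items):
        handled.append(item)   # stands in for L.download_post(...)
        if i == limit - 1:     # index 4 is the 5th item
            break
        pauses += 1            # stands in for time.sleep(vent)
    return handled, pauses


def check_before(items, limit=5):
    """Stop as soon as the index of the first unwanted item appears."""
    handled, pauses = [], 0
    for i, item in enumerate(items):
        if i == limit:         # index 5 is the 6th item
            break
        handled.append(item)
        pauses += 1
    return handled, pauses


letters = ["a", "b", "c", "d", "e", "f", "g"]
print(list(enumerate("abc")))   # [(0, 'a'), (1, 'b'), (2, 'c')]
print(check_after(letters))     # (['a', 'b', 'c', 'd', 'e'], 4)
print(check_before(letters))    # (['a', 'b', 'c', 'd', 'e'], 5)
```

Both handle five items, but the check-after version pauses one fewer time. The check-before version also pulls a sixth item from the iterator before stopping, which can matter when items are fetched lazily.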