
I'm trying to download photos posted with a specific tag in real time. I found the Realtime API pretty useless, so I'm using a long-polling strategy. Below is pseudocode, with comments describing the subtle bugs in it:

newMediaCount = getMediaCount();
delta = newMediaCount - mediaCount;
if (delta > 0) {
    // Bug 1: if mediaCount has changed again by now, realDelta > delta, so (realDelta - delta)
    // photos won't be grabbed; and on the next poll, if mediaCount hasn't changed again,
    // those (realDelta - delta) photos would be fetched a second time as duplicates, otherwise ...
    // Bug 2: if a photo is posted from a private account, the counter changes but nothing is
    // added to the recent feed, so the last photo gets duplicated.
    recentMedia = getRecentMedia(delta);
    // persist recentMedia
    mediaCount = newMediaCount;
}
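To see the first bug concretely, the pseudocode can be replayed against an in-memory feed. `FakeTagFeed` and `simulate()` are hypothetical stand-ins, not part of any Instagram client; the sketch only assumes the feed returns the newest media first.

```java
import java.util.ArrayList;
import java.util.List;

public class CountDeltaRace {
    // Hypothetical in-memory stand-in for the tag feed; index 0 is the newest media.
    static class FakeTagFeed {
        private final List<String> media = new ArrayList<>();

        void post(String id) {
            media.add(0, id);
        }

        int getMediaCount() {
            return media.size();
        }

        List<String> getRecentMedia(int n) {
            return new ArrayList<>(media.subList(0, Math.min(n, media.size())));
        }
    }

    /** Replays one poll of the pseudocode with two posts landing between the calls. */
    public static List<String> simulate() {
        FakeTagFeed feed = new FakeTagFeed();
        feed.post("a");
        int mediaCount = feed.getMediaCount();    // 1: "a" was persisted on the previous poll

        feed.post("b");
        feed.post("c");
        int newMediaCount = feed.getMediaCount(); // 3
        int delta = newMediaCount - mediaCount;   // 2: we intend to grab "c" and "b"

        feed.post("d");                           // two posts arrive before the second call
        feed.post("e");
        return feed.getRecentMedia(delta);        // returns "e" and "d" instead
    }

    public static void main(String[] args) {
        System.out.println(simulate()); // prints [e, d]; "b" and "c" are never fetched
    }
}
```

On the next poll the count moves from 3 to 5, `delta` is 2 again, and the same `[e, d]` is fetched a second time. Moving the two calls closer together narrows this window but cannot close it while they remain two separate requests.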

The second issue can be addressed with a Set of some sort, I guess. But the first one really bothers me. I've moved the two calls to the Instagram API as close together as possible, but is this enough?
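One possible shape for the Set mentioned above: remember the ids already persisted and drop any repeat. The class name `SeenIds` and the capacity are illustrative; evicting through `LinkedHashMap.removeEldestEntry` keeps memory bounded on a long-running poller, at the cost of forgetting very old ids. This only removes duplicates; it does not recover photos that were never fetched.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SeenIds {
    private final Set<String> seen;

    public SeenIds(final int capacity) {
        // Insertion-ordered backing map that evicts the oldest id once capacity is exceeded.
        this.seen = Collections.newSetFromMap(new LinkedHashMap<String, Boolean>() {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > capacity;
            }
        });
    }

    /** Returns only the ids not seen before (in input order) and remembers them. */
    public List<String> filterNew(List<String> ids) {
        List<String> fresh = new ArrayList<>();
        for (String id : ids) {
            if (seen.add(id)) { // add() is false when the id is already present
                fresh.add(id);
            }
        }
        return fresh;
    }
}
```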

Edit

As Amir suggested, I've rewritten the code to use min_tag_id/max_tag_id. But it still skips photos. I couldn't find a better way to test this than saving the images to disk for a while and comparing the result with instagram.com/explore/tags/.

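import java.awt.image.BufferedImage;
import java.io.File;
import java.net.URL;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import javax.imageio.ImageIO;
import org.junit.Test;
// Also needs the jInstagram classes Instagram, TagMediaFeed, MediaFeedData and InstagramException;
// Settings is the author's own configuration class.

public class LousyInstagramApiTest {

    @Test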
    public void testFeedContinuity() throws Exception {
        Instagram instagram = new Instagram(Settings.getClientId());
        final String TAG_NAME = "portrait";
        String id = instagram.getRecentMediaTags(TAG_NAME).getPagination().getMinTagId();
        HashtagEndpoint endpoint = new HashtagEndpoint(instagram, TAG_NAME, id);

        for (int i = 0; i < 10; i++) {
            Thread.sleep(3000);
            endpoint.recentFeed().forEach(d -> {
                try {
                    URL url = new URL(d.getImages().getLowResolution().getImageUrl());
                    BufferedImage img = ImageIO.read(url);
                    ImageIO.write(img, "png", new File("D:\\tmp\\" + d.getId() + ".png"));
                } catch (Exception e) {
                    e.printStackTrace();
                }
            });
        }
    }
}

class HashtagEndpoint {
    private final Instagram instagram;
    private final String hashtag;
    private String minTagId;

    public HashtagEndpoint(Instagram instagram, String hashtag, String minTagId) {
        this.instagram = instagram;
        this.hashtag = hashtag;
        this.minTagId = minTagId;
    }

    public List<MediaFeedData> recentFeed() throws InstagramException {
        TagMediaFeed feed = instagram.getRecentMediaTags(hashtag, minTagId, null);
        List<MediaFeedData> dataList = feed.getData();
        if (dataList.size() == 0) return Collections.emptyList();

        String maxTagId = feed.getPagination().getNextMaxTagId();
        if (maxTagId != null && maxTagId.compareTo(minTagId) > 0) dataList.addAll(paginateFeed(maxTagId));
        Collections.reverse(dataList);
//        dataList.removeIf(d -> d.getId().compareTo(minTagId) < 0);

        minTagId = feed.getPagination().getMinTagId();
        return dataList;
    }

    private Collection<? extends MediaFeedData> paginateFeed(String maxTagId) throws InstagramException {
        System.out.println("pagination required");

        List<MediaFeedData> dataList = new ArrayList<>();
        do {
            TagMediaFeed feed = instagram.getRecentMediaTags(hashtag, null, maxTagId);
            maxTagId = feed.getPagination().getNextMaxTagId();
            dataList.addAll(feed.getData());
        } while (maxTagId != null && maxTagId.compareTo(minTagId) > 0); // next_max_tag_id is null on the last page
        return dataList;
    }

}
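A side note on `HashtagEndpoint`: it compares tag ids with `String.compareTo`, which orders strings character by character. Assuming tag ids are plain decimal strings (an assumption; the API hands them out as opaque cursors), that only matches numeric order when both ids have the same length. A sketch of a numeric comparison, using a hypothetical helper `TagIds.compare`:

```java
import java.math.BigInteger;

public final class TagIds {
    private TagIds() {}

    /** Compares two decimal id strings numerically rather than lexicographically. */
    public static int compare(String a, String b) {
        return new BigInteger(a).compareTo(new BigInteger(b));
    }

    public static void main(String[] args) {
        String older = "999";
        String newer = "1000";
        System.out.println(older.compareTo(newer) > 0); // true: lexicographic order is wrong here
        System.out.println(compare(older, newer) < 0);  // true: numeric order is right
    }
}
```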
user2418306
  • Why did you find the Realtime API useless? I'm using it right now and it works well. – Gonzalingui May 11 '15 at 21:00
  • @Gonzalingui because it doesn't send you the data itself. To get the data you need to use one of the strategies above, and neither of them works. And you can't use it outside a server-side setup, which is my situation. – user2418306 May 11 '15 at 22:40

1 Answer


Using the Tag endpoints to get the recent media with a desired tag, it returns a min_tag_id in its pagination info, which is tied to the most recently tagged media at the time of your call. As the API also accepts a min_tag_id parameter, you can pass that number from your last query to only receive those media that are tagged after your last query.

So, with whatever polling mechanism you have, you just call the API with the last received min_tag_id to get any new recent media.
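The polling loop described above can be sketched as a cursor that only advances after a non-empty response. `TagFeed` and `Page` are hypothetical wrappers standing in for a call to `/v1/tags/{tag}/media/recent?min_tag_id=...`; they are not jInstagram types.

```java
import java.util.ArrayList;
import java.util.List;

public class MinTagIdPoller {
    /** Hypothetical response: media ids (newest first) plus the min_tag_id the API returns. */
    public static final class Page {
        final List<String> mediaIds;
        final String minTagId;

        public Page(List<String> mediaIds, String minTagId) {
            this.mediaIds = mediaIds;
            this.minTagId = minTagId;
        }
    }

    /** Hypothetical wrapper around the tag's recent-media endpoint. */
    public interface TagFeed {
        Page recent(String minTagId);
    }

    private final TagFeed feed;
    private String minTagId;

    public MinTagIdPoller(TagFeed feed, String startMinTagId) {
        this.feed = feed;
        this.minTagId = startMinTagId;
    }

    /** One poll: fetch what was tagged after the last cursor, then advance the cursor. */
    public List<String> poll() {
        Page page = feed.recent(minTagId);
        if (page.mediaIds.isEmpty()) {
            return new ArrayList<>(); // nothing new: keep the previous cursor
        }
        minTagId = page.minTagId;
        return page.mediaIds;
    }
}
```

Keeping the old cursor on an empty response mirrors the early return in the question's `recentFeed()`.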

You will also need to pass a large count parameter and follow the pagination of the response to receive all data without losing anything when the speed of tagging is faster than your polling.
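Following the pagination until the response stops offering a next cursor can be sketched like this. `PageFetcher` and `Page` are hypothetical; the sketch assumes each follow-up request keeps the original `min_tag_id` bound, so draining stops at the previous poll's boundary. Checking the cursor for `null` before requesting avoids asking for a page that does not exist.

```java
import java.util.ArrayList;
import java.util.List;

public class DrainPages {
    /** Hypothetical page: items newest first, plus a cursor to the next (older) page or null. */
    public static final class Page {
        final List<String> items;
        final String nextMaxTagId;

        public Page(List<String> items, String nextMaxTagId) {
            this.items = items;
            this.nextMaxTagId = nextMaxTagId;
        }
    }

    /** Hypothetical fetcher; a null maxTagId means "first page". */
    public interface PageFetcher {
        Page fetch(String maxTagId);
    }

    /** Collects every page until the response no longer carries a next cursor. */
    public static List<String> drain(PageFetcher fetcher) {
        List<String> all = new ArrayList<>();
        String cursor = null;
        do {
            Page page = fetcher.fetch(cursor);
            all.addAll(page.items);
            cursor = page.nextMaxTagId;
        } while (cursor != null);
        return all;
    }
}
```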

Update:
Based on your updated code:

public List<MediaFeedData> recentFeed() throws InstagramException {
    TagMediaFeed feed = instagram.getRecentMediaTags(hashtag, minTagId, null, 100000);
    List<MediaFeedData> dataList = feed.getData();
    if (dataList.size() == 0) return Collections.emptyList();

    // follow the pagination
    MediaFeed recentMediaNextPage = instagram.getRecentMediaNextPage(feed.getPagination());
    while (recentMediaNextPage.getPagination() != null) {
        dataList.addAll(recentMediaNextPage.getData());
        recentMediaNextPage = instagram.getRecentMediaNextPage(recentMediaNextPage.getPagination());
    }

    Collections.reverse(dataList);

    minTagId = feed.getPagination().getMinTagId();
    return dataList;
}
user2418306
Amir Rahimi Farahani
  • Thank you for the reply, and sorry for the late response. I was testing this approach and ran into strange results, which turned out not to be `min_tag_id`-specific as I first thought. E.g. this call https://api.instagram.com/v1/tags/partytools/media/recent?min_id=964461544535307126_444055843 does not return the media with the specified id; in fact it returns only the 6 latest posts. And if you omit `min_id` and give a count, or provide both, it still gives only the 6 latest entries. Here are my tests using jInstagram: http://pastebin.com/EnPRwMLw. The shortcode used there is from the 9th photo. – user2418306 May 02 '15 at 13:09
  • This approach resulted in skipping some of the photos. – user2418306 May 06 '15 at 17:28
  • You are making things a bit complicated by implementing the pagination yourself. Just make one call with `min_tag_id` and follow the built-in pagination. Please see my updated answer. – Amir Rahimi Farahani May 10 '15 at 08:36
  • Yeah. Although the built-in pagination is not intended to be used that way: it's never `null`, and when you call `getRecentMediaNextPage` for the first time it throws an exception because `nextUrl` is null. However, the problem is not in the pagination. You can replace all the pagination code with `if (feed.getPagination().getNextUrl() != null) System.out.println("pagination required");`, run the harness, and pagination won't be required a single time, but photos will still be missing... – user2418306 May 10 '15 at 15:46
  • If someone edits their already posted media and adds a tag, the newly tagged media won't appear as very recent; instead it gets placed in the tag's recent feed at the time the media was originally posted. – Amir Rahimi Farahani May 13 '15 at 12:24
    I find it hard to believe that every 10th person edits their photo 1-3 seconds after posting, adding on average 30 words to the caption. – user2418306 May 13 '15 at 23:03
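The explanation about edited media can be modelled with a feed keyed by original post time, where a `min_tag_id`-style query returns only entries strictly newer than the cursor. `LateTagDemo` is a hypothetical toy, and whether Instagram orders late-tagged media exactly this way is the answerer's claim, not something the sketch verifies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class LateTagDemo {
    // Hypothetical tag feed: media ids keyed by their original post time.
    private final TreeMap<Long, String> feed = new TreeMap<>();

    /** Adds media to the tag feed at its original post time, whenever the tag was added. */
    public void tag(long postedAt, String id) {
        feed.put(postedAt, id);
    }

    /** Everything strictly newer than the cursor, like a min_tag_id query. */
    public List<String> newerThan(long cursor) {
        return new ArrayList<>(feed.tailMap(cursor, false).values());
    }

    /** The cursor a poll would remember: the newest key currently in the feed. */
    public long newestKey() {
        return feed.lastKey();
    }

    public static void main(String[] args) {
        LateTagDemo demo = new LateTagDemo();
        demo.tag(100, "x");
        demo.tag(200, "y");
        long cursor = demo.newestKey();             // 200, remembered after the first poll

        demo.tag(150, "edited");                    // tag added later via an edit, slotted at 150
        demo.tag(300, "z");
        System.out.println(demo.newerThan(cursor)); // prints [z]; "edited" is skipped
    }
}
```

Under this model, media tagged after the cursor has already passed its slot is never returned by a strictly-newer-than query, which would show up as skipped photos regardless of how pagination is handled.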