
I'm extracting URLs from tweets through the Twitter API using Twython, a Python library. I call the home_timeline endpoint and look in each tweet's entities for URLs. Normally the links in entities are correct, but in some cases they are wrong. For example, this is a tweet from the account @WWF_Philippines:

[screenshot of the tweet]

When I hover the cursor over the highlighted link, the status bar shows shortened_url1 (I can't include it here because Stack Overflow doesn't allow it). Clicking the link opens an external article. However, when I query the same tweet through the Twitter API, this is what I get:

Need a guide to properly enjoy the great outdoors while minimizing human impact? This list is for you!… shortened_url2

As you can see, shortened_url2 differs from the real link (shortened_url1) shown on hover. Following shortened_url2 just opens the tweet itself. The URL in the entities field is the same wrong link (shortened_url2).

So what's wrong with the Twitter API here? Thanks.

lenhhoxung

1 Answer


I think you're looking at the old version of the entities.

The tweet is https://twitter.com/WWF_Philippines/status/869027117652033536

Calling https://api.twitter.com/1.1/statuses/show/869027117652033536.json from the API gives us the following entities:

"truncated": true,
"entities": {
    "hashtags": [],
    "symbols": [],
    "user_mentions": [],
    "urls": [{
        "url": "https:\/\/t.co\/UatUzmm9re",
        "expanded_url": "https:\/\/twitter.com\/i\/web\/status\/869027117652033536",
        "display_url": "twitter.com\/i\/web\/status\/8\u2026",
        "indices": [104, 127]
    }]
},

Notice the "truncated": true at the top?
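As a minimal sketch (plain Python on the already-decoded JSON, no Twython required), the symptom can be detected: the tweet is flagged truncated, and its only URL entity expands to a twitter.com/i/web/status/<id> link pointing back at the tweet itself, which is why following shortened_url2 just opens the tweet. The names `WEB_STATUS_LINK`, `self_links` and `is_compat_truncated` are hypothetical helpers, not part of Twython or the Twitter API.

```python
import re

# Matches the "read more" link that replaces the cut-off part of a truncated
# tweet, e.g. https://twitter.com/i/web/status/869027117652033536
WEB_STATUS_LINK = re.compile(r"^https?://twitter\.com/i/web/status/(\d+)$")


def self_links(tweet):
    """Return the t.co URLs in the tweet's entities that only point back at the tweet.

    `tweet` is assumed to be the decoded JSON (a dict) of a v1.1 tweet object.
    """
    tweet_id = str(tweet.get("id_str", tweet.get("id", "")))
    links = []
    for entity in tweet.get("entities", {}).get("urls", []):
        match = WEB_STATUS_LINK.match(entity.get("expanded_url") or "")
        if match and match.group(1) == tweet_id:
            links.append(entity["url"])
    return links


def is_compat_truncated(tweet):
    """True if the tweet came back truncated, with a self-link instead of its real URLs."""
    return bool(tweet.get("truncated")) and bool(self_links(tweet))
```

For the response above (with "id_str" set to 869027117652033536, the id from the status URL), `self_links` returns the t.co/UatUzmm9re link and `is_compat_truncated` returns True; for the extended-mode response it returns False.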

Recently Twitter changed how it displays tweets and how they're represented in the API - see https://dev.twitter.com/overview/api/upcoming-changes-to-tweets

You need to add ?tweet_mode=extended to the end of your query. That will get you back:

"truncated": false,
"display_text_range": [0, 126],
"entities": {
    "hashtags": [],
    "symbols": [],
    "user_mentions": [],
    "urls": [{
        "url": "https:\/\/t.co\/BgKxmFzrQc",
        "expanded_url": "http:\/\/bit.ly\/7LNTPrinciples",
        "display_url": "bit.ly\/7LNTPrinciples",
        "indices": [103, 126]
    }],

Which contains the data you want.
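Once the extended response is in hand, a sketch of reading it might look like the following. It relies on two documented details of extended mode: the text is returned as "full_text" rather than "text", and "display_text_range" holds the [start, end) offsets of the part users see. The helper names `tweet_text` and `external_urls` are hypothetical, and skipping any twitter.com/i/web/status link is a defensive choice of this sketch, not something the API requires.

```python
import re

# Links of this form point back at a tweet rather than at external content.
WEB_STATUS_LINK = re.compile(r"^https?://twitter\.com/i/web/status/\d+$")


def tweet_text(tweet):
    """Return the displayed text of a tweet.

    With tweet_mode=extended the text is in "full_text" instead of "text", and
    "display_text_range" gives the [start, end) offsets of the visible part.
    """
    text = tweet.get("full_text", tweet.get("text", ""))
    start, end = tweet.get("display_text_range", (0, len(text)))
    return text[start:end]


def external_urls(tweet):
    """Return the expanded URLs from a tweet's entities, skipping self-links."""
    urls = []
    for entity in tweet.get("entities", {}).get("urls", []):
        expanded = entity.get("expanded_url") or entity.get("url")
        if expanded and not WEB_STATUS_LINK.match(expanded):
            urls.append(expanded)
    return urls
```

Applied to the extended response above, `external_urls` yields the bit.ly/7LNTPrinciples link, while the truncated response yields an empty list.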

Terence Eden
  • Thanks, that might be the cause. However, Twython doesn't support the parameter 'tweet_mode' yet. – lenhhoxung Jun 03 '17 at 21:24
  • It does. See https://github.com/ryanmcgrath/twython/issues/430 - you need to add something like `tweet = twitter.show_status(id=tweet_id, tweet_mode='extended')` – Terence Eden Jun 04 '17 at 05:35
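Since the question reads the home timeline rather than single statuses, here is a minimal sketch applying the same keyword there. It assumes `client` is an authenticated `twython.Twython` instance; Twython forwards extra keyword arguments such as `tweet_mode` to the API as query parameters, so `get_home_timeline` accepts it the same way `show_status` does. `home_timeline_urls` and `external_urls` are hypothetical helpers, and reading retweets from "retweeted_status" is an assumption about where their complete entities live.

```python
import re

# Links of this form point back at a tweet rather than at external content.
WEB_STATUS_LINK = re.compile(r"^https?://twitter\.com/i/web/status/\d+$")


def external_urls(tweet):
    """Expanded URLs from a tweet's entities, skipping twitter.com/i/web/status self-links."""
    urls = []
    for entity in tweet.get("entities", {}).get("urls", []):
        expanded = entity.get("expanded_url") or entity.get("url")
        if expanded and not WEB_STATUS_LINK.match(expanded):
            urls.append(expanded)
    return urls


def home_timeline_urls(client, count=20):
    """Map each home-timeline tweet's id_str to its external URLs.

    `client` is assumed to be an authenticated twython.Twython instance.
    """
    # tweet_mode="extended" asks for untruncated text and entities.
    tweets = client.get_home_timeline(count=count, tweet_mode="extended")
    result = {}
    for tweet in tweets:
        # Assumption: for retweets, the complete entities are on the original
        # tweet under "retweeted_status", so read them from there when present.
        source = tweet.get("retweeted_status", tweet)
        result[tweet["id_str"]] = external_urls(source)
    return result
```

With real credentials this would be called as `home_timeline_urls(Twython(APP_KEY, APP_SECRET, OAUTH_TOKEN, OAUTH_TOKEN_SECRET))`, where the four names are placeholders for your own app's keys. Taking the client as a parameter also makes the function easy to exercise with a stub object instead of the live API.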