I'm trying to resolve the destination URLs for a batch of t.co links from Twitter. This works for active links, but when the destination returns a 404 or is dead, the program dies. If I enter the same t.co link into a browser, it still shows me the destination URL.

Is there a way to do this in Python 3?

This is my existing code:

import json
import requests
import pandas as pd
from requests.models import Response

# Loading my array of links
data = pd.read_json('tco-links.json')

links = pd.DataFrame(data)

output = []

session = requests.Session()  # so connections are recycled

with open('output.json', 'w') as f:

    for index, row in links.iterrows():
        fullLink = 'http://' + row['link']

        try:
            response = session.head(fullLink, allow_redirects=True)
        except requests.RequestException:
            # how I'm handling errors right now
            response = Response()
            response.url = 'Failed'

        output.append({
            'link': fullLink,
            'id': row['id'],
            'unshortened': response.url
        })

    # write each result once, after all links have been processed
    for x in output:
        f.write(json.dumps(x) + '\n')
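
One possible approach, sketched here under some assumptions: rather than following the whole redirect chain, ask only the shortener for its first redirect and read the `Location` header, so the dead destination is never contacted. This assumes t.co answers a `HEAD` request from a non-browser client with a 3xx status and a `Location` header, which I have not verified for every link. `first_hop` is a hypothetical helper name, and `session` is expected to be a `requests.Session` like the one in the code above.

```python
REDIRECT_CODES = (301, 302, 303, 307, 308)


def first_hop(session, url, timeout=10):
    """Return the target of the first redirect for `url`, or None.

    `session` is assumed to behave like requests.Session. Redirects are
    not followed, so only the shortener itself is contacted.
    """
    response = session.head(url, allow_redirects=False, timeout=timeout)
    if response.status_code in REDIRECT_CODES:
        # The shortener reports where it would send us next.
        return response.headers.get("Location")
    # Not a redirect (for example the short link itself no longer exists).
    return None
```

In the loop above this would replace the `session.head(fullLink, allow_redirects=True)` call, for example `unshortened = first_hop(session, fullLink)`, still wrapped in `try`/`except requests.RequestException` in case t.co itself cannot be reached. Note that it only gives the first hop, so a destination that redirects again would not be fully resolved.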

podcastfan88