I'm trying to resolve the destination URLs of a batch of t.co links from Twitter. This works for links whose destination is still live, but when the destination returns a 404 or is otherwise dead, the program dies. If I paste the same t.co link into a browser, it still shows me the destination URL.
Is there a way to do this in Python 3?
This is my existing code:
import json

import requests
import pandas as pd
from requests.models import Response

# Loading my array of links
data = pd.read_json('tco-links.json')
links = pd.DataFrame(data)
output = []
session = requests.Session()  # so connections are recycled

with open('output.json', 'w') as f:
    for index, row in links.iterrows():
        fullLink = 'http://' + row['link']
        try:
            response = session.head(fullLink, allow_redirects=True)
        except requests.RequestException:
            # how I'm handling errors right now
            response = Response()
            response.url = 'Failed'
        output.append({
            'link': fullLink,
            'id': row['id'],
            'unshortened': response.url
        })
    # write one JSON object per line
    for x in output:
        f.write(json.dumps(x) + '\n')
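
One direction that might work, as a minimal sketch rather than a tested solution: ask for the t.co link without following redirects and read the Location header of the first response, so the dead destination is never contacted. This assumes t.co answers a HEAD request with an HTTP redirect that carries a Location header. The helper name first_hop and the timeout value are made up for illustration, and session is assumed to behave like requests.Session.

```python
def first_hop(session, url, timeout=10):
    """Return the Location header of the first response, or None.

    Hypothetical helper: `session` is assumed to behave like
    requests.Session (a .head() method returning an object with .headers).
    """
    try:
        # allow_redirects=False stops after the shortener's own response,
        # so a dead or 404 destination is never requested.
        response = session.head(url, allow_redirects=False, timeout=timeout)
    except OSError:
        # requests.RequestException derives from OSError (IOError), so
        # connection errors and timeouts raised by requests end up here.
        return None
    # None if the response carried no Location header (not a redirect).
    return response.headers.get("Location")
```

Inside the loop above this would be called as first_hop(session, fullLink), with a None result stored as 'Failed'. Note that this returns only the first hop: if the destination itself redirects again, the later hops are not followed.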