
I'm new to Python and am currently trying to achieve the following:

I want to check HTTP response status codes for multiple URLs in my input.csv file:

id    url
1    https://www.google.com
2    https://www.example.com
3    https://www.testtesttest.com
...

and save results as an additional column 'status' flagging those URLs that are down or with some other issues in my output.csv file:

id    url                             status
1     https://www.google.com          All good!
2     https://www.example.com         All good!
3     https://www.testtesttest.com    Down
...

So far I have tried the following, without success:

import requests
import pandas as pd
import requests.exceptions

df = pd.read_csv('path/to/my/input.csv')

urls = df.T.values.tolist()[1]


try:
    r = requests.get(urls)
    r.raise_for_status()
except (requests.exceptions.ConnectionError, requests.exceptions.Timeout):
    print("Down")
except requests.exceptions.HTTPError:
    print("4xx, 5xx")
else:
    print("All good!")

I'm not sure how to collect the results from the above and save them as a new column in the output.csv file:

df['status'] = #here the result 
df.to_csv('path/to/my/output.csv', index=False)

Would someone be able to help with this? Thanks in advance!

Baobab1988

1 Answer

id  url
1   https://www.google.com
2   https://www.example.com
3   https://www.testtesttest.com

Copy the table above to the clipboard, then run the code below. The idea is to loop through the URLs, append each status to a list, and then assign that list as a new column.

import requests
import pandas as pd
import requests.exceptions

# Read the sample table straight from the clipboard
df = pd.read_clipboard()

urls = df['url'].tolist()
status = []
for url in urls:
    try:
        r = requests.get(url)
        r.raise_for_status()
    except (requests.exceptions.ConnectionError, requests.exceptions.Timeout):
        status.append("Down")
    except requests.exceptions.HTTPError:
        status.append("4xx, 5xx")
    else:
        status.append("All good!")

# One status per URL, in the same order as the rows
df['status'] = status
df.to_csv('path/to/my/output.csv', index=False)
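As a variant, the same try/except logic can be wrapped in a small helper function and applied with `Series.apply`, which avoids managing the list by hand. Note that the `timeout` parameter is an addition not present in the original answer; without it, a hanging server can block the whole loop indefinitely.

```python
import requests
import requests.exceptions


def check_url(url, timeout=5):
    """Return a status string for a single URL."""
    try:
        r = requests.get(url, timeout=timeout)
        r.raise_for_status()
    except (requests.exceptions.ConnectionError, requests.exceptions.Timeout):
        return "Down"
    except requests.exceptions.HTTPError:
        return "4xx, 5xx"
    return "All good!"
```

With the DataFrame from above, `df['status'] = df['url'].apply(check_url)` then replaces the explicit loop before writing the CSV.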
David Erickson
  • thanks David! I've tried this and it worked. However, for some URLs it throws this error `requests.exceptions.TooManyRedirects: Exceeded 30 redirects.` would you know how I can resolve this? – Baobab1988 Mar 21 '20 at 19:35
  • @Baobab1988 hopefully this link is informative: https://stackoverflow.com/questions/23651947/python-requests-requests-exceptions-toomanyredirects-exceeded-30-redirects Otherwise, I would just google the error and look through stackoverflow posts until you find a case that might be similar to yours. It's hard to know without the links and being able to troubleshoot the specific problem for some of the links. – David Erickson Mar 21 '20 at 19:38
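Following up on the redirect error in the comments: one option (a sketch of possible behavior, not part of the accepted answer) is to catch `requests.exceptions.TooManyRedirects` explicitly and to use the base class `requests.exceptions.RequestException` as a catch-all, so that any other `requests` failure is also flagged instead of crashing the loop:

```python
import requests
import requests.exceptions


def check_url(url, timeout=5):
    """Status string that also flags redirect loops."""
    try:
        r = requests.get(url, timeout=timeout)
        r.raise_for_status()
    except requests.exceptions.TooManyRedirects:
        return "Redirect loop"
    except requests.exceptions.HTTPError:
        return "4xx, 5xx"
    except requests.exceptions.RequestException:
        # ConnectionError, Timeout, and any other requests failure
        return "Down"
    return "All good!"
```

The order of the `except` clauses matters: the specific exceptions must come before `RequestException`, since it is their common base class.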