
I'm using the code below to check server response codes. Instead of manually entering the URLs, I'd like Python to read them from a CSV (data.csv) and then export the results to a new CSV (new_data.csv). Does anyone know how to write this?

Thanks for your time!

import urllib2

for url in ["http://stackoverflow.com/", "http://stackoverflow.com/questions/"]:
    try:
        connection = urllib2.urlopen(url)
        print connection.getcode()
        connection.close()
    except urllib2.HTTPError, e:
        print e.getcode()

# Prints 200 or 404 for each URL

UPDATE:

import csv

out = open("urls.csv", "rb")
data = csv.reader(out)
data = [row for row in data]
out.close()

print data

import urllib2

for url in ["http://stackoverflow.com/", "http://stackoverflow.com/questions/"]:
    try:
        connection = urllib2.urlopen(url)
        print connection.getcode()
        connection.close()
    except urllib2.HTTPError, e:
        print e.getcode()

OUTPUT:

[['link'], ['link'], ['link'], ['link'], ['link'], ['link']]

200

200

UPDATE:

import csv

with open("urls.csv", 'r') as csvfile:
    urls = [row[0] for row in csv.reader(csvfile)]

import urllib2
for url in urls:
    try:
        connection = urllib2.urlopen(url)
        print connection.getcode()
        connection.close()
    except urllib2.HTTPError, e:
        print e.getcode()
Sam Perry
  • Try the `csv` module, or just use a plain text file. – zhangyangyu Jul 14 '13 at 06:23
  • Thanks, mate. I'm pretty new to the CSV module. I've been able to read the URLs from the CSV; however, how do I now ask Python to check those URLs? Here's what I now have: `import csv out=open("urls.csv","rb") data=csv.reader(out) data=[row for row in data] out.close() print data import urllib2 for url in ["http://stackoverflow.com/", "http://stackoverflow.com/questions/"]: try: connection = urllib2.urlopen(url) print connection.getcode() connection.close() except urllib2.HTTPError, e: print e.getcode()` – Sam Perry Jul 14 '13 at 06:44
  • You'd better put the code in your question (edit it) so it is clear and more people are likely to see it. @Sam Perry – zhangyangyu Jul 14 '13 at 06:46
  • You have already got that; why don't you merge the code? The `data` is already a list containing the URLs. By the way, open the CSV file in `r` mode, not `rb` mode. – zhangyangyu Jul 14 '13 at 06:55
  • Thanks for your help, @zhangyangyu. I've changed to open with "r". Sorry for my beginner knowledge, but I'm struggling to merge the two pieces of code. I thought it would work if I changed the loop to `for url in ["data"]:`. @zhangyangyu, would you mind providing the code in an answer? Thanks again. – Sam Perry Jul 14 '13 at 07:07

1 Answer


I think you have your clue in your `print data` output: `[['link'], ['link'], ['link'], ['link'], ['link'], ['link']]`. This tells me that the line `data=[row for row in data]` is giving you a list of lists, which is why you cannot simply write `for url in data:`; each `row` is itself a one-element list, so the URL is `row[0]`.

BTW, you will find the whole thing less confusing if you put some thought into naming: reading input from a file handle called `out`, and rebinding `data` to something derived from `data`, both invite mistakes.
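A minimal sketch of the merged script the question asks for, reading `data.csv` and writing `new_data.csv` (the file names from the question; adjust as needed). It assumes one URL in column 0 of each input row, and it is written for Python 3, where `urllib2` became `urllib.request` / `urllib.error`. The `check` parameter is only there so the CSV plumbing can be exercised without hitting the network:

```python
import csv
import urllib.request
import urllib.error


def check_url(url):
    """Return the HTTP status code for url, or the error code on failure."""
    try:
        with urllib.request.urlopen(url) as connection:
            return connection.getcode()
    except urllib.error.HTTPError as e:
        return e.getcode()


def check_csv(in_path, out_path, check=check_url):
    """Read URLs from column 0 of in_path; write url,status rows to out_path."""
    with open(in_path, "r", newline="") as infile:
        urls = [row[0] for row in csv.reader(infile) if row]
    with open(out_path, "w", newline="") as outfile:
        writer = csv.writer(outfile)
        for url in urls:
            writer.writerow([url, check(url)])


# Usage (hits the network for every URL in the file):
# check_csv("data.csv", "new_data.csv")
```

Opening the files in text mode with `newline=""` is what the `csv` docs recommend; it avoids the `rb` pitfall flagged in the comments above.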

Steve Barnes
  • I've updated the example above. Thanks for your help! I've changed `data=[row for row in data]` and `for url in data:`. However, I now get a large number of errors if Python comes across a 404 response: `ile "\script.py" connection = urllib2.urlopen(url)` – Sam Perry Jul 14 '13 at 09:32
  • If you have a list of URLs that do not point to pages, then you will get an error printed for each; if you don't wish to see them, change *print e.getcode()* to *pass* – Steve Barnes Jul 15 '13 at 07:10
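The suggestion in the last comment, swapping `print e.getcode()` for `pass`, looks roughly like this in Python 3 terms. `live_codes` is a hypothetical helper name, and the `opener` parameter exists only so the error handling can be demonstrated without network access:

```python
import urllib.request
import urllib.error


def live_codes(urls, opener=urllib.request.urlopen):
    """Return status codes for the URLs that respond.

    URLs raising HTTPError are silently skipped (the `pass` variant
    from the comment) instead of having their error code printed.
    """
    codes = []
    for url in urls:
        try:
            with opener(url) as connection:
                codes.append(connection.getcode())
        except urllib.error.HTTPError:
            pass  # swallow 404s and other HTTP errors
    return codes
```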