
I'm using the code below to check server response codes. Instead of manually entering the URLs, I'd like Python to read them from a CSV (data.csv) and then export the results to a new CSV (new_data.csv). Does anyone know how to write this?

Thanks for your time!

import urllib2

for url in ["http://stackoverflow.com/", "http://stackoverflow.com/questions/"]:
    try:
        connection = urllib2.urlopen(url)
        print connection.getcode()
        connection.close()
    except urllib2.HTTPError, e:
        print e.getcode()

# Prints 200 or 404 for each URL

UPDATE:

import csv

out = open("urls.csv", "rb")
data = csv.reader(out)
data = [row for row in data]
out.close()

print data

import urllib2

for url in ["http://stackoverflow.com/", "http://stackoverflow.com/questions/"]:
    try:
        connection = urllib2.urlopen(url)
        print connection.getcode()
        connection.close()
    except urllib2.HTTPError, e:
        print e.getcode()

OUTPUT:

[['link'], ['link'], ['link'], ['link'], ['link'], ['link']]

200

200

UPDATE:

import csv

with open("urls.csv", 'r') as csvfile:
    urls = [row[0] for row in csv.reader(csvfile)]

import urllib2
for url in urls:
    try:
        connection = urllib2.urlopen(url)
        print connection.getcode()
        connection.close()
    except urllib2.HTTPError, e:
        print e.getcode()
Sam Perry
  • Try the `csv` module, or just use a plain text file. – zhangyangyu Jul 14 '13 at 06:23
  • Thanks, mate. I'm pretty new to the CSV module. I've been able to read the URLs from the CSV; however, how do I now ask Python to check those URLs? Here's what I now have: `import csv out=open("urls.csv","rb") data=csv.reader(out) data=[row for row in data] out.close() print data import urllib2 for url in ["http://stackoverflow.com/", "http://stackoverflow.com/questions/"]: try: connection = urllib2.urlopen(url) print connection.getcode() connection.close() except urllib2.HTTPError, e: print e.getcode()` – Sam Perry Jul 14 '13 at 06:44
  • You'd better put the code in your question (edit it) so it is clear and more people are likely to see it. @Sam Perry – zhangyangyu Jul 14 '13 at 06:46
  • You have already got that; why don't you merge the code? The `data` is already a list containing the URLs. By the way, open the CSV file in `r` mode, not `rb` mode. – zhangyangyu Jul 14 '13 at 06:55
  • Thanks for your help, @zhangyangyu. I've changed to open with "r". Sorry for my beginner knowledge, but I'm struggling to merge the two pieces of code. I thought it would work if I changed the loop to `for url in ["data"]:`. @zhangyangyu, would you mind providing the code in an answer? Thanks again. – Sam Perry Jul 14 '13 at 07:07

1 Answer


I think you have your clue in your `print data` output: `[['link'], ['link'], ['link'], ['link'], ['link'], ['link']]`. This tells me that the line `data=[row for row in data]` is giving you a list of lists, which is why you cannot simply write `for url in data:`; each `row` is itself a one-element list, so the URL is `row[0]`.

BTW, you will find the whole thing less confusing if you put some thought into naming: reading input from a file handle called `out`, and rebinding `data` to something derived from `data`, both invite mistakes.
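A minimal sketch of the merged script the question asks for, reading `data.csv` and writing `new_data.csv` (the file names from the question; adjust as needed). It assumes one URL in column 0 of each input row, and it is written for Python 3, where `urllib2` became `urllib.request` / `urllib.error`. The `check` parameter is only there so the CSV plumbing can be exercised without hitting the network:

```python
import csv
import urllib.request
import urllib.error


def check_url(url):
    """Return the HTTP status code for url, or the error code on failure."""
    try:
        with urllib.request.urlopen(url) as connection:
            return connection.getcode()
    except urllib.error.HTTPError as e:
        return e.getcode()


def check_csv(in_path, out_path, check=check_url):
    """Read URLs from column 0 of in_path; write url,status rows to out_path."""
    with open(in_path, "r", newline="") as infile:
        urls = [row[0] for row in csv.reader(infile) if row]
    with open(out_path, "w", newline="") as outfile:
        writer = csv.writer(outfile)
        for url in urls:
            writer.writerow([url, check(url)])


# Usage (hits the network for every URL in the file):
# check_csv("data.csv", "new_data.csv")
```

Opening the files in text mode with `newline=""` is what the `csv` docs recommend; it avoids the `rb` pitfall flagged in the comments above.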

Steve Barnes
  • I've updated the example above. Thanks for your help! I've changed `data=[row for row in data]` and `for url in data:`. However, I now get a large number of errors if Python comes across a 404 response: `ile "\script.py" connection = urllib2.urlopen(url)` – Sam Perry Jul 14 '13 at 09:32
  • If you have a list of URLs that do not point to pages, then you will get an error printed for each; if you don't wish to see them, change *print e.getcode()* to *pass* – Steve Barnes Jul 15 '13 at 07:10
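The suggestion in the last comment, swapping `print e.getcode()` for `pass`, looks roughly like this in Python 3 terms. `live_codes` is a hypothetical helper name, and the `opener` parameter exists only so the error handling can be demonstrated without network access:

```python
import urllib.request
import urllib.error


def live_codes(urls, opener=urllib.request.urlopen):
    """Return status codes for the URLs that respond.

    URLs raising HTTPError are silently skipped (the `pass` variant
    from the comment) instead of having their error code printed.
    """
    codes = []
    for url in urls:
        try:
            with opener(url) as connection:
                codes.append(connection.getcode())
        except urllib.error.HTTPError:
            pass  # swallow 404s and other HTTP errors
    return codes
```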