I finished editing a script that checks whether a URL requires HTTP basic authentication (a WWW-Authenticate challenge) and prints the result for the user. This is the script:

#!/usr/bin/python

# Importing libraries
from urllib2 import urlopen, HTTPError
import socket
import urllib2
import threading
import time

# Setting up variables
url = open("oo.txt",'r')
response = None
start = time.time()

# Executing commands
start = time.time()
for line in url:
    try:
        response = urlopen(line, timeout=1)
    except HTTPError as exc:
        # A 401 unauthorized will raise an exception
        response = exc
    except socket.timeout:
        print ("{0} | Request timed out !!".format(line))
    except urllib2.URLError:
        print ("{0} | Access error !!".format(line))

    auth = response and response.info().getheader('WWW-Authenticate')
    if auth and auth.lower().startswith('basic'):
        print "requires basic authentication"
    elif socket.timeout or urllib2.URLError:
        print "Yay"
    else:
        print "Not requires basic authentication"

print "Elapsed Time: %s" % (time.time() - start)

There are a few things I need help with. I want the script to check the URLs ten at a time and write the results for all of them to a text file in one go. I have read about multithreading and multiprocessing, but I couldn't find an example that matches my case and keeps the code simple.

I also have a problem with the output: when a timeout or a URL error occurs, the script prints the result on two lines, like this:

http://www.test.test
 | Access error !!

I want it on one line. Why is it split across two?

Any help with these issues?

Thanks in advance

warvariuc
abualameer94
  • related: [Brute force basic http authorization using httplib and multiprocessing](https://gist.github.com/zed/0a8860f4f9a824561b51) (It could be easily simplified for your case) – jfs Mar 12 '14 at 10:29

1 Answer

The concurrent.futures package makes it easy to run code concurrently in Python. Define a function check_url that is called for each URL; then use the executor's map method to apply that function to the URLs concurrently (up to ten at a time with max_workers=10) and iterate over the return values.
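
Before the full program, here is a minimal sketch of how `executor.map` behaves. It uses a stand-in function (`fake_check`, a hypothetical name that does no network access) and a hypothetical wrapper `check_all`. It is meant to show that the results come back in the same order as the inputs, even when the calls finish out of order, which is why pairing them with `zip` is safe.

```python
import concurrent.futures
import time

def fake_check(url):
    # Stand-in for a real check: URLs containing "slow" take longer,
    # so the calls finish in a different order than they were submitted.
    time.sleep(0.2 if 'slow' in url else 0.0)
    return '{} checked'.format(url)

def check_all(urls, workers=10):
    # map() runs up to `workers` calls at once and yields the results
    # in input order, so zip() pairs each URL with its own result.
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as executor:
        return list(zip(urls, executor.map(fake_check, urls)))

if __name__ == '__main__':
    for url, result in check_all(['http://slow.example', 'http://fast.example']):
        print('{}: {}'.format(url, result))
```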

#! /usr/bin/env python3

import concurrent.futures
import urllib.error
import urllib.request
import socket

def load_urls(pathname):
    with open(pathname, 'r') as f:
        return [ line.rstrip('\n') for line in f ]

class BasicAuth(Exception): pass

class CheckBasicAuthHandler(urllib.request.BaseHandler):
    def http_error_401(self, req, fp, code, msg, hdrs):
        if hdrs.get('WWW-Authenticate', '').lower().startswith('basic'):
            raise BasicAuth()
        return None

def check_url(url):
    try:
        opener = urllib.request.build_opener(CheckBasicAuthHandler())
        with opener.open(url, timeout=1) as u:
            return 'requires no authentication'
    except BasicAuth:
        return 'requires basic authentication'
    except socket.timeout:
        return 'request timed out'
    except urllib.error.URLError as e:
        return 'access error ({!r})'.format(e.reason)

if __name__ == '__main__':
    urls = load_urls('/tmp/urls.txt')
    with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
        for url, result in zip(urls, executor.map(check_url, urls)):
            print('{}: {}'.format(url, result))
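
On the two-line output from the question: iterating over a file yields each line with its trailing newline still attached, so formatting the message with `line` prints the URL, a line break, and then the message. `load_urls` above avoids that by stripping the newline. The sketch below shows the same idea and, since the question asks for the results in a text file, one possible way to write them out. The helper names `read_urls` and `write_results` and the `url | result` line format are assumptions for illustration, not part of the program above.

```python
import io

def read_urls(lines):
    # Strip surrounding whitespace (including the trailing newline)
    # and skip blank lines.
    return [line.strip() for line in lines if line.strip()]

def write_results(results, out):
    # Write one "url | result" line per URL.
    for url, result in results:
        out.write('{} | {}\n'.format(url, result))

if __name__ == '__main__':
    raw = io.StringIO('http://www.test.test\nhttp://example.com\n')
    urls = read_urls(raw)
    out = io.StringIO()  # a real run would use open('results.txt', 'w')
    write_results([(u, 'access error') for u in urls], out)
    print(out.getvalue(), end='')
```

In the main program, `write_results` could take the `zip(urls, executor.map(check_url, urls))` pairs directly.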
nosid
  • Thanks, but sorry, I don't understand the "def" function. What code am I supposed to put inside it? – abualameer94 Mar 08 '14 at 11:45
  • Also, what is the solution for the line issue I mentioned? – abualameer94 Mar 08 '14 at 11:47
  • I have made the effort to write down the whole program. However, it's based on _Python 3_. – nosid Mar 08 '14 at 15:05
  • Hi, thanks for your code. It works, but with one issue: when the page requires auth, it returns "access error", because the auth error is included in urllib.error.URLError. Could you tell me how to solve this? – abualameer94 Mar 09 '14 at 17:58
  • Checking for _basic auth_ is a bit more complicated, because _urllib_ is a higher-level API that is intended to handle the status codes internally. I have updated the answer for _urllib_, but maybe it would be better to use _http.client_ instead. – nosid Mar 09 '14 at 19:33
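
For the _http.client_ route mentioned in the last comment, a rough sketch follows. It is not the answerer's code: the names `classify` and `check_url_raw` and the result strings are hypothetical, it sends a `HEAD` request (an assumption; some servers answer `HEAD` differently from `GET`), and it does not follow redirects. Because `http.client` returns the 401 response instead of raising, the status code and the `WWW-Authenticate` header can be inspected directly.

```python
import http.client
import socket
from urllib.parse import urlsplit

def classify(status, www_authenticate):
    # Decide from the status code and the WWW-Authenticate header value.
    if status == 401 and (www_authenticate or '').lower().startswith('basic'):
        return 'requires basic authentication'
    if status == 401:
        return 'requires other authentication'
    return 'requires no authentication'

def check_url_raw(url, timeout=1):
    parts = urlsplit(url)
    if not parts.hostname:
        return 'access error (invalid URL)'
    conn_cls = (http.client.HTTPSConnection if parts.scheme == 'https'
                else http.client.HTTPConnection)
    path = parts.path or '/'
    if parts.query:
        path += '?' + parts.query
    try:
        conn = conn_cls(parts.hostname, parts.port, timeout=timeout)
        try:
            conn.request('HEAD', path)
            resp = conn.getresponse()
            return classify(resp.status, resp.getheader('WWW-Authenticate'))
        finally:
            conn.close()
    except socket.timeout:
        # Must come before OSError, which is its base class.
        return 'request timed out'
    except (OSError, ValueError, http.client.HTTPException) as e:
        return 'access error ({!r})'.format(e)
```

`check_url_raw` takes a URL and returns a string, so it could be passed to `executor.map` in place of `check_url`.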