
I'm writing my own directory buster in Python and testing it against a web server of mine in a safe and secure environment. The script tries to retrieve common directories from a given website and uses the HTTP status code of each response to determine whether a page is accessible.
To start, the script reads a file containing all the directories to look up, and then makes the requests as follows:

for dir in fileinput.input('utils/Directories_Common.wordlist'):

    try:
        conn = httplib.HTTPConnection(url)
        conn.request("GET", "/"+str(dir))
        toturl = 'http://'+url+'/'+str(dir)[:-1]
        print '    Trying to get: '+toturl
        r1 = conn.getresponse()
        response = r1.read()
        print '   ',r1.status, r1.reason
        conn.close()

The script then checks the response: a status code of 200 means the page is accessible. I've implemented this check as follows:

if(r1.status == 200):
    print '\n[!] Got it! The subdirectory '+str(dir)+' could be interesting..\n\n\n'

Everything seems fine to me, except that the script marks pages as accessible that actually aren't. The algorithm collects only the pages that return "200 OK", but when I browse to those pages manually I find that they have been moved permanently or have restricted access. Something is wrong, but I cannot spot where exactly I should fix the code. Any help is appreciated.

user1405417

2 Answers


I did not find any problems with your code, except that it is almost unreadable. I have rewritten it into this working snippet:

import httplib

host = 'www.google.com'
directories = ['aosicdjqwe0cd9qwe0d9q2we', 'reader', 'news']

for directory in directories:
    conn = httplib.HTTPConnection(host)
    conn.request('HEAD', '/' + directory)

    url = 'http://{0}/{1}'.format(host, directory)
    print '    Trying: {0}'.format(url)

    response = conn.getresponse()
    print '    Got: ', response.status, response.reason

    conn.close()

    if response.status == 200:
        print ("[!] The subdirectory '{0}' "
               "could be interesting.").format(directory)

Outputs:

$ python snippet.py
    Trying: http://www.google.com/aosicdjqwe0cd9qwe0d9q2we
    Got:  404 Not Found
    Trying: http://www.google.com/reader
    Got:  302 Moved Temporarily
    Trying: http://www.google.com/news
    Got:  200 OK
[!] The subdirectory 'news' could be interesting.

Also, I used an HTTP HEAD request instead of GET, as it is more efficient when you are interested only in the status code and do not need the contents.
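
Since the question mentions pages that turned out to be moved or restricted, here is a minimal sketch of how the status check could tell those cases apart instead of testing only for 200. The names `classify_status` and `probe` and the 10-second timeout are hypothetical, chosen for illustration; the grouping of codes is one reasonable choice, not the only one. `httplib` does not follow redirects on its own, so a 301 or 302 shows up as such in `response.status`.

```python
try:
    import httplib  # Python 2, as in the snippet above
except ImportError:
    import http.client as httplib  # same module under its Python 3 name


def classify_status(status):
    """Map an HTTP status code to a coarse label (hypothetical helper)."""
    if status == 200:
        return 'accessible'
    if status in (301, 302, 303, 307, 308):
        return 'redirected'
    if status in (401, 403):
        return 'restricted'
    if status == 404:
        return 'not found'
    return 'other'


def probe(host, directory, method='HEAD'):
    """Send one request and return (status, label); redirects are not followed."""
    conn = httplib.HTTPConnection(host, timeout=10)  # timeout value is an assumption
    try:
        conn.request(method, '/' + directory)
        status = conn.getresponse().status
    finally:
        conn.close()
    return status, classify_status(status)
```

Calling `probe(host, directory)` inside the loop would give both the raw code and a label to print. Passing `method='GET'` is the fallback if a server does not handle HEAD well.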

Honza Javorek
  • Thanks a lot, I'll make it more readable then and I'll try to solve the issue looking at your implementation. – user1405417 Apr 12 '13 at 10:06
  • I'm trying to use the example you just showed me. When I make requests to Google with my script I always get 400 Bad Request, while with your code I get the status codes you've listed in this post. I don't know what is wrong; maybe I'm missing something somewhere. If you want to take a look, the core part of the code is here: [link](http://pastebin.com/VaAb18uX) – user1405417 Apr 12 '13 at 10:40
  • I took your script and started playing around with it. I changed how the directories are retrieved and used "for directory in fileinput.input('utils/Directories_Common.wordlist'):", since all my directories are listed in that file. With this modification I always get a 400 Bad Request. – user1405417 Apr 12 '13 at 10:49
  • I can give no further advice unless I know exactly what is in the 'utils/Directories_Common.wordlist' file and what output you get. Are you sure your URLs return different codes? Try testing it using this: http://stackoverflow.com/a/6136861/325365 – Honza Javorek Apr 12 '13 at 10:58
  • That file contains a list of all the directories to look up, one entry per line. – user1405417 Apr 12 '13 at 11:08
  • I've found that it works better if I use a list containing all the directories, but I can't hard-code almost a hundred directories in a list; I'd like to use a separate file. – user1405417 Apr 12 '13 at 11:14
  • I've fixed it all by scanning the directories file and populating a list with one directory per element. It was the way I retrieved the directories that caused the problems. – user1405417 Apr 12 '13 at 11:28
  • Despite this, when I used the HEAD method instead of GET, I always got 400 Bad Request. I've fixed this one too: the imported directories still had the newline character at the end, and this caused the issues. – user1405417 Apr 12 '13 at 11:47
  • Maybe it is not supported well by your server. Feel free to use GET instead. – Honza Javorek Apr 12 '13 at 12:02
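
According to the comments above, the newline character left at the end of each wordlist entry was behind the 400 responses. A minimal sketch of loading the file with that character removed might look like the following. The function name `load_directories` is hypothetical, and UTF-8 is an assumed encoding for the wordlist.

```python
import io


def load_directories(path):
    """Return the non-empty lines of a wordlist with surrounding whitespace removed."""
    directories = []
    # io.open behaves the same way on Python 2 and Python 3
    with io.open(path, encoding='utf-8') as wordlist:
        for line in wordlist:
            entry = line.strip()  # drops the trailing '\n' (and any '\r')
            if entry:  # skip blank lines
                directories.append(entry)
    return directories
```

The request loop can then iterate over `load_directories('utils/Directories_Common.wordlist')` instead of over `fileinput.input(...)`, so that each path placed in the request line contains no line break.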

I would advise you to use the Requests library (http://docs.python-requests.org/en/latest/) for HTTP.

Shooe