
I am writing a really basic web proxy in Python. In its current state, all I want it to do is take the request from the client and pass it straight through to the web server, with no caching or anything like that yet. My problem arises when I try to accept the response from the web server. To get the hostname of the web server, I parse the client's HTTP request: I go to the second line of the request and pull the name out from there. My program crashes with the error message [Errno -5] No address associated with hostname

I've pulled some similar code from GitHub and run it. It parses the first line of the HTTP request and digs the hostname out of the URL, which seems to me like an overcomplication, but that code works relatively well. Am I picking up a newline character or something like that by taking the hostname from the second line?

import sys, socket

# Constants
PORT = 8080
MAX_BUFFER = 4096
HOST = 'localhost'

def main():
    # Start socket
    try:
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.bind((HOST, PORT))
        s.listen(0)
        print ('Proxy running on %s:%d' % (HOST, PORT))

    except socket.error as message:
        print(message)
        s.close()
        sys.exit(1)

    # Listen for requests
    while 1:
        # conn is the socket we can send and receive to/from the client
        conn, client_addr = s.accept()
        print('Got call from client')

        request = conn.recv(MAX_BUFFER)
        print(str(request))

        # Parsing
        first_line = request.split('\n')[0]
        second_line = request.split('\n')[1]
        url = first_line.split(' ')[1]
        webserver = second_line.split(' ')[1]
        print(webserver)

        # Heard from client, now forward to server
        try:
            c = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            c.connect((webserver, 80))
            c.send(request)
            print('Sent request to webserver')

            while 1:
                data = c.recv(MAX_BUFFER)
                print('Received some data')

                # while there is data to receive from server
                if len(data) > 0:
                    conn.send(data)
                    print('Send data to client')

                else:
                    print('break')
                    break

            c.close()
            conn.close()

        except socket.error as message:
            print(message)
            if c:
                c.close()

            if conn:
                conn.close()

            sys.exit(1)

    print('Goodbye')
#******************End of main*********************

if __name__ == '__main__':
    main()
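
One thing worth checking against the parsing above: HTTP header lines end in CRLF ('\r\n'), so splitting on '\n' alone leaves a carriage return on the end of every line, including the Host line. The sketch below reproduces the same two split calls on a made-up request (example.com is a placeholder, not the site from the question) and is written for Python 3 with the request already decoded to str. It is only an illustration of a likely cause, not a confirmed diagnosis of the crash.

```python
# A hypothetical request, shaped like what a browser sends to a proxy.
request = (
    "GET http://example.com/ HTTP/1.1\r\n"
    "Host: example.com\r\n"
    "\r\n"
)

# Same parsing steps as in the question.
second_line = request.split('\n')[1]
webserver = second_line.split(' ')[1]

# repr() makes the trailing carriage return visible.
print(repr(webserver))

# Stripping whitespace removes the '\r' before the name is resolved.
cleaned = webserver.strip()
print(repr(cleaned))
```

A plain print(webserver) writes the '\r' to the terminal as a cursor movement, so the hostname can look correct on screen while still carrying the extra character.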

Here is the extract of the code from GitHub that does work:

# Listen for requests
    while 1:
    # conn is the socket we can send and receive to/from the client
        conn, client_addr = s.accept()
        print('Got call from client')

        request = conn.recv(MAX_BUFFER)
        print(str(request))

    # Parsing
        first_line = request.split('\n')[0]
        url = first_line.split(' ')[1]
        http_pos = url.find('://')
        if http_pos == -1:
            temp = url
        else:
            temp = url[(http_pos + 3):]

        port_pos = temp.find(':')

        webserver_pos = temp.find('/')
        if webserver_pos == -1:
            webserver_pos = len(temp)
        webserver = ''
        port = -1

        if port_pos == -1 or webserver_pos < port_pos:
            port = 80
            webserver = temp[:webserver_pos]

        else:
            port = int((temp[(port_pos + 1):])[:webserver_pos - port_pos -1])
            webserver = temp[:port_pos]

        print(webserver)

    # Heard from client, now forward to server

Credit: https://github.com/luugiathuy/WebProxyPython/blob/master/proxy.py
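
For comparison, the manual index arithmetic in that extract can be approximated with the standard library's URL parser. This is a sketch in Python 3, not the GitHub author's method; host_and_port and its default_port parameter are names made up for illustration, and the behaviour differs in small ways (for example, urlsplit lowercases the hostname).

```python
from urllib.parse import urlsplit

def host_and_port(url, default_port=80):
    """Return (hostname, port) for the URL in a proxy request line."""
    # urlsplit only recognises the host part after '//', so add it
    # when the target has no scheme (e.g. 'example.com:443').
    if '://' not in url:
        url = '//' + url
    parts = urlsplit(url)
    # parts.port is None when the URL carries no explicit port.
    return parts.hostname, parts.port or default_port

print(host_and_port("http://example.com/index.html"))
print(host_and_port("http://example.com:8080/a"))
print(host_and_port("example.com:443"))
```

Because the hostname comes out of the first line without any line splitting, no line-ending characters can end up in it.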

Jack Cassidy
  • try replacing localhost with 127.0.0.1 – Kaushal Kumar Singh Mar 11 '17 at 17:32
  • Nope, that didn't help @KaushalKumarSingh – Jack Cassidy Mar 11 '17 at 17:45
  • Can you please provide the complete error log? – Kaushal Kumar Singh Mar 11 '17 at 17:46
  • The only error I get is [Errno -5] No address associated with hostname @KaushalKumarSingh – Jack Cassidy Mar 11 '17 at 17:53
  • try changing the webserver value to webserver = second_line.split(' ')[1].split(":")[0] – Kaushal Kumar Singh Mar 11 '17 at 18:01
  • @KaushalKumarSingh That doesn't seem to work either. I'll show you the code I took from GitHub that does work. I think it does the same thing as mine, but obviously there is something different that I can't spot – Jack Cassidy Mar 11 '17 at 18:11
  • What I notice from running the code you provided is that it's not taking the hostname correctly. If you open any https site, you will see hostname:443 in your webserver variable, which is wrong and will result in an error. – Kaushal Kumar Singh Mar 11 '17 at 18:13
  • @KaushalKumarSingh Why is it not taking the hostname correctly? In the typical HTTP request message, the first line is occupied by the method (GET/POST etc.) and the URI. The second line contains "Host: [host name]". So I do not see how pulling the host name out of the second line is failing – Jack Cassidy Mar 11 '17 at 18:26
  • @JackCassidy: The Host header does not need to be on the second line. It can appear on any line of the HTTP request header. Apart from that, an HTTP proxy request contains the absolute URL (with hostname) in the first line. And there does not need to be a space in the Host header at all. I recommend that you actually study how the protocol works instead of just guessing. – Steffen Ullrich Mar 11 '17 at 18:56
  • @SteffenUllrich OK, I take your point that the host header need not be the second line, but I have run a request through where the host header is definitely the second line. I printed it out to the terminal! I then printed out the string that I have saved in the webserver variable, and it is indeed the host name. I guess I will just use the brute force parsing of the URL for now – Jack Cassidy Mar 12 '17 at 20:21
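
The points raised in the comments (the Host header can sit on any line, may have no space after the colon, and may carry a port) can be sketched as a lookup by header name rather than by line position. This is a minimal Python 3 sketch under stated assumptions: host_from_headers is a hypothetical helper, the request has already been decoded to str, and IPv6 literals such as [::1]:8080 are not handled.

```python
def host_from_headers(request, default_port=80):
    """Find the Host header wherever it appears; return (host, port)."""
    # The header block ends at the first blank line.
    head = request.split('\r\n\r\n', 1)[0]
    # splitlines() drops the '\r\n' endings; [1:] skips the request line.
    for line in head.splitlines()[1:]:
        name, sep, value = line.partition(':')
        # Header names are case-insensitive.
        if sep and name.strip().lower() == 'host':
            host, _, port = value.strip().partition(':')
            return host, (int(port) if port else default_port)
    return None, default_port

# Hypothetical request: Host is the third line, lowercase, with no
# space after the colon and an explicit port.
request = (
    "GET / HTTP/1.1\r\n"
    "User-Agent: demo\r\n"
    "host:example.com:8080\r\n"
    "\r\n"
)
print(host_from_headers(request))
```

Matching on the header name keeps the lookup independent of header order, and splitlines() removes the line endings that a split on '\n' would leave behind.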

0 Answers