I wrote a simple Python program to get some information from a website, but when I run the code below it always returns the following 301 response, even though my browser can open the site without any problem. Why does this happen, and how can I change my code to avoid it?

HTTP/1.1 301 Moved Permanently
Date: Tue, 28 Aug 2018 14:26:20 GMT
Server: Apache
Referrer-Policy: origin-when-cross-origin
Location: https://www.ncbi.nlm.nih.gov/
Content-Length: 237
Connection: close
Content-Type: text/html; charset=iso-8859-1

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>301 Moved Permanently</title>
</head><body>
<h1>Moved Permanently</h1>
<p>The document has moved <a href="https://www.ncbi.nlm.nih.gov/">here</a>.</p>
</body></html>

import socket

searcher = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
searcher.connect(("www.ncbi.nlm.nih.gov", 80))
cmd = "GET https://www.ncbi.nlm.nih.gov/ HTTP/1.0\r\n\r\n".encode()
searcher.send(cmd)
while True:
    data = searcher.recv(512)
    if len(data)<1: break
    print(data.decode())
searcher.close()

1 Answer

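You receive a 301 because the site redirects plain-HTTP requests to its HTTPS address: your code connects to port 80, and the Location header in the response points to https://www.ncbi.nlm.nih.gov/. A browser follows that redirect automatically; a raw socket does not.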
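
To see the redirect from code rather than by eye, the status line and the Location header can be pulled out of the raw bytes. This is only a minimal sketch: `parse_response_head` is a hypothetical helper name, the sample bytes are an abbreviated copy of the response in the question, and real responses can have folded or repeated headers that this does not handle.

```python
def parse_response_head(raw: bytes):
    """Return (status_code, headers) from a raw HTTP response."""
    # The header block ends at the first blank line.
    head, _, _body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # The status line looks like: HTTP/1.1 301 Moved Permanently
    status_code = int(lines[0].split(" ", 2)[1])
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only, so URLs in the value stay intact.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_code, headers


# Abbreviated copy of the response shown in the question.
raw = (
    b"HTTP/1.1 301 Moved Permanently\r\n"
    b"Server: Apache\r\n"
    b"Location: https://www.ncbi.nlm.nih.gov/\r\n"
    b"Connection: close\r\n"
    b"\r\n"
)

status, headers = parse_response_head(raw)
if status in (301, 302, 303, 307, 308):
    print("Redirected to:", headers["location"])
```

Header names are lowercased here because HTTP header names are case-insensitive.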

I don't know whether using sockets is mandatory, but if it isn't, you can use requests, an easy-to-use library for making HTTP requests:

import requests

req = requests.get("http://www.ncbi.nlm.nih.gov")
html = req.text

The 301 still happens, but requests follows the redirect for you, so it is transparent to your code.
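
If you want to confirm that the redirect really took place, requests keeps the intermediate responses in `resp.history`. The sketch below is one way to list the hops; `redirect_chain` and `fetch_and_report` are hypothetical helper names, and the 10-second timeout is an arbitrary choice.

```python
def redirect_chain(resp):
    """List (status_code, url) for each hop, ending with the final response."""
    hops = [(r.status_code, r.url) for r in resp.history]
    hops.append((resp.status_code, resp.url))
    return hops


def fetch_and_report(url):
    """Fetch url with requests and print every hop that was followed."""
    import requests  # third-party package: pip install requests

    resp = requests.get(url, timeout=10)
    for status, hop_url in redirect_chain(resp):
        print(status, hop_url)
    return resp.text
```

Calling `fetch_and_report("http://www.ncbi.nlm.nih.gov")` should print one line per hop, with the redirect first and the final page last; the exact chain depends on how the server is configured at the time.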

If you want to do it with sockets, you have to add the TLS ("ssl") layer yourself: connect to port 443 and wrap the socket using the ssl module:

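import socket
import ssl

host = "www.ncbi.nlm.nih.gov"

# Default context: verifies the server certificate against the system CA store.
context = ssl.create_default_context()

# Open a TCP connection to the HTTPS port, then wrap it in TLS.
with socket.create_connection((host, 443)) as sock:
    with context.wrap_socket(sock, server_hostname=host) as searcher:
        # Request the path only; the Host header names the site.
        cmd = f"GET / HTTP/1.0\r\nHost: {host}\r\n\r\n".encode()
        searcher.sendall(cmd)
        chunks = []
        while True:
            data = searcher.recv(512)
            if not data:
                break
            chunks.append(data)

# Decode once at the end so multi-byte characters split across chunks survive.
print(b"".join(chunks).decode(errors="replace"))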
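
Rather than hard-coding the port, a socket client can derive the host, port, and whether to use TLS from the URL it is given (for example the one in a Location header). The following is a rough sketch using only the standard library; `connection_params` and `build_request` are hypothetical helper names, and only the http and https schemes are considered.

```python
from urllib.parse import urlsplit


def connection_params(url):
    """Return (host, port, use_tls, path) for an http:// or https:// URL."""
    parts = urlsplit(url)
    use_tls = parts.scheme == "https"
    # Use the explicit port if the URL has one, else the scheme default.
    port = parts.port or (443 if use_tls else 80)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return parts.hostname, port, use_tls, path


def build_request(host, path):
    """Build a minimal HTTP/1.0 GET request with a Host header."""
    return f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")


host, port, use_tls, path = connection_params("https://www.ncbi.nlm.nih.gov/")
print(host, port, use_tls, path)
print(build_request(host, path))
```

When `use_tls` is true, the connected socket would be wrapped with an `ssl.SSLContext` before sending the request; otherwise the plain socket is used as is.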
Roomm