When I run my IP-address extraction on its own, it works fine. But when I use it inside my complete web-crawling program, it fails and gives inconsistent results.
This is my code snippet for the IP address lookup:
    #!/usr/bin/python3
    import os
    import re

    def get_ip_address(url):
        command = "host " + url
        process = os.popen(command)
        results = str(process.read())
        marker = results.find("has address") + 12
        n = (results[marker:].splitlines()[0])
        m = re.search(r'\w+ \w+: \d\([A-Z]+\)', n)
        if m is not None:
            url_new = url[8:]
            command = "host " + url_new
            process = os.popen(command)
            results = str(process.read())
            marker = results.find("has address") + 12
        return results[marker:].splitlines()[0]

    print(get_ip_address("https://www.yahoo.com"))
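For comparison, here is a minimal sketch of a lookup that does not shell out to `host` or depend on fixed string offsets. It assumes the standard-library `urllib.parse.urlparse` and `socket.gethostbyname` are acceptable substitutes; the names `extract_hostname` and `resolve_ip` are hypothetical and not part of my program.

```python
import socket
from urllib.parse import urlparse

def extract_hostname(url):
    """Return only the host part of a URL (hypothetical helper)."""
    # urlparse only fills in the host when a scheme or a leading "//" is present,
    # so add "//" for bare names such as "www.yahoo.com".
    if "://" not in url:
        url = "//" + url
    return urlparse(url).hostname

def resolve_ip(url):
    """Resolve the URL's host to an IPv4 address (hypothetical helper)."""
    # socket.gethostbyname raises socket.gaierror if the name cannot be resolved.
    return socket.gethostbyname(extract_hostname(url))
```

With this sketch, `extract_hostname("https://www.yahoo.com/")` gives `www.yahoo.com`, so a trailing slash or a different scheme should not change what gets resolved. `resolve_ip` needs network access and returns a single IPv4 address.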
The complete program for web crawling looks like this:
    #!/usr/bin/python3
    from general import *
    from domain_name import *
    from ip_address import *
    from nmap import *
    from robots_txt import *
    from whois import *

    ROOT_DIR = "companies"
    create_dir(ROOT_DIR)

    def gather_info(name, url):
        domain_name = get_domain_name(url)
        ip_address = get_ip_address(url)
        nmap = get_nmap('-F', ip_address)
        robots_txt = get_robots_txt(url)
        whois = get_whois(domain_name)
        create_report(name, url, domain_name, nmap, robots_txt, whois, ip_address)

    def create_report(name, full_url, domain_name, nmap, robots_txt, whois, ip_address):
        project_dir = ROOT_DIR + '/' + name
        create_dir(project_dir)
        write_file(project_dir + '/full_url.txt', full_url)
        write_file(project_dir + '/domain_name.txt', domain_name)
        write_file(project_dir + '/nmap.txt', nmap)
        write_file(project_dir + '/robots_txt.txt', robots_txt)
        write_file(project_dir + '/whois.txt', whois)
        write_file(project_dir + '/ip_address.txt', ip_address)

    x = input("Enter the Company Name: ")
    y = input("Enter the complete url of the company: ")
    gather_info(x, y)
A run of the program, with the input I entered, looks like this:
    root@nitin-Lenovo-G580:~/Desktop/web_scanning# python3 main.py
    106.10.138.240
    Enter the Company Name: Yahoo
    Enter the complete url of the company: https://www.yahoo.com/
    /bin/sh: 1: Syntax error: "(" unexpected
And the output in ip_address.txt is:
    hoo.com/ not found: 3(NXDOMAIN)
As shown, the program prints the IP 106.10.138.240 at runtime, yet something different is saved in ip_address.txt. I also could not figure out where the /bin/sh syntax error comes from. Please help.
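To narrow this down, here is a small offline sketch that repeats only the string handling from `get_ip_address`, without calling `host`. The sample text in `SAMPLE_HOST_OUTPUT` is an assumption, reconstructed from the contents of ip_address.txt, and the helper names `second_lookup_name` and `slice_after_marker` are hypothetical.

```python
# Assumed text printed by `host` for a name it cannot resolve (reconstructed).
SAMPLE_HOST_OUTPUT = "Host www.yahoo.com/ not found: 3(NXDOMAIN)\n"

def second_lookup_name(url):
    """The name used for the second `host` call: url minus its first 8 characters."""
    return url[8:]

def slice_after_marker(results):
    """Repeat the slicing from get_ip_address on a given `host` output."""
    # str.find returns -1 when "has address" is absent, so marker becomes 11.
    marker = results.find("has address") + 12
    return results[marker:].splitlines()[0]

print(second_lookup_name("https://www.yahoo.com/"))  # www.yahoo.com/
print(slice_after_marker(SAMPLE_HOST_OUTPUT))        # hoo.com/ not found: 3(NXDOMAIN)
```

If that assumption holds, the trailing slash in the URL I typed survives `url[8:]`, the second `host` call fails as well, and the slice returns part of the error message instead of an IP. The 106.10.138.240 line appears before the prompts, so it most likely comes from the `print(get_ip_address("https://www.yahoo.com"))` at the bottom of the ip_address module, which runs when main.py imports it, and not from the URL entered. The `/bin/sh` error would then be explained if `get_nmap` (not shown here) builds a shell command from that returned string, since the unquoted `(` in `3(NXDOMAIN)` is not valid shell syntax.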