I am writing a Python program that gets the IP address of a website using the socket module. I have a list of dicts, each holding a website and a label n; the list can contain any number of entries.
Here's some sample data:
data_list = [{'website': 'www.google.com', 'n': 'n1'}, {'website': 'www.yahoo.com', 'n': 'n2'}, {'website': 'www.bing.com', 'n': 'n3'}, {'website': 'www.stackoverflow.com', 'n': 'n4'}, {'website': 'www.smackcoders.com', 'n': 'n5'}, {'website': 'www.zoho.com', 'n': 'n6'}, {'website': 'www.quora.com', 'n': 'n7'}, {'website': 'www.elastic.co', 'n': 'n8'}, {'website': 'www.google.com', 'n': 'n9'}, {'website': 'www.yahoo.com', 'n': 'n10'}, {'website': 'www.bing.com', 'n': 'n11'}, {'website': 'www.stackoverflow.com', 'n': 'n12'}, {'website': 'www.smackcoders.com', 'n': 'n13'}, {'website': 'www.zoho.com', 'n': 'n14'}, {'website': 'www.quora.com', 'n': 'n15'}, {'website': 'www.elastic.co', 'n': 'n16'}, {'website': 'www.google.com', 'n': 'n17'}, {'website': 'www.yahoo.com', 'n': 'n18'}, {'website': 'www.bing.com', 'n': 'n19'}, {'website': 'www.stackoverflow.com', 'n': 'n20'}]
Here's my program:
import socket
import time
data_list = [{'website': 'www.google.com', 'n': 'n1'}, {'website': 'www.yahoo.com', 'n': 'n2'}, {'website': 'www.bing.com', 'n': 'n3'}, {'website': 'www.stackoverflow.com', 'n': 'n4'}, {'website': 'www.smackcoders.com', 'n': 'n5'}, {'website': 'www.zoho.com', 'n': 'n6'}, {'website': 'www.quora.com', 'n': 'n7'}, {'website': 'www.elastic.co', 'n': 'n8'}, {'website': 'www.google.com', 'n': 'n9'}, {'website': 'www.yahoo.com', 'n': 'n10'}, {'website': 'www.bing.com', 'n': 'n11'}, {'website': 'www.stackoverflow.com', 'n': 'n12'}, {'website': 'www.smackcoders.com', 'n': 'n13'}, {'website': 'www.zoho.com', 'n': 'n14'}, {'website': 'www.quora.com', 'n': 'n15'}, {'website': 'www.elastic.co', 'n': 'n16'}, {'website': 'www.google.com', 'n': 'n17'}, {'website': 'www.yahoo.com', 'n': 'n18'}, {'website': 'www.bing.com', 'n': 'n19'}, {'website': 'www.stackoverflow.com', 'n': 'n20'}]
field = "website"
action = "append"
max_retry = 1
hit_cache_size = 10
cache = []
d1 = []
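for data in data_list:
    temp = {}
    for item in data:
        if item == field:
            if data[item] != "Not available":
                try:
                    ad = socket.gethostbyname(data[item])
                    # Store the result, but note the cache is never read back.
                    if len(cache) < hit_cache_size:
                        cache.append({data[item]: ad})
                    else:
                        cache = []
                    if action == "replace":
                        temp[item] = ad
                    elif action == "append":
                        temp[item] = str([data[item], ad])
                except socket.error:
                    # First lookup failed: retry up to max_retry times.
                    count = 0
                    while True:
                        try:
                            ad = socket.gethostbyname(data[item])
                        except socket.error:
                            count += 1
                            if count == max_retry:
                                if action == "replace":
                                    temp[item] = "Unknown"
                                elif action == "append":
                                    temp[item] = str([data[item], "Unknown"])
                                break
                        else:
                            # Retry succeeded: record the result and stop looping.
                            if action == "replace":
                                temp[item] = ad
                            elif action == "append":
                                temp[item] = str([data[item], ad])
                            break
            else:
                temp[item] = "Not available"
        else:
            temp[item] = data[item]
    temp['timestamp'] = time.ctime()
    d1.append(temp)
print(d1)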
Here, data_list can have millions of websites, so my code takes a long time to run. To speed it up, I created a cache that stores some websites with their IPs. The cache size is defined in hit_cache_size. If the same website address comes up again in the list, the program should check the cache first instead of calling the socket module. If the address is there, it should take the IP from the cache and save it. I tried a few approaches using arrays, but it still takes a long time. How can I make this work?
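To make the intended behaviour concrete, here is a minimal sketch of the cache-first lookup described above. It is only one possible approach, not a tested solution for millions of hosts. The names make_cached_resolver, resolve, lookup, and max_size are hypothetical and not part of my program; max_size plays the role of hit_cache_size. A dict-based (ordered) cache is assumed instead of a list so that the membership check does not scan every entry.

```python
import socket
from collections import OrderedDict


def make_cached_resolver(lookup=socket.gethostbyname, max_size=10):
    """Return a resolve(host) function backed by a bounded LRU cache.

    `lookup` is any callable mapping a hostname to an IP string
    (hypothetical parameter, handy for testing without a network).
    """
    cache = OrderedDict()  # host -> ip, least recently used first

    def resolve(host):
        if host in cache:
            cache.move_to_end(host)      # mark as most recently used
            return cache[host]
        ip = lookup(host)                # may raise socket.gaierror
        cache[host] = ip
        if len(cache) > max_size:
            cache.popitem(last=False)    # evict the least recently used entry
        return ip

    resolve.cache = cache  # exposed only so the cache can be inspected
    return resolve
```

In the loop above, resolve = make_cached_resolver(max_size=hit_cache_size) would be created once, and each socket.gethostbyname(data[item]) call would become resolve(data[item]). The standard library's functools.lru_cache(maxsize=...) decorator offers similar behaviour with less code; the explicit version is shown here so the cache check and eviction are visible.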