This is a somewhat open-ended question. I need to go through a job site and check each posting for a job-description keyword and a skill requirement (that part is done). What I want to know is how to crawl the site: start at test.com, follow its links to test.com/a, and so on, until every page has been visited.
I need to find all such job pages on the site and collect their links. This is not homework; I'm just doing it on the side. Here is my code for searching within a single page:
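A common way to do this is a breadth-first crawl: keep a queue of URLs to visit and a set of URLs already seen, download each page, pull out its `<a href>` links, resolve them against the current URL, and queue any new ones on the same host. The sketch below uses only the standard library (`html.parser`, `urllib.parse`, `urllib.request`). The names `LinkCollector`, `fetch_html`, `crawl` and the `max_pages` limit are illustrative, not from any existing tool, and decoding pages as UTF-8 is an assumption. The `fetch` parameter lets you swap in a different downloader or test offline.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to it."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_html(url):
    """Download a page; assumes UTF-8 and replaces undecodable bytes."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch=fetch_html, max_pages=100):
    """Breadth-first crawl limited to the start URL's host.

    Returns a dict mapping each visited URL to its HTML.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except OSError:
            continue  # skip pages that fail to download (URLError is an OSError)
        pages[url] = html
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.links:
            # Resolve relative links and drop #fragments so each page is queued once.
            absolute, _ = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return pages
```

`crawl("http://test.com/")` returns every reachable page on that host (up to `max_pages`), and the keys of the result are the links you were after. A real crawler should also honour the site's robots.txt (Python ships `urllib.robotparser` for this) and terms of use, and pause between requests.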
```python
from urllib.request import urlopen
import re

url = 'http://www.ziprecruiter.com/job/Systems-Engineer/b5452eab/?source=customer-cpc-indeed'
# Decode the bytes so the regexes search text (assumes the page is UTF-8).
html_content = urlopen(url).read().decode('utf-8', errors='replace')

matchDescription = re.findall('Bachelor', html_content)
matchSkill = re.findall('VMware', html_content)
print(matchDescription)
print(matchSkill)

# Report a hit only when BOTH terms were found.
if matchDescription and matchSkill:
    print('My string is in the html')
else:
    print('I did not find anything')
```
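To run the same check on every crawled page, it helps to wrap it in a function and add a filter that recognises job-posting URLs. The sketch below is an assumption on two counts: the `/job/<title>/<id>/` path pattern is inferred only from the single example URL above, and matching is case-insensitive plain substring search, as in the original `findall` calls. `is_job_url` and `page_matches` are hypothetical helper names.

```python
import re
from urllib.parse import urlparse

# Hypothetical pattern for job-posting paths, inferred only from the one
# example URL (/job/<title>/<id>/); verify it against the real site.
JOB_PATH = re.compile(r"^/job/[^/]+/[^/]+/?$")


def is_job_url(url):
    """True if the URL's path looks like a single job posting."""
    return JOB_PATH.match(urlparse(url).path) is not None


def page_matches(html, description_term="Bachelor", skill_term="VMware"):
    """True only if BOTH terms appear in the page (case-insensitive)."""
    def has(term):
        return re.search(re.escape(term), html, re.IGNORECASE) is not None
    return has(description_term) and has(skill_term)
```

Given a dict `pages` mapping URL to HTML (however it was collected), the matching job links are `[u for u, h in pages.items() if is_job_url(u) and page_matches(h)]`.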