I want to crawl a PHP-based website. It has a search box where you enter a number; when you click Submit or press Enter, it renders the result for that number, but the URL does not change: it shows foo.com/res_17.php for every result. To crawl more than a thousand records, I need each record to be reachable by a unique ID, such as foo.com/res_17.php?id=1001, foo.com/res_17.php?id=1002, ... foo.com/res_17.php?id=3450, so that I can fetch them in a while loop. How can I do this? Any solution would help.
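When the URL stays the same after submitting, the form most likely sends the number in the body of a POST request. In that case a crawler can replay that request directly instead of looking for `?id=` URLs. Below is a minimal standard-library sketch. It assumes the form uses POST and that its input is named `rollno`, which is a hypothetical name. Check the real `name` attribute of the `<input>` and the form's `action` URL in the page source.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_search_request(action_url, field_name, number):
    """Build a POST request equivalent to typing `number` into the search box.

    `field_name` must match the form's <input name="..."> (an assumption here).
    """
    # urlencode produces "field=value"; Request sends it as the POST body.
    body = urlencode({field_name: number}).encode("ascii")
    return Request(
        action_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def fetch_result(action_url, field_name, number, timeout=10):
    """Send the request and return the result page as text."""
    req = build_search_request(action_url, field_name, number)
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


if __name__ == "__main__":
    # Hypothetical values: foo.com/res_17.php and "rollno" stand in for the real form.
    req = build_search_request("http://foo.com/res_17.php", "rollno", 1001)
    print(req.get_method(), req.full_url, req.data)
```

If the form turns out to use GET, the number appears in the URL after submitting and a plain `urlopen` on that URL is enough.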
What is your problem? – Kalle Richter Jun 30 '17 at 11:08
http://fbise.edu.pk/res-ssc-II.php: results for roll numbers 100001-143293 are available on this website. How can I crawl them? – Mehar G Jul 01 '17 at 06:51
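For a contiguous range like roll numbers 100001-143293, the crawl is a loop over the range that submits one search per number. The sketch below shows that loop. The network call is passed in as a function so the loop can be tried offline; `fetch` is a hypothetical placeholder for whatever sends the form request. The loop pauses between requests and records failures instead of stopping.

```python
import time


def crawl_range(start, end, fetch, delay=1.0):
    """Call fetch(number) for every number in [start, end] inclusive.

    Returns (results, failures): results maps number -> page text,
    failures maps number -> the exception raised for it.
    """
    results, failures = {}, {}
    for number in range(start, end + 1):
        try:
            results[number] = fetch(number)
        except Exception as exc:  # keep going if one request fails
            failures[number] = exc
        time.sleep(delay)  # be polite to the server between requests
    return results, failures


if __name__ == "__main__":
    # Offline demo with a fake fetch; replace it with a real request function.
    def fake_fetch(number):
        if number == 100003:
            raise ValueError("no result")
        return f"result page for {number}"

    ok, failed = crawl_range(100001, 100005, fake_fetch, delay=0)
    print(len(ok), sorted(failed))  # 4 [100003]
```

With over 43,000 numbers, saving results to disk as you go and retrying the `failures` afterwards is safer than holding everything until the end.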
1 Answer
Here is one of my scripts:
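from urllib.request import urlopen
from bs4 import BeautifulSoup
import re

html = urlopen("http://en.wikipedia.org/wiki/Andrew_Ng")
# Name the parser explicitly; bs4 warns when none is given
bsObj = BeautifulSoup(html, "html.parser")
# Print internal article links: paths starting with /wiki/ that contain no colon
for link in bsObj.find("div", {"id": "bodyContent"}).find_all(
        "a", href=re.compile(r"^(/wiki/)((?!:).)*$")):
    if 'href' in link.attrs:
        print(link.attrs['href'])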
The output is the list of Wikipedia article links (paths under /wiki/) found in the body of the Andrew Ng article.
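The filter `^(/wiki/)((?!:).)*$` limits the output to article links. It requires the path to start with `/wiki/`, and its negative lookahead ensures that no later character is a colon. That drops namespaced pages such as `File:` or `Category:`. Here is a quick check of the pattern on a few sample paths:

```python
import re

ARTICLE_LINK = re.compile(r"^(/wiki/)((?!:).)*$")

samples = [
    "/wiki/Stanford_University",
    "/wiki/File:Andrew_Ng.jpg",
    "/wiki/Category:Living_people",
    "https://example.com/wiki/Page",
]

for path in samples:
    # The pattern's ^ and $ anchor it to the whole path
    print(path, bool(ARTICLE_LINK.match(path)))
```

Only the first sample matches; absolute URLs are excluded too, because they do not start with `/wiki/`.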

MishaVacic