-1

I want to crawl a PHP-based website. It has a search box where you enter a number; when you click the submit button or press Enter, it renders the result for that number, but the URL does not change: every result is shown at foo.com/res_17.php. To crawl more than a thousand records, I need each record to be reachable by a unique ID, such as foo.com/res_17.php?id=1001, foo.com/res_17.php?id=1002, through foo.com/res_17.php?id=3450, so that I can fetch them in a while loop. How can I do this?

Mehar G
  • 41
  • 2

1 Answer

0

Here is one of my scripts:

from urllib.request import urlopen
from bs4 import BeautifulSoup
import re

# Fetch the page and parse it with an explicit parser
html = urlopen("http://en.wikipedia.org/wiki/Andrew_Ng")
bsObj = BeautifulSoup(html, "html.parser")

# Article links: href starts with /wiki/ and contains no colon
for link in bsObj.find("div", {"id": "bodyContent"}).find_all(
        "a", href=re.compile("^(/wiki/)((?!:).)*$")):
    if 'href' in link.attrs:
        print(link.attrs['href'])

The output lists every link to another Wikipedia article (paths that start with /wiki/ and contain no colon) found in the body of the Andrew Ng page.
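For the search form in the question, a URL that stays the same after submitting usually means the form sends the number in a POST body rather than in the query string. The sketch below is a minimal illustration under that assumption: the field name `id`, the full URL, and the helper names `build_request`, `fetch_record`, and `crawl` are all hypothetical, so check the real `action` and input `name` in the page's form markup first. If the form turns out to use GET, requesting foo.com/res_17.php?id=1001 directly may already work.

```python
import time
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumptions: the form posts to this URL and its input is named "id".
# Read the real values from the page's <form action=...> and <input name=...>.
FORM_URL = "http://foo.com/res_17.php"
FIELD_NAME = "id"


def build_request(record_id, url=FORM_URL, field=FIELD_NAME):
    """Build the POST request a browser would send for one number."""
    body = urlencode({field: record_id}).encode("ascii")
    return Request(url, data=body, method="POST")


def fetch_record(record_id):
    """Submit the form for one ID and return the result page as text."""
    with urlopen(build_request(record_id), timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(ids, fetch=fetch_record, delay=1.0):
    """Fetch every ID in turn, pausing between requests to spare the server."""
    pages = {}
    for record_id in ids:
        pages[record_id] = fetch(record_id)
        time.sleep(delay)
    return pages
```

Calling `crawl(range(1001, 3451))` would then walk the IDs from the question, and each returned page can be passed to BeautifulSoup as in the script above.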

MishaVacic
  • 1,812
  • 8
  • 25
  • 29