I am trying to parse webpages using urllib2, BeautifulSoup and Python 2.7.
The problem lies upstream: each time I try to retrieve a new webpage, I get the one I already retrieved. However, pages are different in my webbrowser: see page 1 and page 2. Is there something wrong with the loop over page numbers?
Here is a code sample:
def main(page_number_max):
import urllib2 as ul
from BeautifulSoup import BeautifulSoup as bs
base_url = 'http://www.senscritique.com/clement/collection/#page='
for page_number in range(1, 1+page_number_max):
url = base_url + str(page_number) + '/'
html = ul.urlopen(url)
bt = bs(html)
for item in bt.findAll('div', 'c_listing-products-content xl'):
item_name = item.findAll('h2', 'c_heading c_heading-5 c_bold')
print str(item_name[0].contents[1]).split('\t')[11]
print('End of page ' + str(page_number) + '\n')
if __name__ == '__main__':
page_number_max = 2
main(page_number_max)