I use Python 3.5 on Windows 10.
When I crawl pages, I usually rely on a pattern in the URL: I build each URL in a 'for' loop and fetch it with urlopen, like the code below.
from urllib.request import urlopen

from bs4 import BeautifulSoup

# The chapter pages differ only by the number in the URL, so a loop can build each one.
with open('Slave.txt', 'w', encoding='utf-8') as f:
    for i in range(1, 42):
        html = urlopen('http://xroads.virginia.edu/~hyper/JACOBS/hjch' + str(i) + '.htm')
        soup = BeautifulSoup(html, "lxml")
        text = soup.get_text()
        f.write(text)
But now I am stuck. On the site below, the URL does not change when I click through to the next page, even though the page content changes. There is no pattern in the URL to iterate over, and nothing in it signals that the page has changed.
http://eungdapso.seoul.go.kr/Shr/Shr01/Shr01_lis.jsp
The website is at the URL above. The only clue I found is in the pagination class: it contains links that go to the next pages, but I don't know how to use these links with BeautifulSoup. I think commonPagingPost is a JavaScript function defined by the site's developer.
<span class="number"><a href="javascript:;" class="on">1</a>
<a href="javascript:commonPagingPost('2','10','Shr01_lis.jsp');">2</a>
<a href="javascript:commonPagingPost('3','10','Shr01_lis.jsp');">3</a>
<a href="javascript:commonPagingPost('4','10','Shr01_lis.jsp');">4</a>
<a href="javascript:commonPagingPost('5','10','Shr01_lis.jsp');">5</a></span>
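To show what I mean by the clue, here is a minimal sketch of how I can at least pull the arguments out of those links with BeautifulSoup and a regular expression. The names PAGINATION_HTML, PAGING_CALL, and extract_paging_calls are my own, and I am only guessing that the first argument is the page number and the third is the target page; I don't know what commonPagingPost actually does with them.

```python
import re

from bs4 import BeautifulSoup

# Pagination markup copied from the page source.
PAGINATION_HTML = """
<span class="number"><a href="javascript:;" class="on">1</a>
<a href="javascript:commonPagingPost('2','10','Shr01_lis.jsp');">2</a>
<a href="javascript:commonPagingPost('3','10','Shr01_lis.jsp');">3</a>
<a href="javascript:commonPagingPost('4','10','Shr01_lis.jsp');">4</a>
<a href="javascript:commonPagingPost('5','10','Shr01_lis.jsp');">5</a></span>
"""

# Matches commonPagingPost('<digits>','<digits>','<target>') inside an href.
PAGING_CALL = re.compile(r"commonPagingPost\('(\d+)','(\d+)','([^']+)'\)")


def extract_paging_calls(html):
    """Return (first_arg, second_arg, target) for every commonPagingPost link."""
    soup = BeautifulSoup(html, "html.parser")
    calls = []
    for link in soup.find_all("a", href=True):
        match = PAGING_CALL.search(link["href"])
        if match:  # the current page uses href="javascript:;" and is skipped
            first_arg, second_arg, target = match.groups()
            calls.append((int(first_arg), int(second_arg), target))
    return calls


print(extract_paging_calls(PAGINATION_HTML))
```

This only reads the links; it does not fetch the pages they lead to, which is the part I am missing.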
How can I open or crawl all of these pages using BeautifulSoup 4? I only get the first page when I use urlopen.
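One direction I am considering, in case it helps frame an answer: if commonPagingPost submits a form with POST (I have not confirmed this), I could send the same POST myself for each page number. Below is a rough sketch of that idea using only urllib. The field names "page" and "list_cnt" are placeholders I made up; the real names would have to be read from the form or from the browser's network tab, and the meaning of the second argument (10) is also my guess.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

LIST_URL = "http://eungdapso.seoul.go.kr/Shr/Shr01/Shr01_lis.jsp"


def build_page_request(page, per_page=10):
    """Build a POST request for one listing page."""
    # ASSUMPTION: "page" and "list_cnt" are hypothetical field names.
    # Replace them with the names the site's form really submits.
    form = [("page", str(page)), ("list_cnt", str(per_page))]
    data = urlencode(form).encode("utf-8")
    return Request(LIST_URL, data=data, method="POST")


def fetch_page(page):
    """Send the request and return the raw response body as bytes."""
    with urlopen(build_page_request(page)) as response:
        return response.read()


# Inspect the request without sending it.
request = build_page_request(2)
print(request.get_method(), request.data)
```

If the guess about POST is right, the bytes returned by fetch_page could be passed to BeautifulSoup the same way as the urlopen result in my first example.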