I am trying to parse this website: http://www.takeuchi-cycle.com/category/激安中古自転車情報/
import bs4
import requests
target_url = "http://www.takeuchi-cycle.com/category/激安中古自転車情報/"
response = requests.get(target_url)
response.raise_for_status()
I then soupify it:
soup = bs4.BeautifulSoup(response.text)
However, when I run soup.find("div", {"id": "maincolumn"}), I get the following:
<div id="maincolumn">
<div class="post">
<h2>Not Found</h2>
<div class="entry">
<p>Sorry, but you are looking for something that isn't here.</p>
</div>
</div>
</div>
This "Not Found" block appears neither on the live website nor in soup.prettify(); there, all the listings appear instead. soup.select("div") and soup.findAll("div", class_="post") return similar results. The html5lib parser does not work either, and I cannot use the lxml parser even though I have lxml installed.
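As an aside, the parser feature string BeautifulSoup expects for lxml is "lxml", not "lxml.parser", which may be why it seems inaccessible. A minimal sketch using the always-available built-in parser (the HTML snippet here is made up for illustration):

```python
import bs4

# "html.parser" ships with Python; "lxml" and "html5lib" must be installed
# separately and are selected by exactly those feature strings.
html = '<div id="maincolumn"><div class="post"><h2>Hello</h2></div></div>'
soup = bs4.BeautifulSoup(html, "html.parser")
print(soup.find("div", {"id": "maincolumn"}).find("h2").get_text())
# → Hello
```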
What is causing this, and what can I do to make it work?
Edit: Changing the URL to http://www.takeuchi-cycle.com/category/%E6%BF%80%E5%AE%89%E4%B8%AD%E5%8F%A4%E8%BB%8A%E6%83%85%E5%A0%B1/ makes it work. Yet the response content for both is identical except for the title: the first says "page not found" (my guess is the server rejects the request somehow, perhaps based on the user agent), while the other says "For PC." The question remains, though.