
I am trying to parse this website: http://www.takeuchi-cycle.com/category/激安中古自転車情報/

import bs4
import requests


target_url = "http://www.takeuchi-cycle.com/category/激安中古自転車情報/"
response = requests.get(target_url)
response.raise_for_status()
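For reference, the path here contains non-ASCII characters, and they can be percent-encoded explicitly with the standard library before the request is made (a minimal sketch; `urllib.parse.quote` encodes as UTF-8 and leaves `/` and plain ASCII untouched by default):

```python
from urllib.parse import quote

# Percent-encode the non-ASCII path segment as UTF-8.
# "/" is in quote()'s default safe set, so the path structure is preserved.
path = "/category/激安中古自転車情報/"
encoded_path = quote(path)
encoded_url = "http://www.takeuchi-cycle.com" + encoded_path

print(encoded_path)  # pure ASCII, safe to hand to any HTTP client
```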

I then soupify it:

soup = bs4.BeautifulSoup(response.text, "html.parser")

However, when I try `soup.find("div", {"id": "maincolumn"})`, I get the following:

<div id="maincolumn">
    <div class="post">
        <h2>Not Found</h2>
        <div class="entry">
            <p>Sorry, but you are looking for something that isn't here.</p>
        </div>
    </div>
</div>

This block appears neither on the website nor in `soup.prettify()`; instead, all the listings appear there. `soup.select("div")` and `soup.findAll("div", class_="post")` return similar results too. `html5lib` does not work, and I cannot access `lxml.parser` even though I have lxml installed.
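For what it's worth, BeautifulSoup selects its parser from the second constructor argument: the lxml parser is requested with the name `"lxml"` (not `lxml.parser`), and `"html.parser"` is the always-available standard-library fallback. A minimal sketch on a dummy snippet (not the actual page):

```python
import bs4

# A tiny stand-in document, just to demonstrate parser selection.
html = '<div id="maincolumn"><div class="post"><h2>Hello</h2></div></div>'

# "html.parser" ships with Python, so it always works;
# "lxml" and "html5lib" are valid names only if those packages are installed.
soup = bs4.BeautifulSoup(html, "html.parser")
print(soup.find("div", {"id": "maincolumn"}).h2.text)  # → Hello
```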

What is causing this, and what can I do to make it work?

edit: Changing the URL to http://www.takeuchi-cycle.com/category/%E6%BF%80%E5%AE%89%E4%B8%AD%E5%8F%A4%E8%BB%8A%E6%83%85%E5%A0%B1/ makes it work. But the content of both pages is identical except for the title: the first says "page not found", I guess because the server couldn't resolve the request (perhaps a user-agent issue), while the other says "For PC." The question still remains, though.
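As a sanity check on URLs like these, `urllib.parse.unquote` decodes the percent-escapes back into readable text, which makes it easy to see what a percent-encoded URL actually points at (a minimal sketch on the URL from the edit above):

```python
from urllib.parse import unquote

encoded = ("http://www.takeuchi-cycle.com/category/"
           "%E6%BF%80%E5%AE%89%E4%B8%AD%E5%8F%A4%E8%BB%8A%E6%83%85%E5%A0%B1/")

# Decode the percent-escapes (UTF-8 by default) back into readable text.
print(unquote(encoded))
# → http://www.takeuchi-cycle.com/category/激安中古車情報/
```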

Martin Gergov
Jean Nassar
  • Are you sure that `maincolumn` is not present because when I use `'maincolumn' in soup.prettify()` it returns `True`. And, I can also see it on the website. – AKS Apr 28 '16 at 05:05
  • That's why I asked this question. I see all the listings in `prettify`, and all the posts are populated. – Jean Nassar Apr 28 '16 at 05:08
  • [This post](http://stackoverflow.com/questions/27955978/python-requests-url-with-unicode-parameters) had similar issues. Please have a look. – AKS Apr 28 '16 at 05:33

0 Answers