BeautifulSoup output stays []

Question

I am trying to scrape text from a website with BeautifulSoup and Python requests, but I only get [] as output.

from bs4 import BeautifulSoup
import requests

url = "http://nos.nl/artikel/2093082-steeds-meer-nekklachten-bij-kinderen-door-gebruik-tablets.html"
r  = requests.get(url)

soup = BeautifulSoup(r.content)

data = soup.find_all("div", {"class": "article_title"})

print data

output:

[]

I've tried:

> data = soup.find_all("div", {"class": "article_title"})
> data = soup.find_all("div", class_="article_title")
> data = soup.find_all("div", class_="article")

What am I doing wrong?

Are you sure that class exists on the site? Did you check the HTML? — AlG, Mar 16 '16 at 11:29

score 2 · Accepted Answer · answered Mar 16 '16 at 11:38


There are two problems:

The tag used on the site is an h1, not a div.
The class name is article__title (that's two underscores!).

So what you want is:

data = soup.find_all("h1", {"class": "article__title"})

Which gives us:

[<h1 class="article__title">Steeds meer nekklachten bij kinderen door gebruik tablets</h1>]

I used my Firefox web inspector to quickly get this information, by the way ;-) Chrome, Internet Explorer, Safari, and all other browsers that I know have similar tools built-in. I strongly suggest you learn to use at least the basics of them, because it'll make your life a whole lot easier!
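
To see the difference between the two lookups without depending on the live page, here is a minimal sketch that runs both against an inline snippet. The HTML string is a made-up stand-in for the relevant part of the article (the real nos.nl markup may differ or have changed since), and the explicit "html.parser" argument is my addition, not part of the question's code.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the relevant part of the page; only the
# h1 line is taken from the output shown in this answer.
html = """
<html><body>
  <h1 class="article__title">Steeds meer nekklachten bij kinderen door gebruik tablets</h1>
</body></html>
"""

# Naming the parser explicitly keeps the result consistent across machines.
soup = BeautifulSoup(html, "html.parser")

# Wrong tag and wrong class name (one underscore): nothing matches.
wrong = soup.find_all("div", {"class": "article_title"})

# Right tag and right class name (two underscores): one match.
right = soup.find_all("h1", class_="article__title")

print(len(wrong))
print([h1.get_text(strip=True) for h1 in right])
```

find_all does not raise an error when nothing matches; it just returns an empty list, which is why a typo in the tag or class name shows up as [].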

answered Mar 16 '16 at 11:38

Martin Tournoij


Damn I should have checked that first, sorry and thanks! – Danisk Mar 16 '16 at 12:00
I want to save my content as a .txt file but I'm stuck again, could you maybe help me? – Danisk Mar 16 '16 at 14:40
@Danisk If it's a different question, then you should ask a different question about it ;-) You should also mark this answer as accepted if it fixed your problem, by the way :-) That way other folk can see it already has a fix and I get points for it! (woohoo!) – Martin Tournoij Mar 16 '16 at 14:43

score 0 · Answer 2 · answered Mar 16 '16 at 11:43


The first problem is that there isn't an article_title class on the website. If you search for article__title (two underscores) it will return something, because that class does exist. Look through the HTML source to see which tags and classes actually exist!
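
If reading the raw source by eye is tedious, one option is to list every tag and class combination that occurs in it. The sketch below does that with Python's standard-library html.parser, so it runs even without BeautifulSoup. The ClassInventory name and the sample HTML are hypothetical, chosen only for illustration; in practice you would feed it the text of the response from requests.

```python
from collections import Counter
from html.parser import HTMLParser


class ClassInventory(HTMLParser):
    """Count every (tag, class) pair seen in an HTML document."""

    def __init__(self):
        super().__init__()
        self.seen = Counter()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        for name, value in attrs:
            if name == "class" and value:
                # A class attribute can hold several space-separated names.
                for cls in value.split():
                    self.seen[(tag, cls)] += 1


# Made-up sample markup, not the real page.
html = """
<div class="container">
  <h1 class="article__title">Title</h1>
  <p class="article__intro lead">Intro text</p>
</div>
"""

inventory = ClassInventory()
inventory.feed(html)

# Print one line per combination, e.g. which tag carries article__title.
for (tag, cls), count in sorted(inventory.seen.items()):
    print(tag, cls, count)
```

A listing like this makes it easy to spot that a class name is spelled differently than expected, or that it sits on a different tag.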

answered Mar 16 '16 at 11:43

Luis F Hernandez

