
I scraped data from a Wikipedia page and am now trying to write a function that tells me whether a phrase appears in the results. When I run print(soup) I can clearly see that the word "December" is in the content, but when I pass that word to my function, it says it's not there. My code is below and I can't figure out why the word isn't found. Any help would be greatly appreciated.

import requests
from bs4 import BeautifulSoup

req = requests.get("https://en.wikipedia.org/wiki/Christmas")
soup = BeautifulSoup(req.content, "html.parser")
def function(x):
    if x in soup:
        print("here")
    else:
        print("not here")

function("December")

3 Answers


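I would search soup.text rather than the soup itself. Bear in mind that soup is a bs4.BeautifulSoup object, not a string: "x in soup" compares x against the object's direct child nodes instead of searching the page text, so a string like "December" is not found. soup.text returns the page's text as a plain string (strip() only trims leading and trailing whitespace), and the substring test then works as expected.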

Please find below my take on it:

from bs4 import BeautifulSoup as bs
from urllib.request import urlopen

# Download the page and parse it with the built-in HTML parser.
page = urlopen("https://en.wikipedia.org/wiki/Christmas")
soup = bs(page, 'html.parser')

def function(x):
    # Search the extracted text (a str), not the BeautifulSoup object.
    if x in soup.text.strip():
        print("here")
    else:
        print("not here")

function("December")

which returns:

here
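
To illustrate the difference without touching the network, here is a minimal sketch that parses a small inline HTML string instead of the Wikipedia page. The html string and the helper name contains_text are made up for this example; only BeautifulSoup and get_text() come from bs4.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page, so no network access is needed.
html = "<html><body><p class='lead'>Christmas is on December 25.</p></body></html>"
soup = BeautifulSoup(html, "html.parser")

def contains_text(soup, phrase):
    """Return True if phrase occurs in the text extracted from the page."""
    return phrase in soup.get_text()

# Membership on the soup object checks its direct child nodes only.
in_children = "December" in soup
# Membership on the extracted text is an ordinary substring search.
in_text = contains_text(soup, "December")

print(in_children, in_text)
```

Returning a boolean instead of printing makes the check easier to reuse; the printing can stay in the caller.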

ql.user2511

Change

if x in soup:

to

if x in soup.decode():
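
One thing to keep in mind with this approach: soup.decode() returns the whole HTML markup as a string, so tag names and attribute values match too, whereas soup.get_text() returns only the text content. A small sketch of the difference, using a made-up inline snippet rather than the real page:

```python
from bs4 import BeautifulSoup

# Hypothetical snippet; "lead" appears only as an attribute value.
html = '<p class="lead">Christmas is on December 25.</p>'
soup = BeautifulSoup(html, "html.parser")

markup = soup.decode()   # full HTML markup as a str
text = soup.get_text()   # text content only, tags stripped

# "December" is part of the visible text, so both searches find it.
december_in_markup = "December" in markup
december_in_text = "December" in text

# "lead" exists only inside the markup, so only decode() finds it.
lead_in_markup = "lead" in markup
lead_in_text = "lead" in text

print(december_in_markup, december_in_text, lead_in_markup, lead_in_text)
```

Which one to use depends on whether matches inside the markup itself should count.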
imxitiz
Suneesh Jacob

(As per your reply to my comment) To search for some text on the website, you can use this code:

from urllib.request import urlopen

def search(x, link='http://info.cern.ch/'):  # default is the first website's link
    source = urlopen(link).read().decode()
    if x in source:
        print('Your text is in the website')