How do I search for specific text once I have scraped a website using Python?

Question

I scraped data from a Wikipedia page and am now trying to create a function that tells me if a phrase is within the results. When I do the command print(soup) I can clearly see that the word "December" is in the content but when I run the phrase through my function, it says it's not there. I have listed my code below and I can't figure out why it's not showing up. Any help would be greatly appreciated.

import requests
from bs4 import BeautifulSoup

req = requests.get("https://en.wikipedia.org/wiki/Christmas")
soup = BeautifulSoup(req.content, "html.parser")

def function(x):
    if x in soup:
        print("here")
    else:
        print("not here")

function("December")

As I understand it, you want to check whether some text is in a paragraph tag of the website, am I right? — Aug 12 '21 at 04:23
@Dev Yes. Ultimately, if the phrase I am looking for is in a reference at the bottom, I want to be able to download the reference link. — missatomicbomb, Aug 12 '21 at 04:26
Unless you have a specific reason to use bs4, `if {string} in req.text` will work — Atralupus, Aug 12 '21 at 04:31

ql.user2511 · Accepted Answer · 2021-08-12T05:26:03.157

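I would first convert the soup to a string with soup.text (the .strip() call only trims leading and trailing whitespace). Bear in mind that soup is a bs4.BeautifulSoup object, not a string, so x in soup is not a substring search over the page text. That is why a string like "December" is not found, even though it is visible when you print the soup.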

Please find below my take on it:

from urllib.request import urlopen
from bs4 import BeautifulSoup as bs

page = urlopen("https://en.wikipedia.org/wiki/Christmas")
soup = bs(page, 'html.parser')

def function(x):
    # soup.text is the page's text as a plain string, so `in` is a substring test
    if x in soup.text.strip():
        print("here")
    else:
        print("not here")

function("December")

which returns:

here
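To see the difference without hitting the network, here is a minimal sketch on a made-up HTML snippet (the snippet and the variable names are illustrative assumptions, not part of the original code). As far as I can tell, the in test on a BeautifulSoup object looks at its direct child nodes rather than at the text, which would explain the "not here" result:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page
html = "<html><body><p>Christmas is celebrated on December 25.</p></body></html>"
soup = BeautifulSoup(html, "html.parser")

# Membership test against the parsed object itself: no substring search happens
in_object = "December" in soup

# Membership test against the extracted text, which is a plain str
in_text = "December" in soup.get_text()

print(in_object)
print(in_text)
```

soup.get_text() and soup.text should give the same string here; either one turns the parsed tree into something a substring test can work on.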

score 0 · Answer 2 · edited Aug 12 '21 at 06:21

Change

if x in soup:

to

if x in soup.decode():
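One thing to keep in mind with this approach: soup.decode() gives back the HTML markup as a string, so the search also covers tag names and attribute values such as link targets. A small sketch on an invented snippet (the HTML and variable names are assumptions for illustration) shows how that can differ from searching only the visible text:

```python
from bs4 import BeautifulSoup

# Hypothetical snippet: "December" appears only inside the href attribute
html = '<p>See the <a href="/wiki/December">last month</a> of the year.</p>'
soup = BeautifulSoup(html, "html.parser")

markup = soup.decode()   # full markup, tags and attributes included
text = soup.get_text()   # visible text only

found_in_markup = "December" in markup
found_in_text = "December" in text

print(found_in_markup)
print(found_in_text)
```

Which of the two you want depends on whether a match inside a URL or attribute should count as a hit.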

edited Aug 12 '21 at 06:21 · imxitiz
answered Aug 12 '21 at 05:32 · Suneesh Jacob

score 0 · Answer 3 · answered Aug 12 '21 at 06:24

(As per your reply to my comment) To search for some text in the website, you can use this code:

from urllib.request import urlopen

def search(x, link='http://info.cern.ch/'):  # default is the first website's link
    # Download the page and decode the bytes into a string of raw HTML
    source = urlopen(link).read().decode()
    if x in source:
        print('Your text is in the website')
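Since the stated goal is to get the link of a reference that mentions the phrase, a plain substring check is probably not enough on its own. The following is a rough sketch using BeautifulSoup, assuming each reference is rendered as an <li> item containing its link; the helper name links_mentioning and the sample HTML are made up for illustration, and the real page's markup may differ:

```python
from bs4 import BeautifulSoup

def links_mentioning(soup, phrase):
    """Collect href values from <li> items whose text contains phrase."""
    links = []
    for item in soup.find_all("li"):
        if phrase in item.get_text():
            # href=True skips <a> tags that have no href attribute
            links.extend(a["href"] for a in item.find_all("a", href=True))
    return links

# Hypothetical reference list standing in for the bottom of the page
html = """
<ol class="references">
  <li>Smith, J. "Winter festivals". Retrieved December 2020.
      <a href="https://example.org/winter">link</a></li>
  <li>Doe, A. "Spring customs".
      <a href="https://example.org/spring">link</a></li>
</ol>
"""
soup = BeautifulSoup(html, "html.parser")
print(links_mentioning(soup, "December"))
```

On a real page you would likely want to narrow the search to the reference section first, since nested lists or navigation menus could otherwise produce extra or duplicate matches.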

Sorry for the late reply – Aug 12 '21 at 06:24
