I've created a script to parse two fields from every movie container on a webpage. The script runs fine.
I'm trying to use the getattr() function to scrape the text and the src from two fields, namely movie_name and image_link. For movie_name, it works. However, it fails when I try to parse image_link.
There is a function, currently commented out, which works when I uncomment it. However, my goal here is to make use of getattr() to parse src.
import requests
from bs4 import BeautifulSoup

url = "https://yts.am/browse-movies"

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36'
}

# def get_information(url):
#     res = requests.get(url,headers=headers)
#     soup = BeautifulSoup(res.text,'lxml')
#     for row in soup.select(".browse-movie-wrap"):
#         movie_name = row.select_one("a.browse-movie-title").text
#         image_link = row.select_one("img.img-responsive").get("src")
#         yield movie_name,image_link

def get_information(url):
    res = requests.get(url,headers=headers)
    soup = BeautifulSoup(res.text,'lxml')
    for row in soup.select(".browse-movie-wrap"):
        movie_name = getattr(row.select_one("a.browse-movie-title"),"text",None)
        image_link = getattr(row.select_one("img.img-responsive"),"src",None)
        yield movie_name,image_link

if __name__ == '__main__':
    for items in get_information(url):
        print(items)
How can I scrape src using the getattr() function?
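A possible explanation, offered as a sketch rather than a definitive answer: in BeautifulSoup, text is an ordinary Python attribute of a Tag, whereas HTML attributes such as src live in the tag's attrs dictionary. Asking for getattr(tag, "src", None) is treated as a lookup for a child tag named src, which does not exist, so the result is None. The snippet below assumes an inline HTML sample instead of the live page, uses the built-in html.parser rather than lxml, and introduces get_attr as a hypothetical helper name that is not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Inline sample standing in for one movie container (made-up values,
# not taken from the real page).
html = """
<div class="browse-movie-wrap">
  <a class="browse-movie-title">Example Movie</a>
  <img class="img-responsive" src="/poster.jpg">
</div>
"""


def get_attr(tag, name, default=None):
    """Hypothetical helper: read an HTML attribute via getattr().

    getattr() fetches the tag's `attrs` dict (or an empty dict when
    `tag` is None), and dict.get() then reads the HTML attribute.
    """
    return getattr(tag, "attrs", {}).get(name, default)


soup = BeautifulSoup(html, "html.parser")
row = soup.select_one(".browse-movie-wrap")
img = row.select_one("img.img-responsive")

# `text` is a real Python attribute, so plain getattr() works here.
movie_name = getattr(row.select_one("a.browse-movie-title"), "text", None)

# `src` is an HTML attribute, so plain getattr() does not reach it.
naive_link = getattr(img, "src", None)

# Going through `attrs` reaches it, and still tolerates a missing element.
image_link = get_attr(img, "src")
missing_link = get_attr(row.select_one("img.no-such-class"), "src")

print(movie_name, naive_link, image_link, missing_link)
```

In the original generator, the same idea would read image_link = getattr(row.select_one("img.img-responsive"), "attrs", {}).get("src"), which keeps the None-safe behavior that getattr() provides for movie_name.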