Python - BeautifulSoup - How to return two different elements or more, with different attributes?

Question

HTML Example

<html>
<div book="blue" return="abc">
<h4 class="link">www.example.com</h4>
<p class="author">RODRIGO</p>
</html>

Ex1:

import urllib.request
from bs4 import BeautifulSoup as soup

url = urllib.request.urlopen(url)
page_soup = soup(url.read(), "html.parser")
res = page_soup.find_all(attrs={"class": ["author", "link"]})
for each in res:
    print(each)

Result1:

www.example.com RODRIGO

Ex2:

url = urllib.request.urlopen(url)
page_soup = soup(url.read(), "html.parser")
res = page_soup.find_all(attrs={"book": ["blue"]})
for each in res:
    print(each["return"])

Result 2:

abc

!!!puzzle!!!

The question I have is how to return the 3 results in a single query?

Result 3

www.example.com RODRIGO abc

Hi, is `<div book="blue" return="abc">` a valid HTML tag? Doesn't it require a closing tag? — cards, Apr 30 '22 at 14:02

HedgeHog · Answer 1 · 2022-04-30T12:53:43.800

The example HTML seems to be broken. Assuming the div wraps the other tags and may not be the only book, you can select all books:

for e in soup.find_all(attrs={"book": ["blue"]}):
    print(' '.join(e.stripped_strings),e.get('return'))

Example

from bs4 import BeautifulSoup

html = '''
<html>
<div book="blue" return="abc">
<h4 class="link">www.rodrigo.com</h4>
<p class="author">RODRIGO</p>
</html>
'''

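# name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(html, 'html.parser')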

for e in soup.find_all(attrs={"book": ["blue"]}):
    print(' '.join(e.stripped_strings),e.get('return'))

Output

www.rodrigo.com RODRIGO abc

A more structured example could be:

data = []
for e in soup.select('[book="blue"]'):
    data.append({
        'link':e.h4.text,
        'author':e.select_one('.author').text,
        'return':e.get('return')
    })
print(data)

Output:

[{'link': 'www.rodrigo.com', 'author': 'RODRIGO', 'return': 'abc'}]

Extract all links:

import requests
from bs4 import BeautifulSoup
url = 'https://mixkit.co/free-stock-music/hip-hop/'
reqs = requests.get(url)
soup = BeautifulSoup(reqs.text, 'html.parser')
urls = []
for link in soup.find_all('a'):
    print(link.get('href'))

— Rodrigo Moraes, May 31 '22 at 12:00

cards · Answer 2 · 2022-04-30T15:18:03.467

To match one attribute against many values, a regex approach is suggested:

from bs4 import BeautifulSoup
import re

html = """<html>
<div book="blue" return="abc">
<h4 class="link">www.rodrigo.com</h4>
<p class="author">RODRIGO</p>
</html>"""


soup = BeautifulSoup(html, 'lxml')

by_clss = soup.find_all(class_=re.compile(r'link|author'))
print(by_clss)
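
One caveat with the regex approach: Beautiful Soup applies a compiled pattern with its `search()` method, so an unanchored pattern also matches class names that merely contain `link` or `author`. The sketch below illustrates this; the extra `coauthor` and `linked` elements are made up for the demonstration and are not part of the question's HTML.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup: "coauthor" and "linked" are added only for illustration
html = """
<div book="blue" return="abc">
  <h4 class="link">www.rodrigo.com</h4>
  <p class="author">RODRIGO</p>
  <p class="coauthor">SOMEONE</p>
  <span class="linked">see also</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Unanchored pattern: substrings count, so "coauthor" and "linked" match too
loose = soup.find_all(class_=re.compile(r"link|author"))

# Anchored pattern: only the exact class names match
strict = soup.find_all(class_=re.compile(r"^(link|author)$"))

# A list of strings also matches exact class names, without a regex
by_list = soup.find_all(class_=["link", "author"])

loose_classes = [t["class"][0] for t in loose]
strict_classes = [t["class"][0] for t in strict]
list_classes = [t["class"][0] for t in by_list]
print(loose_classes)
print(strict_classes)
print(list_classes)
```

If only exact class names should match, the anchored pattern or the plain list is probably the safer choice.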

For more flexibility, a custom query function can be passed to find or find_all:

from bs4 import BeautifulSoup

html = """<html>
<div book="blue" return="abc"></div> <!-- a div needs a closing tag in an HTML document -->
<h4 class="link">www.rodrigo.com</h4>
<p class="author">RODRIGO</p>
</html>"""

soup = BeautifulSoup(html, 'lxml')


def query(tag):
    if tag.has_attr('class'):
        # tag['class'] is a list; it is assumed here to hold a single value
        return set(tag['class']) <= {'link', 'author'}
    if tag.has_attr('book'):
        return tag['book'] in {'blue'}
    return False


print(soup.find_all(query))
# [<div book="blue" return="abc"></div>, <h4 class="link">www.rodrigo.com</h4>, <p class="author">RODRIGO</p>]

Notice that your HTML sample has no closing div tag. In my second example I added it; otherwise the soup... will not taste good.

EDIT: To check several attributes in a single query, each against its own list of accepted values, the query function could look like this:

def query_by_attrs(**tag_kwargs):
    # tag_kwargs: {attr: [val1, val2], ...}
    def wrapper(tag):
        for attr, values in tag_kwargs.items():
            if tag.has_attr(attr):
                tag_attr = tag[attr]
                # multi-valued attributes (class, ...) are lists; wrap single values
                if not isinstance(tag_attr, list):
                    tag_attr = [tag_attr]
                # accept the tag as soon as one attribute shares a value with the query
                if set(tag_attr).intersection(values):
                    return True
        return False

    return wrapper


q_data = {'class': ['link', 'author'], 'book': ['blue']}
results = soup.find_all(query_by_attrs(**q_data))
print(results)
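
As an alternative sketch, not taken from the answers above, a CSS selector list can fetch all three elements in one `select` call. The closing `</div>` is added here as an assumption about the intended markup, and the `result` dictionary is just an illustrative way to put the values back into the order the question asks for.

```python
from bs4 import BeautifulSoup

html = """<html>
<div book="blue" return="abc">
<h4 class="link">www.rodrigo.com</h4>
<p class="author">RODRIGO</p>
</div>
</html>"""

soup = BeautifulSoup(html, "html.parser")

# One query: a comma-separated selector list matches any of the three selectors
tags = soup.select('.link, .author, [book="blue"]')

result = {}
for tag in tags:
    if tag.has_attr("book"):
        # for the book element we want the attribute value, not its text
        result["return"] = tag["return"]
    else:
        # key the text by the element's (single) class name
        result[tag["class"][0]] = tag.get_text(strip=True)

print(result["link"], result["author"], result["return"])
```

Keying the values by name avoids relying on the order in which the matches come back, since the div precedes its children in the document.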

score 0 · Answer 3 · answered May 31 '22 at 12:01

Extract all links from a website:

import requests
from bs4 import BeautifulSoup
url = 'https://mixkit.co/free-stock-music/hip-hop/'
reqs = requests.get(url)
soup = BeautifulSoup(reqs.text, 'html.parser')
urls = []
for link in soup.find_all('a'):
    href = link.get('href')  # None if the <a> tag has no href attribute
    urls.append(href)
    print(href)
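
A small variation, sketched under the assumption that anchors without an `href` are not wanted: passing `href=True` to `find_all` keeps only tags that have the attribute, so no `None` values end up in the list. The inline HTML is a hypothetical stand-in for the downloaded page.

```python
from bs4 import BeautifulSoup

# Hypothetical inline page standing in for the downloaded HTML
html = """
<a href="/free-stock-music/">Music</a>
<a name="top">anchor without href</a>
<a href="https://example.com/about">About</a>
"""

soup = BeautifulSoup(html, "html.parser")

# href=True keeps only <a> tags that actually have an href attribute
urls = [link["href"] for link in soup.find_all("a", href=True)]
print(urls)
```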

Rodrigo Moraes

Kindly add a line or two to explain what you are doing. This will help the readers to understand the answer more clearly. – Abhyuday Vaish May 31 '22 at 14:04
