4

Is there a way to get all values of a certain attribute?

Example:

<a title="title-in-a">
  <b title="title-in-b"> ... </b>
  <c title="title-in-c"> ... </c>
  <d name="i-dont-care"> ... </d>
</a>

Can I get all titles, even if they are in different tags?

Expected result:

['title-in-a', 'title-in-b', 'title-in-c']

To get all titles in <a>, I know I can do this:

for item in soup.find_all('a'):
    print(item['title'])

But how can I do this for all tags, without knowing the tag names in advance?

klaus

4 Answers

3

Assuming there's no error in your code (meaning that the <b> and <c> tags are enclosed within the <a> tag) then:

for i in soup4.find_all(title=True):
  print(i)

will output:

<a title="title-in-a">
<b title="title-in-b"> ... </b>
<c title="title-in-c"> ... </c>
...</a>
<b title="title-in-b"> ... </b>
<c title="title-in-c"> ... </c>

If, on the other hand, each tag is closed separately, such that the code is:

<a title="title-in-a">...</a>
<b title="title-in-b"> ... </b>
<c title="title-in-c"> ... </c>

the output is:

<a title="title-in-a">...</a>
<b title="title-in-b"> ... </b>
<c title="title-in-c"> ... </c>
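
If only the attribute values are wanted rather than the tags themselves, the same filter can feed a list comprehension. This is a minimal sketch, assuming the question's edited example (with a <d> child that has no title) and the built-in html.parser; the names html and titles are only for illustration:

```python
from bs4 import BeautifulSoup

html = '''<a title="title-in-a">
  <b title="title-in-b"> ... </b>
  <c title="title-in-c"> ... </c>
  <d name="i-dont-care"> ... </d>
</a>'''

soup = BeautifulSoup(html, 'html.parser')

# title=True matches any tag that has a title attribute, whatever its value,
# so <d> (which only has a name attribute) is skipped.
titles = [tag['title'] for tag in soup.find_all(title=True)]
print(titles)
```

With this input it should print ['title-in-a', 'title-in-b', 'title-in-c'], because indexing happens only on tags that the filter already confirmed have a title.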
Jack Fleeting
  • This doesn't seem to work if there is a child without the `title` attribute. See the new example after my edit. – klaus May 10 '19 at 11:52
  • It's the same question, I just changed the example because I tried your answer and it worked only in the example, but not on a real case, in which there are many tags with multiple children. – klaus May 10 '19 at 17:45
3

Use an attribute selector.

titles = [item['title'] for item in soup.select('[title]')]
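
To make that one-liner reusable for any attribute, it can be wrapped in a small function. This is a sketch only: attribute_values is a hypothetical helper name, the HTML is the question's example, and it assumes the attribute name is a plain identifier that is safe to drop into a CSS selector.

```python
from bs4 import BeautifulSoup


def attribute_values(soup, attr_name):
    """Return the value of attr_name for every tag that has it, in document order."""
    # '[title]' is a CSS attribute selector: it matches any element that
    # carries a title attribute, regardless of the tag name.
    return [tag[attr_name] for tag in soup.select(f'[{attr_name}]')]


html = '''<a title="title-in-a">
  <b title="title-in-b"> ... </b>
  <c title="title-in-c"> ... </c>
  <d name="i-dont-care"> ... </d>
</a>'''

soup = BeautifulSoup(html, 'html.parser')
print(attribute_values(soup, 'title'))
print(attribute_values(soup, 'name'))
```

Because the selector only matches tags that have the attribute, the <d> tag without a title does not cause a KeyError.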
QHarr
1

Here is a solution for your use case. Every tag has an attrs attribute (a property, not a method) that holds all of the tag's attributes as a dict of the form {'name': 'value'}.

from bs4 import BeautifulSoup

response = '''<a title="title-in-a">
  <b title="title-in-b"> ... </b>
  <c title="title-in-c"> ... </c>
  <d name="i-dont-care"> ... </d>
</a>'''
total_attributes = []
soup = BeautifulSoup(response, 'lxml')
# find_all() with no arguments returns every tag in the document
for tag in soup.find_all():
    attributes = tag.attrs  # dict of this tag's attributes
    # some filtering goes here
    if attributes:
        total_attributes.extend(attributes.values())
print(total_attributes)

This prints the result below. Note that it collects the values of every attribute; you can filter at the commented line to keep only the ones you need.

['title-in-a', 'title-in-b', 'title-in-c', 'i-dont-care']
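
One way the filtering step could look, as a sketch rather than the answer's own code: keep a value only when the tag's attrs dict contains the wanted key. The names wanted and values are illustrative, and html.parser is used here only to avoid the lxml dependency.

```python
from bs4 import BeautifulSoup

response = '''<a title="title-in-a">
  <b title="title-in-b"> ... </b>
  <c title="title-in-c"> ... </c>
  <d name="i-dont-care"> ... </d>
</a>'''

soup = BeautifulSoup(response, 'html.parser')

wanted = 'title'  # the attribute whose values we want to collect
values = []
for tag in soup.find_all():
    # tag.attrs is a plain dict, so a membership test is enough to
    # skip tags such as <d> that do not carry the attribute.
    if wanted in tag.attrs:
        values.append(tag.attrs[wanted])
print(values)
```

With the example input this should print ['title-in-a', 'title-in-b', 'title-in-c'], leaving out 'i-dont-care'.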
Dhamodharan
0

Pass find_all a Python lambda that returns True only for tags that have a title attribute:

from bs4 import BeautifulSoup

data='''<a title="title-in-a">
  <b title="title-in-b"> ... </b>
  <c title="title-in-c"> ... </c>
</a>'''

soup=BeautifulSoup(data,'html.parser')

for item in soup.find_all(lambda tag: tag.has_attr('title')):
  print(item['title'])

Output:

title-in-a
title-in-b
title-in-c
KunduK
  • This only seems to work if all tags have the attribute `title`. I added a different tag to `data` like ` ... ` and it crashed. – klaus May 10 '19 at 11:45
  • Yes, you are right; in that case a CSS selector is the right choice. – KunduK May 10 '19 at 12:45