I have a list of items like this (the number of items can vary):

<h3>My title</h3>
<a href="http://myurl.com">http://myurl.com</a>
<span class="t">text</span>

<h3>My title</h3>
<a href="http://myurl.com">http://myurl.com</a>
<span class="t">text</span>

...

How can I use Beautiful Soup to extract all of this data and put it into a list, so that the result looks like this: [{'title': h3, 'url': url, 'text': text}, {'title': h3, 'url': url, 'text': text}, ...]?

Thank you.

uskap

1 Answer

You could iterate over your HTML's contents like so (assuming that your data is held in html_data):

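from bs4 import BeautifulSoup, Tag

soup = BeautifulSoup(html_data, 'html.parser')
# Use <body> if present; html.parser does not add one to a bare fragment.
root = soup.body or soup
# Keep only element nodes; .contents also holds the whitespace text between tags.
tags = [node for node in root.contents if isinstance(node, Tag)]
my_list = []
# Walk the tags three at a time: <h3>, <a>, <span>.
for i in range(0, len(tags) - 2, 3):
    my_list.append({'title1': tags[i], 'url': tags[i + 1], 'title2': tags[i + 2]})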

This of course only works under the premise that your data resides on the same level and is not nested in any way. If it's not, then you should post a valid chunk of your test data and its structure.
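
If fixed positions are too fragile, another option is to anchor on each <h3> and look up the siblings that follow it. The following is only a minimal sketch under assumptions: html_data holds the markup from the question, extract_items is a hypothetical helper name, and the keys 'title', 'url' and 'text' are illustrative choices. It stores plain strings rather than tag objects.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question; html_data is an assumed variable name.
html_data = """
<h3>My title</h3>
<a href="http://myurl.com">http://myurl.com</a>
<span class="t">text</span>

<h3>My title</h3>
<a href="http://myurl.com">http://myurl.com</a>
<span class="t">text</span>
"""


def extract_items(html):
    """Hypothetical helper: build one dict per <h3> from its following siblings."""
    soup = BeautifulSoup(html, "html.parser")
    items = []
    for h3 in soup.find_all("h3"):
        # First <a> and first <span class="t"> that follow this <h3> at the same level.
        link = h3.find_next_sibling("a")
        span = h3.find_next_sibling("span", class_="t")
        if link is None or span is None:
            # Skip a heading that has no link or span after it.
            continue
        items.append({
            "title": h3.get_text(strip=True),
            "url": link.get("href"),
            "text": span.get_text(strip=True),
        })
    return items


items = extract_items(html_data)
print(items)
```

Like the loop above, this still assumes the three tags are siblings. It also assumes every group is complete: if a group in the middle lacks its <a> or <span>, find_next_sibling will pick up the one from a later group.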

Arno-Nymous