I recently found a neat way of web scraping with bs4 that gives the code a nicely organized structure. Let's say this is our HTML:
<div class="a">
  <div class="b">
    <a href="www.yelloaes.com">'hi'</a>
  </div>
  <div class="c">
    <p><a href="www.bb.com">'hi again'</a></p>
    <div class="d">
      <p>'well this final'</p>
    </div>
  </div>
</div>
<div class="a">
  <div class="b">
    <a href="www.yelloaes1.com">'hi1'</a>
  </div>
  <div class="c">
    <p><a href="www.bb1.com">'hi again1'</a></p>
    <div class="d">
      <p>'well this final1'</p>
    </div>
  </div>
</div>
I am treating <div class="a"> as the parent tag and want to pull the info out of it, which means I have to loop over every such tag on the page.
Because I was having a hard time understanding BeautifulSoup, I first did a test run in Python that extracts the info from only the first <div class="a">.
My code is:
from bs4 import BeautifulSoup

# r is the HTTP response fetched earlier
soup = BeautifulSoup(r.text, 'html.parser')
find_hi = soup.find('div', {'class': 'a'}).div.text
find_hi_again = soup.find('div', {'class': 'a'}).find_all('div')[1].p.text
find_final = soup.find('div', {'class': 'a'}).find('div', {'class': 'd'}).text
print(find_hi, find_hi_again, find_final)
# output (it worked!):
hi , hi again , well this final
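For reference, here is a minimal, self-contained sketch of that first-block test. It assumes the sample HTML is held in a string (the variable `html` is my own stand-in for `r.text`), looks the parent tag up once instead of three times, and uses `get_text(strip=True)` because `.text` on a `div` also returns the newlines around the inner tags.

```python
from bs4 import BeautifulSoup

# Stand-in for r.text: only the first <div class="a"> block from the sample.
html = """
<div class="a">
  <div class="b"><a href="www.yelloaes.com">'hi'</a></div>
  <div class="c">
    <p><a href="www.bb.com">'hi again'</a></p>
    <div class="d"><p>'well this final'</p></div>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
first = soup.find("div", {"class": "a"})  # look the parent up once

hi = first.div.get_text(strip=True)                         # first nested div (class "b")
hi_again = first.find_all("div")[1].p.get_text(strip=True)  # second nested div (class "c")
final = first.find("div", {"class": "d"}).get_text(strip=True)

print(hi, hi_again, final)
```

Note that the single quotes around 'hi' are literal characters in the sample HTML, so they are part of the extracted text.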
Note: I really want to stick with this approach, so please no completely different ways of scraping. My problem is that I can't get it to loop over the whole page. I tried the following loop, but it does not give the result I want:
soup = BeautifulSoup(r.text, 'html.parser')
# list of all div tags that have this class
scraping = soup.find_all('div', {'class': 'a'})
for i in scraping:
    find_hi = i.div.text
    find_hi_again = i.find_all('div')[1].p.text
    find_final = i.find('div', {'class': 'd'}).text
    # print inside the loop so every block is shown, not only the last one
    print(find_hi, find_hi_again, find_final)
Can you please help me with the looping?
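To make the goal concrete, this is roughly what I would expect a working loop to look like: a sketch under my own assumptions, not a confirmed solution. The string `html` stands in for `r.text`, and `text_of` is a hypothetical helper I made up so that a missing tag gives `None` instead of raising `AttributeError`. Every lookup is done on the current block, never on `soup`, so each iteration only sees its own `<div class="a">`.

```python
from bs4 import BeautifulSoup

# Stand-in for r.text: both <div class="a"> blocks from the sample.
html = """
<div class="a">
  <div class="b"><a href="www.yelloaes.com">'hi'</a></div>
  <div class="c">
    <p><a href="www.bb.com">'hi again'</a></p>
    <div class="d"><p>'well this final'</p></div>
  </div>
</div>
<div class="a">
  <div class="b"><a href="www.yelloaes1.com">'hi1'</a></div>
  <div class="c">
    <p><a href="www.bb1.com">'hi again1'</a></p>
    <div class="d"><p>'well this final1'</p></div>
  </div>
</div>
"""


def text_of(tag):
    """Hypothetical helper: stripped text of a tag, or None if it was not found."""
    return tag.get_text(strip=True) if tag is not None else None


soup = BeautifulSoup(html, "html.parser")
rows = []
for block in soup.find_all("div", {"class": "a"}):
    inner_divs = block.find_all("div")  # nested divs of this block only
    rows.append({
        "hi": text_of(block.div),
        "hi_again": text_of(inner_divs[1].p) if len(inner_divs) > 1 else None,
        "final": text_of(block.find("div", {"class": "d"})),
    })

# One line of output per <div class="a"> block.
for row in rows:
    print(row["hi"], row["hi_again"], row["final"])
```

Collecting the values into a list of dicts keeps one record per parent tag, which makes it easy to check how many blocks were actually matched before printing anything.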