EDIT: From the comment "I want to return text which is not between any <div> and </div> tags." This should skip any text that has a div tag among its ancestors:
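import bs4

raw = '''
<html>
Text <div> Avoid this </div>
<p> Nested <div> Don't get me either </div> </p>
</html>
'''

def check_for_div_parent(mark):
    # Walk up the tree; return True as soon as any ancestor is a <div>.
    mark = mark.parent
    # Stop at <html>, or at the top of the tree if there is no <html> tag.
    if mark is None or 'html' == mark.name:
        return False
    if 'div' == mark.name:
        return True
    return check_for_div_parent(mark)

# Name the parser explicitly so the result does not depend on which parsers are installed.
soup = bs4.BeautifulSoup(raw, 'html.parser')
for text in soup.findAll(text=True):
    # Skip whitespace-only strings so no blank lines are printed.
    if text.strip() and not check_for_div_parent(text):
        print(text.strip())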
This prints only the two strings that lie outside the div tags:
Text
Nested
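If bs4 is not at hand, the same filter can be sketched with the standard library alone by counting how many div tags are currently open. This is only a rough Python 3 sketch, not a replacement for the bs4 version: the names `TextOutsideDiv` and `text_outside_div` are made up for illustration, and it assumes the div tags in the input are reasonably well balanced.

```python
from html.parser import HTMLParser


class TextOutsideDiv(HTMLParser):
    """Collect text that is not nested inside any <div> element."""

    def __init__(self):
        super().__init__()
        self.div_depth = 0  # number of <div> tags currently open
        self.chunks = []    # stripped text found outside every <div>

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.div_depth += 1

    def handle_endtag(self, tag):
        # Ignore stray </div> tags so the depth never goes negative.
        if tag == "div" and self.div_depth > 0:
            self.div_depth -= 1

    def handle_data(self, data):
        stripped = data.strip()
        # Keep non-empty text only when no <div> is open.
        if stripped and self.div_depth == 0:
            self.chunks.append(stripped)


def text_outside_div(html):
    """Return the list of stripped text chunks outside all <div> tags."""
    parser = TextOutsideDiv()
    parser.feed(html)
    parser.close()
    return parser.chunks
```

Calling `text_outside_div(raw)` on the sample document above should give the same two strings, `Text` and `Nested`.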
Original response
It's unclear what exactly you are trying to do. First up, you should try to post a full working example, as you seem to be missing your headers. Secondly, Wikipedia seems to have a stance against "bots" or automated downloaders; see:
Python's `urllib2`: Why do I get error 403 when I `urlopen` a Wikipedia page?
The 403 error can be avoided by sending a User-Agent header with the request:
import urllib2, bs4
url = r"http://en.wikipedia.org/wiki/Viscosity"
req = urllib2.Request(url, headers={'User-Agent' : "Magic Browser"})
con = urllib2.urlopen( req )
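The snippet above is Python 2. On Python 3, `urllib2` was folded into `urllib.request`, so a roughly equivalent sketch might look like the following. The helper names `build_request` and `fetch_page` are my own, and I am assuming the same "Magic Browser" User-Agent string is still accepted.

```python
import urllib.request

URL = "http://en.wikipedia.org/wiki/Viscosity"


def build_request(url):
    # Same idea as the urllib2 version: attach a User-Agent header to the request.
    return urllib.request.Request(url, headers={"User-Agent": "Magic Browser"})


def fetch_page(url):
    # Performs the network request and returns the raw response body as bytes.
    with urllib.request.urlopen(build_request(url)) as con:
        return con.read()
```

`fetch_page(URL)` returns bytes that can be handed to `bs4.BeautifulSoup` in place of `con.read()`.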
Now that we have the page, I think you just want to extract the main text using bs4. I would do something like this:
soup = bs4.BeautifulSoup(con.read())
start_pos = soup.find('h1').parent
for p in start_pos.findAll('p'):
    para = ''.join([text for text in p.findAll(text=True)])
    print(para)
This gives me text that looks like:
The viscosity of a fluid is a measure of its resistance to gradual deformation by shear stress or tensile stress. For liquids, it corresponds to the informal notion of "thickness". For example, honey has a higher viscosity than water.[1]
Viscosity is due to friction between neighboring parcels of the fluid that are moving at different velocities. When fluid is forced through a tube, the fluid generally moves faster near the axis and very slowly near the walls, therefore some stress (such as a pressure difference between the two ends of the tube) is needed to overcome the friction between layers and keep the fluid moving. For the same velocity pattern, the stress required is proportional to the fluid's viscosity. A liquid's viscosity depends on the size and shape of its particles and the attractions between the particles.[citation needed]