How can I get namespace information from tag in beautifulsoup4?

Question

I am trying to parse some XML files that make heavy use of namespaces. Right now I am using beautifulsoup4, and for the most part things are going well. Unfortunately, I am running into data where some tags have the same name but a different namespace prefix. In theory this should be fine, as Beautiful Soup clearly has this information at some level:

from bs4 import BeautifulSoup

xml = """

<root
xmlns:nsa="http://www.dummynamespacea.com"
xmlns:nsb="http://www.dummynamespaceb.com"
>
<nsa:elem>information</nsa:elem>
<nsb:elem>more information</nsb:elem>

</root>

"""

soup = BeautifulSoup(xml, "xml")

print(soup)

<?xml version="1.0" encoding="utf-8"?>
<root xmlns:nsa="http://www.dummynamespacea.com" xmlns:nsb="http://www.dummynamespaceb.com">
<nsa:elem>information</nsa:elem>
<nsb:elem>more information</nsb:elem>
</root>

However if I print the name from the elements as I iterate over them, that information is not there:

import re
for element in soup.find_all(re.compile(".*")):
    print(element.name)

root
elem
elem

Is there a way to get information about the tag's namespace as I iterate over them?

score 3 · Accepted Answer · answered Feb 13 '21 at 02:55

What you are looking for is the element's .prefix attribute (the namespace prefix, such as nsa) or its .namespace attribute (the namespace URI):

for element in soup.find_all(re.compile(".*")):
    print(element.prefix, element.name)

None root
nsa elem
nsb elem
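
To see the namespace URI next to the prefix, the same loop can be extended as in the sketch below. It assumes the lxml-backed "xml" parser is installed and uses find_all(True) to match every tag. The helper elems_in_namespace is a hypothetical name introduced here for illustration and is not part of Beautiful Soup.

```python
from bs4 import BeautifulSoup

xml = """
<root
xmlns:nsa="http://www.dummynamespacea.com"
xmlns:nsb="http://www.dummynamespaceb.com"
>
<nsa:elem>information</nsa:elem>
<nsb:elem>more information</nsb:elem>
</root>
"""

# The "xml" feature requires lxml to be installed.
soup = BeautifulSoup(xml, "xml")

# find_all(True) matches every tag in the document.
for element in soup.find_all(True):
    # .prefix is the namespace prefix, .namespace is the namespace URI.
    print(element.prefix, element.name, element.namespace)


def elems_in_namespace(soup, uri):
    """Hypothetical helper: return the tags whose namespace URI equals `uri`."""
    return [el for el in soup.find_all(True) if el.namespace == uri]


# Pick out only the elements bound to the second namespace.
for element in elems_in_namespace(soup, "http://www.dummynamespaceb.com"):
    print(element.get_text())
```

Filtering on the URI rather than the prefix is usually the safer choice, since a different document may bind the same URI to a different prefix.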

VirtualScooter

Honestly thank you so much. For some reason I struggled finding this in the documentation or anywhere else online. – jammertheprogrammer Feb 13 '21 at 05:18