re.findall returns a list of tuples containing the expected strings plus something unexpected
I was writing a function findtags(text) to find tags in a given paragraph of text. When I called re.findall(tags, text) to find the defined tags in the text, it returned a list of tuples. The first element of each tuple is the string I expected, but each tuple also contains a second string that I did not expect.
The function findtags(text) is as follows:
import re
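def findtags(text):
    # Zero or more attributes of the form name="value"
    parms = r'(\w+\s*=\s*"[^"]*"\s*)*'
    # An opening (or self-closing) tag with optional attributes
    tags = r'(<\s*\w+\s*' + parms + r'\s*/?>)'
    result = re.findall(tags, text)
    print(result)
    return result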
testtext1 = """
My favorite website in the world is probably
<a href="www.udacity.com">Udacity</a>. If you want
that link to open in a <b>new tab</b> by default, you should
write <a href="www.udacity.com"target="_blank">Udacity</a>
instead!
"""
findtags(testtext1)
The expected result is:
['<a href="www.udacity.com">',
'<b>',
'<a href="www.udacity.com"target="_blank">']
The actual result is:
[('<a href="www.udacity.com">', 'href="www.udacity.com"'),
('<b>', ''),
('<a href="www.udacity.com"target="_blank">', 'target="_blank"')]