Can't get string inside ATag

Question

I am a beginner, so please be kind. I'm using Beautiful Soup to parse some HTML. I have gotten to the point where I found this a tag:

a_tag = <a href="sicc2020/results?pid=31022">S<span class="notCompact">hakira</span> Mirfin</a>

I would like to get "S", "hakira" and "Mirfin" out of this tag. However, when I use the .string attribute, it just returns None. I can get the "hakira" part, but I can't get the "S" or "Mirfin".

print(a_tag)
>><a href="sicc2020/results?pid=31022">S<span class="notCompact">hakira</span> Mirfin</a>

print(a_tag.string)
>> None

print(a_tag.find('span').string)
>>hakira

Any help would be much appreciated!

Thank you.

score 1 · Answer 1 · answered Jun 16 '20 at 14:59

You can try this:

from bs4 import BeautifulSoup
html_doc="""<a href="sicc2020/results?pid=31022">S<span class="notCompact">hakira</span> Mirfin</a>"""

soup = BeautifulSoup(html_doc, 'lxml')
text = soup.find("a").get_text(",", strip=True)

print(text)

Output will be:

S,hakira,Mirfin
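
For context, here is a minimal sketch of why .string came back as None while get_text works. It assumes the same markup as in the question and uses the built-in "html.parser" instead of lxml so that nothing extra needs to be installed. The variable names (string_value, pieces, full_name) are just illustrative.

```python
from bs4 import BeautifulSoup

html_doc = '<a href="sicc2020/results?pid=31022">S<span class="notCompact">hakira</span> Mirfin</a>'
soup = BeautifulSoup(html_doc, "html.parser")
a_tag = soup.find("a")

# .string only returns text when the tag has exactly one child.
# This <a> has three children ("S", the <span>, " Mirfin"), so it is None.
string_value = a_tag.string

# .stripped_strings walks every text node under the tag, whitespace-trimmed.
pieces = list(a_tag.stripped_strings)

# .get_text() with no arguments joins all the text nodes as they are.
full_name = a_tag.get_text()

print(string_value)
print(pieces)
print(full_name)
```

If you want the three parts as separate values rather than one comma-joined string, the list from stripped_strings is probably the more convenient form.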

Humayun Ahmad Rajib

Thank you! get_text worked! I didn't realize there was a difference between that and .string. Thank you again!! – Newbie88 Jun 16 '20 at 15:09

Jonathan Delean · Answer 2 · 2020-06-16T14:50:30.903

Just do this:

var text_array = [];
var children = document.getElementById(id).childNodes;

text_array.push(document.getElementById(id).textContent)

  for (var i = 0; i < children.length; i++) {
    text_array.push(children[i].textContent)
  }

If you want to remove all the content:

var children = document.getElementById(id).childNodes;

document.getElementById(id).textContent = ""

  for (var i = 0; i < children.length; i++) {
    children[i].textContent = ""
  }

If that doesn't work for your "S" and "Mirfin", you can do this:

$("#id")
.clone()    //clone the element
.children() //select all the children
.remove()   //remove all the children
.end()  //again go back to selected element
.text();
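
Since the question is about Beautiful Soup rather than the browser DOM, here is a rough Python sketch of the same idea as the jQuery snippet above: keep only the text that sits directly inside the a tag and skip anything inside child elements. It assumes the markup from the question and the built-in "html.parser". direct_text is just an illustrative name.

```python
from bs4 import BeautifulSoup, NavigableString

html_doc = '<a href="sicc2020/results?pid=31022">S<span class="notCompact">hakira</span> Mirfin</a>'
a_tag = BeautifulSoup(html_doc, "html.parser").find("a")

# Keep only the direct children that are text nodes, so the <span>
# (and the "hakira" inside it) is skipped.
direct_text = [
    child.strip()
    for child in a_tag.children
    if isinstance(child, NavigableString)
]

print(direct_text)
```

This should leave you with just the "S" and "Mirfin" parts, which are the ones .string could not reach.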

score 0 · Answer 3 · answered Jun 16 '20 at 22:13

Another method.

from simplified_scrapy import SimplifiedDoc
html = '''<a href="sicc2020/results?pid=31022">S<span class="notCompact">hakira</span> Mirfin</a>'''
doc = SimplifiedDoc(html)
print(doc.a.text)

Result:

Shakira Mirfin

Here are more examples: https://github.com/yiyedata/simplified-scrapy-demo/tree/master/doc_examples
