Python extracts texts following a span tag inside another span tag

Question

Is there a way to extract "Part-time, Full-time" and "On Campus" from the span tag?

<span class="SecondaryFacts DesktopOnlyBlock" data-v-3c87c7ca="">Master <span class="Divider" data-v-3c87c7ca="">/</span> Part-time, Full-time <span class="Divider" data-v-3c87c7ca="">/</span> On Campus</span>

I can locate the inner span tag through class="Divider", but that only gives me the text "/". Is there a way to get the text that comes after the inner span closes?

Show what you have tried so far, and the exact output it gives, and what you want it to give. — ProfDFrancis, Mar 07 '23 at 06:26

score 0 · Answer 1 · answered Mar 07 '23 at 06:46

The XML in your example has a root element which is a span element containing the following child nodes:

the text node Master
a span element
the text node Part-time, Full-time
another span element
the text node On Campus

You say you want to extract the text nodes Part-time, Full-time and On Campus? Presumably you want an XPath that you can apply to other similar XML data, and there are different criteria that could return those same two text nodes. So I'm going to guess that your criterion is that you want to extract any text node which is preceded by a sibling span element whose class attribute is Divider. The appropriate XPath would be:

/span/text()[preceding-sibling::span/@class='Divider']
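A minimal sketch of evaluating that XPath with lxml.etree, assuming lxml is installed and using the HTML from the question as input. Note that `@class='Divider'` compares the whole attribute value, so an element with several classes (e.g. `class="Divider Small"`) would not match:

```python
from lxml import etree

html = ('<span class="SecondaryFacts DesktopOnlyBlock" data-v-3c87c7ca="">Master '
        '<span class="Divider" data-v-3c87c7ca="">/</span> Part-time, Full-time '
        '<span class="Divider" data-v-3c87c7ca="">/</span> On Campus</span>')

# The fragment is well-formed XML, so it can be parsed as an XML document
root = etree.fromstring(html)

# Select text nodes of the root span that have a preceding sibling Divider span;
# lxml returns these as string subclasses
nodes = root.xpath("/span/text()[preceding-sibling::span/@class='Divider']")

# Trim the surrounding spaces from each text node
values = [str(node).strip() for node in nodes]
print(values)
```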

That said, I suspect the ElementTree XPath interface may not work for you, because it doesn't support XPath queries that return text nodes, only elements (that's what I understand, anyway; I'm not a Python programmer). However, I know that the XPath API of lxml.etree will return text nodes, e.g. https://lxml.de/tutorial.html#using-xpath-to-find-text
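If you would rather stay with the standard library's xml.etree.ElementTree, one workaround (a sketch, not part of the original answer) is to select the Divider spans and read each one's `.tail` attribute, which is where ElementTree stores the text that follows an element's closing tag:

```python
import xml.etree.ElementTree as ET

html = ('<span class="SecondaryFacts DesktopOnlyBlock" data-v-3c87c7ca="">Master '
        '<span class="Divider" data-v-3c87c7ca="">/</span> Part-time, Full-time '
        '<span class="Divider" data-v-3c87c7ca="">/</span> On Campus</span>')

root = ET.fromstring(html)

# ElementTree's limited XPath can find the Divider spans (direct children of root),
# but not text nodes; the text after each span lives in its .tail
values = [div.tail.strip()
          for div in root.findall("span[@class='Divider']")
          if div.tail]
print(values)
```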

score 0 · Accepted Answer · answered Mar 07 '23 at 06:57

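Here is code that returns what you need. The values you are looking for are not inside the inner 'span' tags (the ones with class Divider), so searching for those tags will only give you "/". Instead, you can search with find('body'), or call find_all() and take the first element found, then split its text.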

from bs4 import BeautifulSoup


html = '<span class="SecondaryFacts DesktopOnlyBlock" data-v-3c87c7ca="">Master <span class="Divider" data-v-3c87c7ca="">/</span> Part-time, Full-time <span class="Divider" data-v-3c87c7ca="">/</span> On Campus</span>'
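# The "lxml" parser wraps the fragment in <html><body>...</body></html>
soup = BeautifulSoup(html, "lxml")
# body = soup.find('body')
# find_all() with no arguments returns every tag in document order; [0] is the outermost one
body = soup.find_all()[0]
# .text joins all nested text ("Master / Part-time, Full-time / On Campus");
# splitting on "/" and dropping the first piece ("Master ") leaves the two values
print(body.text.split("/")[1:])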

We will get a list that you can process as you need:

[' Part-time, Full-time ', ' On Campus']
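Splitting on "/" works as long as "/" appears only as a separator. A sketch of an alternative that answers the question more directly (not part of the original answer) walks from each Divider span to the text node right after it via `.next_sibling`. It assumes BeautifulSoup is installed and uses the built-in "html.parser" so lxml is not required:

```python
from bs4 import BeautifulSoup

html = ('<span class="SecondaryFacts DesktopOnlyBlock" data-v-3c87c7ca="">Master '
        '<span class="Divider" data-v-3c87c7ca="">/</span> Part-time, Full-time '
        '<span class="Divider" data-v-3c87c7ca="">/</span> On Campus</span>')

soup = BeautifulSoup(html, "html.parser")

# The text after each Divider span's closing tag is that span's next sibling;
# NavigableString subclasses str, so the isinstance check skips any tag siblings
values = [
    div.next_sibling.strip()
    for div in soup.find_all("span", class_="Divider")
    if isinstance(div.next_sibling, str)
]
print(values)
```

Because `class_="Divider"` matches any element whose class list contains Divider, this also works if the span carries extra classes.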
