Is there a way to extract "Part-time, Full-time" and "On Campus" from the span tag?

<span class="SecondaryFacts DesktopOnlyBlock" data-v-3c87c7ca="">Master <span class="Divider" data-v-3c87c7ca="">/</span> Part-time, Full-time <span class="Divider" data-v-3c87c7ca="">/</span> On Campus</span>

I can locate the inner span tags through class="Divider", but that only gives me the text "/". Is there a way to get the text that follows each inner span's closing tag?

bilbao

2 Answers

The XML in your example has a root element which is a span element containing the following child nodes:

  • the text node Master
  • a span element
  • the text node Part-time, Full-time
  • another span element
  • the text node On Campus

You say you want to extract the text nodes Part-time, Full-time and On Campus. Presumably you want an XPath that you can apply to other similar XML data, and several different criteria could return those same two text nodes. So I'm going to guess that your criterion is this: extract any text node that is preceded by a sibling span element whose class attribute is Divider. The appropriate XPath would be:

/span/text()[preceding-sibling::span/@class='Divider']
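As a minimal sketch, the expression above can be evaluated with lxml.etree. This assumes lxml is installed and that the fragment is parsed as XML, so the outer span is the document root. The variable names are only illustrative.

```python
from lxml import etree

xml = (
    '<span class="SecondaryFacts DesktopOnlyBlock" data-v-3c87c7ca="">Master '
    '<span class="Divider" data-v-3c87c7ca="">/</span> Part-time, Full-time '
    '<span class="Divider" data-v-3c87c7ca="">/</span> On Campus</span>'
)

# Parse as XML so that the outer <span> is the root element.
root = etree.fromstring(xml)

# Select every text node that follows a sibling <span class="Divider">.
nodes = root.xpath("/span/text()[preceding-sibling::span/@class='Divider']")

# lxml returns the text nodes as string-like objects; strip the padding.
xpath_values = [str(node).strip() for node in nodes]
print(xpath_values)
```

If the same markup is parsed with an HTML parser instead, the outer span is no longer the root, so the leading /span step would need adjusting (for example to //span).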

That said, I suspect the ElementTree XPath interface may not work for you, because it doesn't support XPath queries that return text nodes, only elements (that's what I understand, anyway; I'm not a Python programmer). However, I know that the XPath API of lxml.etree will return text nodes, e.g. https://lxml.de/tutorial.html#using-xpath-to-find-text
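If only the standard library is available, one possible workaround is sketched here. ElementTree stores the text that follows an element's closing tag in that element's tail attribute, so the Divider spans can be found as elements and their tails read afterwards. This assumes the fragment is well-formed XML, as it is in the question.

```python
import xml.etree.ElementTree as ET

xml = (
    '<span class="SecondaryFacts DesktopOnlyBlock" data-v-3c87c7ca="">Master '
    '<span class="Divider" data-v-3c87c7ca="">/</span> Part-time, Full-time '
    '<span class="Divider" data-v-3c87c7ca="">/</span> On Campus</span>'
)

root = ET.fromstring(xml)

# findall() returns elements only, which ElementTree's XPath subset supports.
dividers = root.findall("span[@class='Divider']")

# .tail holds the text between an element's closing tag and the next node.
tail_values = [d.tail.strip() for d in dividers if d.tail]
print(tail_values)
```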

Conal Tuohy

Here is code that returns what you need. The values you are looking for are not inside the inner 'span' tags; they are text belonging to the outer element. So take the text of a containing element, found either with find('body') or as the first result of find_all(), and split it on "/".

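from bs4 import BeautifulSoup

html = '<span class="SecondaryFacts DesktopOnlyBlock" data-v-3c87c7ca="">Master <span class="Divider" data-v-3c87c7ca="">/</span> Part-time, Full-time <span class="Divider" data-v-3c87c7ca="">/</span> On Campus</span>'
# The "lxml" parser wraps the fragment in <html><body>...</body></html>.
soup = BeautifulSoup(html, "lxml")
# find_all()[0] is the outermost element; soup.find('body') works as well.
root = soup.find_all()[0]
# .text joins all nested text, including the "/" dividers.
# Split on them and drop the first piece ("Master ").
print(root.text.split("/")[1:])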

This prints a list that you can process as you need, for example by calling strip() on each item to remove the surrounding spaces:

[' Part-time, Full-time ', ' On Campus']
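Splitting on "/" would break if one of the values itself contained a slash. A hedged alternative, still with BeautifulSoup, is to locate each Divider span and read the text node that follows it through next_sibling. This sketch uses the built-in "html.parser" so that lxml is not required; the variable names are illustrative.

```python
from bs4 import BeautifulSoup

html = (
    '<span class="SecondaryFacts DesktopOnlyBlock" data-v-3c87c7ca="">Master '
    '<span class="Divider" data-v-3c87c7ca="">/</span> Part-time, Full-time '
    '<span class="Divider" data-v-3c87c7ca="">/</span> On Campus</span>'
)

soup = BeautifulSoup(html, "html.parser")

sibling_values = []
for divider in soup.find_all("span", class_="Divider"):
    following = divider.next_sibling
    # Text nodes are str subclasses; skip a sibling that is missing or a tag.
    if isinstance(following, str):
        sibling_values.append(following.strip())

print(sibling_values)
```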
user510170