
Is it possible to split a bs4.element.Tag into several bs4.element.Tag instances?

Consider the following use case:

1- The original bs4.element.Tag contains a paragraph.

2- We want to split that paragraph into sentences and get one bs4.element.Tag per sentence.

Example:

paragraphs = soup.find_all('p') gives all the paragraphs in an HTML file.

Suppose a paragraph (which is also a bs4.element.Tag instance) is the following:

<p><i><a href="/wiki/Le_Bassin_Aux_Nymph%C3%A9as" title="Le Bassin Aux Nymphéas">Le Bassin Aux Nymphéas</a></i>, 1919. Monet's late series of water lily paintings are among his best-known works.</p>

I would like to turn this bs4.element.Tag instance into two bs4.element.Tag instances, one per sentence, as follows:

The first bs4.element.Tag should correspond to the first sentence:

<i><a href="/wiki/Le_Bassin_Aux_Nymph%C3%A9as" title="Le Bassin Aux Nymphéas">Le Bassin Aux Nymphéas</a></i>, 1919.

The second bs4.element.Tag should correspond to the second sentence:

Monet's late series of water lily paintings are among his best-known works.
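
To make the desired behaviour concrete, here is a rough sketch of one way such a split might work. It is not a confirmed solution, and it rests on several assumptions: sentence boundaries are detected with a naive regex (whitespace following `.`, `!` or `?`), boundaries are only looked for in text nodes that are direct children of the paragraph, and each sentence is wrapped in a new `<span>` tag. The helper name `split_into_sentence_tags` and the `wrapper` parameter are hypothetical.

```python
import copy
import re

from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag

# Naive sentence boundary (an assumption): whitespace preceded by '.', '!' or '?'.
# This will mis-split abbreviations such as "e.g." or "Dr.".
SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def split_into_sentence_tags(paragraph, soup, wrapper="span"):
    """Return a list of new Tags, one per sentence found in `paragraph`.

    Hypothetical helper: the original paragraph is left untouched, and
    nested tags are copied whole into the sentence they belong to.
    """
    sentences = []
    current = soup.new_tag(wrapper)
    for child in paragraph.children:
        if isinstance(child, NavigableString):
            pieces = SENTENCE_BOUNDARY.split(str(child))
            for index, piece in enumerate(pieces):
                if index > 0:
                    # A boundary was crossed: close the current sentence.
                    if current.contents:
                        sentences.append(current)
                    current = soup.new_tag(wrapper)
                if piece:
                    current.append(piece)
        else:
            # copy.copy() on a Tag gives a detached copy, so the original
            # tree is not modified by the append.
            current.append(copy.copy(child))
    if current.contents:
        sentences.append(current)
    return sentences


html = (
    '<p><i><a href="/wiki/Le_Bassin_Aux_Nymph%C3%A9as" '
    'title="Le Bassin Aux Nymphéas">Le Bassin Aux Nymphéas</a></i>, 1919. '
    "Monet's late series of water lily paintings are among his best-known works.</p>"
)
soup = BeautifulSoup(html, "html.parser")
sentences = split_into_sentence_tags(soup.find("p"), soup)

for sentence in sentences:
    print(sentence)
```

Note that the whitespace at each boundary is dropped, and a boundary that falls inside a nested tag (or exactly between a nested tag and the following text) would not be detected by this sketch. A proper sentence tokenizer could replace the regex if that matters.
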
A.M.
