Is there a way in Python to do paragraph segmentation based on topic of short texts that were created by speech-to-text?

Question

I have multiple transcripts of short videos that were created by a speech-to-text algorithm. I want to segment these transcripts into paragraphs based on their content. I tried TextTiling in Python (NLTK's TextTilingTokenizer), but every attempt failed with the "No paragraph breaks were found(text too short perhaps?)" error. Changing the TextTilingTokenizer arguments did not help; I got the same error every time. Are there other packages that work with short transcripts, and specifically with spoken-language text?
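To make the goal concrete, here is a minimal sketch of the kind of topic-based split I have in mind. It is not TextTiling itself, only a simplified lexical-similarity heuristic using the standard library. The function names, the tiny stopword list, and the `window` and `threshold` values are all assumptions chosen for illustration, and the sentence splitter assumes the transcript contains punctuation, which speech-to-text output often lacks.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; a fuller list would work better).
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it",
             "that", "this", "we", "i", "you", "so", "for", "on", "with"}


def split_sentences(text):
    """Naive split after '.', '!' or '?'. Unpunctuated transcripts would
    need another unit, e.g. fixed-size chunks of words."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def bag_of_words(sentences):
    """Count the non-stopword tokens in a group of sentences."""
    words = re.findall(r"[a-z']+", " ".join(sentences).lower())
    return Counter(w for w in words if w not in STOPWORDS)


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def segment(text, window=2, threshold=0.1):
    """Start a new paragraph at every sentence gap where the `window`
    sentences before and after the gap share too little vocabulary."""
    sentences = split_sentences(text)
    paragraphs, current = [], []
    for i, sentence in enumerate(sentences):
        if i > 0:
            left = bag_of_words(sentences[max(0, i - window):i])
            right = bag_of_words(sentences[i:i + window])
            if cosine(left, right) < threshold and current:
                paragraphs.append(" ".join(current))
                current = []
        current.append(sentence)
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs


if __name__ == "__main__":
    sample = ("Cats sleep most of the day. Cats also like to chase mice. "
              "Python functions take arguments. Python functions return values.")
    for paragraph in segment(sample):
        print(paragraph)
        print()
```

Pure word overlap is brittle on very short texts, so I would expect the threshold to need tuning per transcript; a package that handles this more robustly is what I am looking for.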

Welcome to StackOverflow. Please follow the posting guidelines in the help documentation, as suggested when you created this account. [On topic](https://stackoverflow.com/help/on-topic), [how to ask](https://stackoverflow.com/help/how-to-ask), and ... [the perfect question](https://codeblog.jonskeet.uk/2010/08/29/writing-the-perfect-question/) apply here. StackOverflow is not a design, coding, research, or tutorial resource. However, if you follow whatever resources you find on line, make an honest solution attempt, and run into a problem, you'd have a good example to post. — Prune, Oct 22 '19 at 20:55
Something like this https://github.com/koomri/text-segmentation should work — Nikolay Shmyrev, Oct 22 '19 at 21:45
LexPredict has paragraph segmentation https://lexpredict-lexnlp.readthedocs.io/en/docs-0.1.6/modules/nlp_en_segments_paragraphs.html#functions — Louis Maddox, Sep 26 '22 at 08:31


0 Answers