I have the following regex, which captures the first N words and then continues to the next period, exclamation point, or question mark. I need chunks of text that vary in word count but always consist of complete sentences.
regex = (?:\w+[.?!]?\s+){10}(?:\w+,?\s+)*?\w+[.?!]
It works with the following text:
Therapy extract straw and chitosan from shrimp shells alone accounted for 2, 4, 6, 8 and 10% found that the extract straw 8% is highly effective in inhibiting the growth of algae Microcystis spp. The number of cells and the amount of chlorophyll a was reduced during treatment. Both value decreased continuous until the end of the trial.
https://regex101.com/r/ardIQ7/5
However, it does not work with the following text:
Therapy extract straw and chitosan from shrimp shells alone accounted for 2, 4, 6, 8 and 10% found that the extract straw 8.2% is highly effective in inhibiting the growth of algae Microcystis spp. The number of cells and the amount of chlorophyll a was reduced during treatment. Both value decreased continuous until the end of the trial.
That seems to be because of the number 8.2%, which contains a decimal point and a percent sign.
I have been trying to figure out how to handle these cases as well, and I need some help pointing me in the right direction. I don't want to capture only the first sentence. I want to capture at least N words, which may span several sentences, and always get back complete sentences.
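
One possible direction, sketched in Python (the regex flavor is an assumption, as is the helper name first_chunk): treat a "word" as any run of non-whitespace characters (\S+) instead of \w+, so that tokens such as 8.2% or 10% count as single words. The chunk then ends at the first token that finishes with ., ? or ! and is followed by whitespace or the end of the text. This is a sketch under those assumptions, not a definitive solution.

```python
import re

def first_chunk(text, n_words):
    """Return at least n_words tokens from the start of text, extended to the
    next token ending in '.', '?' or '!'. Returns None if there is no match.
    (Hypothetical helper, for illustration only.)"""
    pattern = re.compile(
        r"(?:\S+\s+){" + str(n_words) + r"}"  # exactly n_words whitespace-delimited tokens
        r"(?:\S+\s+)*?"                       # then as few extra tokens as possible
        r"\S*[.?!](?=\s|$)"                   # final token ends in . ? ! before whitespace or end
    )
    m = pattern.match(text)
    return m.group(0) if m else None

sample = (
    "Therapy extract straw and chitosan from shrimp shells alone accounted for "
    "2, 4, 6, 8 and 10% found that the extract straw 8.2% is highly effective in "
    "inhibiting the growth of algae Microcystis spp. The number of cells and the "
    "amount of chlorophyll a was reduced during treatment. Both value decreased "
    "continuous until the end of the trial."
)

chunk = first_chunk(sample, 10)
print(chunk)
```

With the second sample text and n_words=10, the lookahead rejects the period inside 8.2% because a digit follows it, and the chunk runs through "Microcystis spp.". Abbreviations such as "spp." are still treated as sentence ends, exactly as in the original pattern; telling them apart would need extra rules.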