Regex to extract a paragraph

Question

I need a regex to extract each paragraph from a text buffer containing many similar paragraphs, and store each one as a string for additional processing.

Example: Say, the text buffer is like this:

===  Jun 11 14:05:39 - Person Details  ===

Person Name = "Hurlman"

Person Address = "2nd Street Benjamin Blvd NJ"

Persion Age = 25

===  Jun 11 14:05:39 - Person Details  ===

Person Name = "Greg"

Person Address = "3rd Street Benjamin Blvd NJ"

Persion Age = 26


===  Jun 11 14:05:42 - Person Details  ===

Person Name = "Michel"

Person Address = "4th Street Benjamin Blvd NJ"

Persion Age = 27

I need to iterate through all the paragraphs and store each one so that I can then find the specific person details inside it.

Each paragraph I extract should have the following format:

===  Jun 11 14:05:42 - Person Details  ===

Person Name = "Michel"

Person Address = "4th Street Benjamin Blvd NJ"

Persion Age = 27

Any help is much appreciated!

Please provide your bank details so we know where to send you money for the opportunity to write this code. — zx81, Jun 15 '14 at 01:48
Sorry, this is another way of saying: can you show us what you have tried? — zx81, Jun 15 '14 at 01:50
So you want to create a new (different) string for each paragraph? — hwnd, Jun 15 '14 at 01:53

score 1 · Accepted Answer · answered Jun 15 '14 at 03:13

You could use this pattern: (===.*===[\s\S]*?)(?====|$)
Demo
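
As a rough sketch of how the pattern above might be applied in Java (assuming Java is the target language, which the thread suggests but does not state outright), the following collects every match into a list of strings. The class name ParagraphExtractor and the method extract are made up for illustration, and trimming each match is an extra choice not present in the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name is illustrative only.
class ParagraphExtractor {
    // The pattern from the answer, with backslashes doubled for a Java string literal.
    // Group 1 is a header line plus everything up to the next "===" or the end of input.
    private static final Pattern RECORD =
            Pattern.compile("(===.*===[\\s\\S]*?)(?====|$)");

    // Returns each "=== ... ===" block as its own string, with surrounding whitespace trimmed.
    static List<String> extract(String buffer) {
        List<String> records = new ArrayList<>();
        Matcher matcher = RECORD.matcher(buffer);
        while (matcher.find()) {
            records.add(matcher.group(1).trim());
        }
        return records;
    }

    public static void main(String[] args) {
        String buffer = String.join("\n",
                "===  Jun 11 14:05:39 - Person Details  ===",
                "Person Name = \"Hurlman\"",
                "Persion Age = 25",
                "===  Jun 11 14:05:42 - Person Details  ===",
                "Person Name = \"Michel\"",
                "Persion Age = 27");
        // Print each extracted paragraph, separated by a marker line.
        for (String record : extract(buffer)) {
            System.out.println(record);
            System.out.println("-----");
        }
    }
}
```

Note that this approach relies on "===" never appearing inside a field value; if it did, the lazy match would cut the paragraph short at that point.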

alpha bravo


score 0 · Answer 2 · answered Jun 15 '14 at 02:49

Using regexes to solve this is possible, but it is likely to give you a poor solution (inefficient, hard to understand, hard to maintain, etc.).

What you have is an informal record structure represented using lines of text. (This is not natural language text, so describing it in terms of "paragraphs" doesn't make sense.)

The way to process it is to read it a line at a time and then use Scanner (or equivalent) to parse each line into name-value pairs. You just need some simple logic to detect the record boundaries and/or check that they appear at the correct place in the input stream.
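
A minimal sketch of that line-by-line approach might look like the following. It assumes each record starts with a line that begins and ends with "===", and that every other non-blank line has the form name = value, as in the question's sample. The class name RecordParser, the method parse, and the "header" map key are all invented for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Scanner;

// Hypothetical helper class; the name is illustrative only.
class RecordParser {
    // Reads the buffer one line at a time and returns one map per record.
    static List<Map<String, String>> parse(String buffer) {
        List<Map<String, String>> records = new ArrayList<>();
        Map<String, String> current = null;
        try (Scanner scanner = new Scanner(buffer)) {
            while (scanner.hasNextLine()) {
                String line = scanner.nextLine().trim();
                if (line.isEmpty()) {
                    continue; // blank lines carry no information
                }
                if (line.length() >= 6 && line.startsWith("===") && line.endsWith("===")) {
                    // Record boundary: start a new map and keep the header text.
                    current = new LinkedHashMap<>();
                    current.put("header", line.substring(3, line.length() - 3).trim());
                    records.add(current);
                } else if (current != null) {
                    int eq = line.indexOf('=');
                    if (eq > 0) {
                        String name = line.substring(0, eq).trim();
                        String value = line.substring(eq + 1).trim();
                        // Drop surrounding double quotes, if present.
                        if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                            value = value.substring(1, value.length() - 1);
                        }
                        current.put(name, value);
                    }
                }
            }
        }
        return records;
    }

    public static void main(String[] args) {
        String buffer = String.join("\n",
                "===  Jun 11 14:05:39 - Person Details  ===",
                "Person Name = \"Hurlman\"",
                "Person Address = \"2nd Street Benjamin Blvd NJ\"",
                "Persion Age = 25"); // field name spelled as in the question's sample
        for (Map<String, String> record : parse(buffer)) {
            System.out.println(record);
        }
    }
}
```

Lines that appear before the first header, or that contain no "=", are silently skipped here; stricter input checking would replace those branches with an error.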

Since my input will consist of such paragraphs, each of which represents a specific unit of information, I am extracting each one as a paragraph and then need to apply the Pattern or Scanner classes to it. Thanks for your reply. — user3741466, Jun 15 '14 at 17:01
