
I need a regex that extracts each paragraph from a text buffer containing many similar paragraphs and stores it as a string for further processing.

For example, say the text buffer looks like this:

===  Jun 11 14:05:39 - Person Details  ===

Person Name = "Hurlman"

Person Address = "2nd Street Benjamin Blvd NJ"

Persion Age = 25

===  Jun 11 14:05:39 - Person Details  ===

Person Name = "Greg"

Person Address = "3rd Street Benjamin Blvd NJ"

Persion Age = 26


===  Jun 11 14:05:42 - Person Details  ===

Person Name = "Michel"

Person Address = "4th Street Benjamin Blvd NJ"

Persion Age = 27

I need to iterate through all the paragraphs and store each one so that I can later look up the specific person details inside it.

Each extracted paragraph should have the following format:

===  Jun 11 14:05:42 - Person Details  ===

Person Name = "Michel"

Person Address = "4th Street Benjamin Blvd NJ"

Persion Age = 27

Any help is much appreciated!

Bhargav Rao

2 Answers


You could use this pattern: (===.*===[\s\S]*?)(?====|$)
Demo
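
As a minimal sketch of how the pattern above might be applied in Java (the class name RecordSplitter and the method splitRecords are made up for illustration, and the sketch assumes that "===" never appears inside a field value):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class RecordSplitter {
    // The pattern from the answer: a "=== ... ===" header line, followed
    // lazily by any characters up to the next "===" or the end of the input.
    private static final Pattern RECORD =
            Pattern.compile("(===.*===[\\s\\S]*?)(?====|$)");

    // Returns each paragraph (header plus the lines under it) as one string.
    static List<String> splitRecords(String buffer) {
        List<String> records = new ArrayList<>();
        Matcher matcher = RECORD.matcher(buffer);
        while (matcher.find()) {
            // trim() drops the blank lines left between paragraphs.
            records.add(matcher.group(1).trim());
        }
        return records;
    }

    public static void main(String[] args) {
        String buffer =
                "===  Jun 11 14:05:39 - Person Details  ===\n\n"
              + "Person Name = \"Hurlman\"\n\n"
              + "Persion Age = 25\n\n"
              + "===  Jun 11 14:05:42 - Person Details  ===\n\n"
              + "Person Name = \"Michel\"\n\n"
              + "Persion Age = 27\n";
        for (String record : splitRecords(buffer)) {
            System.out.println(record);
            System.out.println("-----");
        }
    }
}
```

Each element of the returned list can then be handed to whatever further processing is needed. If a value could ever contain "===", the lookahead would cut the paragraph short, so the pattern would need to be anchored to line starts instead.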

alpha bravo

Solving this with regexes is possible, but it is likely to give you a poor solution (inefficient, hard to understand, hard to maintain, etc.).

What you have is an informal record structure represented using lines of text. (This is not natural language text, so describing it in terms of "paragraphs" doesn't make sense.)

The way to process it is to read it a line at a time and then use Scanner (or equivalent) to parse each line into name-value pairs. You just need some simple logic to detect the record boundaries and/or check that they appear at the correct place in the input stream.
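
A rough sketch of that line-at-a-time approach is shown here. The class name RecordParser, the "header" key, and the choice to split each line at the first "=" are all assumptions made for illustration; Scanner is used only to read the lines.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Scanner;

// Hypothetical helper class, not part of the original answer.
class RecordParser {
    // Parses the buffer into one map of name-value pairs per record.
    static List<Map<String, String>> parse(String buffer) {
        List<Map<String, String>> records = new ArrayList<>();
        Map<String, String> current = null;
        try (Scanner scanner = new Scanner(buffer)) {
            while (scanner.hasNextLine()) {
                String line = scanner.nextLine().trim();
                if (line.isEmpty()) {
                    continue; // blank lines carry no information
                }
                boolean isHeader = line.length() >= 6
                        && line.startsWith("===") && line.endsWith("===");
                if (isHeader) {
                    // Record boundary: start a new record and keep the header text.
                    current = new LinkedHashMap<>();
                    current.put("header",
                            line.substring(3, line.length() - 3).trim());
                    records.add(current);
                } else if (current != null) {
                    int eq = line.indexOf('=');
                    if (eq > 0) {
                        String name = line.substring(0, eq).trim();
                        String value = line.substring(eq + 1).trim();
                        // Strip surrounding double quotes, if present.
                        if (value.length() >= 2
                                && value.startsWith("\"") && value.endsWith("\"")) {
                            value = value.substring(1, value.length() - 1);
                        }
                        current.put(name, value);
                    }
                }
                // Lines before the first header are ignored in this sketch.
            }
        }
        return records;
    }
}
```

With the sample input from the question, the first map would hold "Hurlman" under the key "Person Name", so looking up a specific person detail becomes a plain map lookup rather than a second regex pass.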

Stephen C
  • My input consists of such paragraphs, each of which represents a specific unit of information, so I am extracting each one as a paragraph and then need to apply the Pattern or Scanner classes to it. Thanks for your reply. – user3741466 Jun 15 '14 at 17:01