
I have string data (read from a .txt file) like the sample below, and I need to extract it into an array. I am not good at regex, so I need your help identifying the expression to use.

Input:

TABLENAME {
Type: DEPT;
Items: [
    0000=0000
    0001=0001
    0002=0002
    0010=0010
    0012=0012
    0020=0020
    ];
}

Expected output: An Array with 2 elements

1. Type:DEPT
2. Items:  [
    0000=0000
    0001=0001
    0002=0002
    0010=0010
    0012=0012
    0020=0020
    ];

The second element should also be converted to an array. I only need to extract the content below; I can then use a simple string.Split to get the data I need.

    0000=0000
    0001=0001
    0002=0002
    0010=0010
    0012=0012
    0020=0020

Can someone please help?

Uma Ilango

1 Answer


I am not sure exactly what you are trying to accomplish and why you are getting this from a text file.

But it sounds like what you really need is to identify the Type and the Items in that object.

This could be done with the following regex, although you may need to modify it if you have some spaces/linebreaks that don't show in your current example:

\{\n?\s*Type\:\s*(?<Type>\w+);\n?\s*Items\:\s*\[\n*(?<Items>(?:\n?[\s]*[0-9=]+)+)[\n\s]*\];\n}

This will give you 2 named groups, one called Type and one called Items.
For your example above, Type would contain DEPT and Items would contain the number pairs. But this is fairly closely tailored to your example. I'm not sure how your real data varies or whether this is suited to your end goal.

You can play around with this on regex101 or a similar site to adjust the regex to suit your needs. I'm not sure how to explain the regex without breaking it down and giving you a long explanation, so let me know if you have any specific questions.
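
As a rough way to try the pattern outside regex101, here is a minimal sketch in Python rather than C#. It assumes the pattern is ported unchanged except for the named-group syntax (.NET's `(?<Name>...)` is spelled `(?P<Name>...)` in Python's `re` module) and that the file uses `\n` line endings. The names `PATTERN`, `SAMPLE` and `parse_block` are made up for this illustration.

```python
import re

# Port of the pattern above; only the named-group syntax differs from .NET.
PATTERN = re.compile(
    r"\{\n?\s*Type\:\s*(?P<Type>\w+);\n?\s*Items\:\s*\[\n*"
    r"(?P<Items>(?:\n?[\s]*[0-9=]+)+)[\n\s]*\];\n}"
)

# The input from the question, assuming plain "\n" line endings.
SAMPLE = """TABLENAME {
Type: DEPT;
Items: [
    0000=0000
    0001=0001
    0002=0002
    0010=0010
    0012=0012
    0020=0020
    ];
}
"""


def parse_block(text):
    """Return (type, list of 'key=value' strings), or None if nothing matches."""
    match = PATTERN.search(text)
    if match is None:
        return None
    # The Items group still holds indentation and newlines;
    # splitting on whitespace leaves one "key=value" string per entry.
    return match.group("Type"), match.group("Items").split()


result = parse_block(SAMPLE)
print(result)
```

Splitting the Items group on whitespace plays the role of the string.Split step mentioned in the question.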

Edit: Added the table name to the capture groups; it ends up in a group called TableName. This version does not allow spaces in the table name. If you need spaces, you could replace the [^\s] with [^\n], provided the table name always starts on a new line.

(?<TableName>[^\s]+)\s\{\n?\s*Type\:\s*(?<Type>\w+);\n?\s*Items\:\s*\[\n*(?<Items>(?:\n?[\s]*[0-9=]+)+)[\n\s]*\];\n}
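
To illustrate the difference between [^\s] and [^\n] for the table name, here is a small Python sketch. It assumes the same port as before (Python writes named groups as `(?P<Name>...)`) and `\n` line endings. The table name "DEPT TABLE" and the helper `table_name` are invented for the example.

```python
import re

# Everything after the table name, ported from the pattern above.
BODY = (
    r"\s\{\n?\s*Type\:\s*(?P<Type>\w+);\n?\s*Items\:\s*\[\n*"
    r"(?P<Items>(?:\n?[\s]*[0-9=]+)+)[\n\s]*\];\n}"
)
NO_SPACES = re.compile(r"(?P<TableName>[^\s]+)" + BODY)     # name without spaces
ALLOW_SPACES = re.compile(r"(?P<TableName>[^\n]+)" + BODY)  # name may contain spaces


def table_name(pattern, text):
    """Return the captured table name, or None if the pattern does not match."""
    match = pattern.search(text)
    return match.group("TableName") if match else None


block = "Type: DEPT;\nItems: [\n    0000=0000\n    0001=0001\n    ];\n}\n"
simple = "TABLENAME {\n" + block
spaced = "DEPT TABLE {\n" + block  # hypothetical name containing a space

print(table_name(NO_SPACES, simple))     # TABLENAME
print(table_name(ALLOW_SPACES, spaced))  # DEPT TABLE
print(table_name(NO_SPACES, spaced))     # TABLE (only the last word is captured)
```

Note that the [^\s] version does not fail on a name with spaces; it quietly captures only the last word, which is easy to miss.
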
Søren Ullidtz
  • Hi Soren, this works perfectly. But the pattern identifies only one item, and my file has more than one block like this. Can you please give me the regex to identify more than one? I tried adding "*" at the end, but that doesn't help. – Uma Ilango Jul 27 '14 at 12:20
  • If you need several captures, it depends on the language you're using (guessing C# from your tag). In JavaScript you would use the "g" flag for this; in C# you would use Regex.Matches or call NextMatch on the result. – Søren Ullidtz Jul 28 '14 at 10:48
  • Damn, such a silly mistake. Thanks for the tip. Yes, I am using C#. I also found that my last array has A-Za-z and some special characters, so I added them to the regex string. I have one last request: how can I also get the TABLENAME? When I add \w+\s* at the beginning, it results in a "Too many )'s" error. – Uma Ilango Jul 29 '14 at 12:09
  • Added a new regex which should grab the tablename as well. (See the comment in the answer for details.) – Søren Ullidtz Jul 29 '14 at 15:29
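
For a file with several blocks, the approach from the comments (Regex.Matches in C#, the "g" flag in JavaScript) can be sketched in Python with `re.finditer`, which plays the same role. This is only an illustration under a few assumptions: the pattern is the TableName version ported to Python's `(?P<Name>...)` syntax, line endings are `\n`, and the item values are digits only. The two table names and the second block are invented sample data.

```python
import re

# TableName version of the pattern, ported to Python's named-group syntax.
TABLE = re.compile(
    r"(?P<TableName>[^\s]+)\s\{\n?\s*Type\:\s*(?P<Type>\w+);\n?\s*Items\:\s*\[\n*"
    r"(?P<Items>(?:\n?[\s]*[0-9=]+)+)[\n\s]*\];\n}"
)

# Hypothetical file content with two blocks.
TEXT = """DEPARTMENTS {
Type: DEPT;
Items: [
    0000=0000
    0001=0001
    ];
}

REGIONS {
Type: REGION;
Items: [
    0010=0010
    0020=0020
    ];
}
"""


def parse_tables(text):
    """Map each table name to its type and a dict of its key=value items."""
    tables = {}
    # finditer yields one match object per block, left to right.
    for match in TABLE.finditer(text):
        pairs = match.group("Items").split()
        tables[match.group("TableName")] = {
            "type": match.group("Type"),
            "items": dict(pair.split("=", 1) for pair in pairs),
        }
    return tables


tables = parse_tables(TEXT)
print(tables)
```

If the items contain letters or other characters, as mentioned in the comments, the `[0-9=]` class would need to be widened accordingly.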