
I am trying to write a regex to extract the blocks of information below, as well as the fields within each block. I am using PowerShell.

I want to capture all "Server Item" blocks, and the following information for each one:

Server Item (1 **or more** of these items in the text)

   Identity (1 **or more** of these Identity items per Server Item)

    -- Allow (Each Identity contains **one** Allow)

    -- Deny (Each Identity contains **one** Deny)

    -- Allow (Inherited) (Each Identity contains **one** Allow (Inherited))

    -- Deny (Inherited) (Each Identity contains **one** Deny (Inherited))

The information is hierarchical (one to many for each heading to its children), as you can see.

Any answers greatly appreciated!

Sample Input text, below:

Server item: $/The/Path/Goes/Here
   Identity: Identity Number One (TYPE A)
      Allow:      
      Deny:
      Allow (Inherited): Read, Write, Checkin, Label
                         Lock, CheckinOther
      Deny (Inherited):  
====================================================================

Server item: $/The/Other/Path/Goes/Here
   Identity: Identity Number One (TYPE B)
      Allow: Read, Write, Checkin, Label
                         Lock, CheckinOther     
      Deny:
      Allow (Inherited): 
      Deny (Inherited):  
====================================================================

etc.

I have tried something like the following:

$thePattern = @"
(?<serveritem>Server item:(.|\n)*?=)
"@
$myText -match $thePattern

This does not capture all of the items and just gives me the first one! Also, how do I capture the Identity and field information for each Server item --> Identities --> Permissions?

The desired output would be to capture all of the Server items, to be able to access each of their Identities, and, for each Identity, to be able to access the permissions (Allow, Deny, etc.). The objective is to iterate through the blocks so as to add the information to a database for querying.

I am working on this with the following modification.

  • This includes named capture groups.
  • Also note the use of (?s) to set the single-line option.
  • Since PowerShell/.NET has no global option, I have used [Regex]::Matches to find all matches.

    (?s)Server item:(?<serveritem>.*?)[\r\n]+ *Identity:(?<identity>.*?)[\r\n]+ *Allow: ?(?<allow>.*?)[\r\n]+ *Deny: ?(?<deny>.*?)[\r\n]+ *Allow \(Inherited\): ?(?<allowinherited>.*?)[\r\n]+ *Deny \(Inherited\): ?(?<denyinherited>.*?)([\r\n]+=|$)
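
As a rough sketch of how the pattern above might be used with [Regex]::Matches, the snippet below runs it over the two-block sample and reads each named group. The variable names ($theRawContent, $thePattern, $records) and the whitespace clean-up are assumptions for illustration, and it only handles the case of a single Identity per Server item.

```powershell
# Two-block sample from the question, held in a single-quoted here-string
# so that "$/The/..." is not treated as a variable.
$theRawContent = @'
Server item: $/The/Path/Goes/Here
   Identity: Identity Number One (TYPE A)
      Allow:
      Deny:
      Allow (Inherited): Read, Write, Checkin, Label
                         Lock, CheckinOther
      Deny (Inherited):
====================================================================

Server item: $/The/Other/Path/Goes/Here
   Identity: Identity Number One (TYPE B)
      Allow: Read, Write, Checkin, Label
                         Lock, CheckinOther
      Deny:
      Allow (Inherited):
      Deny (Inherited):
====================================================================
'@

$thePattern = '(?s)Server item:(?<serveritem>.*?)[\r\n]+ *Identity:(?<identity>.*?)[\r\n]+ *Allow: ?(?<allow>.*?)[\r\n]+ *Deny: ?(?<deny>.*?)[\r\n]+ *Allow \(Inherited\): ?(?<allowinherited>.*?)[\r\n]+ *Deny \(Inherited\): ?(?<denyinherited>.*?)([\r\n]+=|$)'

# [Regex]::Matches returns every match; -match would stop at the first one.
$records = @(foreach ($m in [Regex]::Matches($theRawContent, $thePattern)) {
    [pscustomobject]@{
        ServerItem     = $m.Groups['serveritem'].Value.Trim()
        Identity       = $m.Groups['identity'].Value.Trim()
        # Wrapped permission lists span two lines, so collapse the whitespace.
        Allow          = ($m.Groups['allow'].Value -replace '\s+', ' ').Trim()
        Deny           = ($m.Groups['deny'].Value -replace '\s+', ' ').Trim()
        AllowInherited = ($m.Groups['allowinherited'].Value -replace '\s+', ' ').Trim()
        DenyInherited  = ($m.Groups['denyinherited'].Value -replace '\s+', ' ').Trim()
    }
})

$records | Format-List
```

Each element of $records is one Server item with its single Identity; a block with several Identities would not be split correctly by this pattern.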
    
TylerH
Banoona
  • What is your desired output? Do you need a regex to apply to each line or one regex to apply to all lines at the same time? – Chrᴉz remembers Monica Apr 18 '18 at 13:40
  • I am looking to get a regex to apply to all of the lines at the same time. I would look to iterating through the matches, and the capture groups within each match. The "outer" match would match the server items. Can this be done? Recommended alternatives? – Banoona Apr 18 '18 at 13:50

2 Answers

1
Server item:(.*?)[\r\n]+ *Identity:(.*?)[\r\n]+ *Allow: ?(.*?)[\r\n]+ *Deny: ?(.*?)[\r\n]+ *Allow \(Inherited\): ?(.*?)[\r\n]+ *Deny \(Inherited\): ?(.*?)([\r\n]+=|$)

with options /gs (global+singleline)

Matches on

Server item: $/The/Path/Goes/Here
   Identity: Identity Number One (TYPE A)
      Allow:      
      Deny:
      Allow (Inherited): Read, Write, Checkin, Label
                         Lock, CheckinOther
      Deny (Inherited):  
====================================================================

Server item: $/The/Other/Path/Goes/Here
   Identity: Identity Number One (TYPE B)
      Allow: Read, Write, Checkin, Label
                         Lock, CheckinOther     
      Deny:
      Allow (Inherited): 
      Deny (Inherited):  

Match1

  • Group 1: $/The/Path/Goes/Here
  • Group 2: Identity Number One (TYPE A)
  • Group 3: [empty]
  • Group 4: [empty]
  • Group 5: Read, Write, Checkin, Label [NEWLINE + SPACES] Lock, CheckinOther
  • Group 6: [empty]

Match2

  • Group 1: $/The/Other/Path/Goes/Here
  • Group 2: Identity Number One (TYPE B)
  • Group 3: Read, Write, Checkin, Label [NEWLINE + SPACES] Lock, CheckinOther
  • Group 4: [empty]
  • Group 5: [empty]
  • Group 6: [empty]

Tested with regex101

Chrᴉz remembers Monica
  • Very good, thanks. Your answer seems not to cater for the **Server item** level. – Banoona Apr 18 '18 at 14:20
  • @Banoona I added the server item. I thought it would be a caption for the other items, as you had a dot in front of "Identity" in your question – Chrᴉz remembers Monica Apr 18 '18 at 14:24
  • Thanks Chriz. It's a very good start for me, and I could take this and work it to the end. I have modified your solution a little, to give me named capture groups: (?s)Server item:(.*?)[\r\n]+ *Identity:(?<identity>.*?)[\r\n]+ *Allow: ?(?<allow>.*?)[\r\n]+ *Deny: ?(?<deny>.*?)[\r\n]+ *Allow \(Inherited\): ?(?<allowinherited>.*?)[\r\n]+ *Deny \(Inherited\): ?(?<denyinherited>.*?)([\r\n]+=|$) – Banoona Apr 18 '18 at 15:36
  • However, I am not quite getting the hierarchy that I need, i.e. one Server Item with MANY Identities, each of these having MANY Permission sets (Allow, Deny, Allow (Inherited) and Deny (Inherited)). I am using PowerShell (.NET), so I will use something like ([Regex]$theMatchPatternAbove).Matches($theRawContent)[1].Groups["identity"] and would look to capture subgroups. Can this be done? If not, the answer suits, as I can just iterate over the outer match (Server items) and rematch using another expression for each of the children (Identities and Permissions) – Banoona Apr 18 '18 at 15:39
  • @Banoona This gets really complicated. The regex is already really long, and I don't know how multiple Identity items are chained. I'd recommend an approach like the one in kuujinbo's solution to avoid complexity. No one can read the regex and understand what it does right now. – Chrᴉz remembers Monica Apr 19 '18 at 10:37
  • I agree! That's why @kuujinbo has been marked as correct :) Thanks so much for your answer. – Banoona Apr 19 '18 at 15:47
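
The two-pass idea raised in the comments above (match each Server item first, then rematch its body for Identities) could be sketched roughly as follows. This is only an illustration, not either poster's code: the two patterns, the group names path and body, and the variables $serverItemPattern, $identityPattern and $tree are all assumptions, and the input is assumed to be as regular as the sample.

```powershell
# Sample: one block with a single Identity, one block with two.
$theRawContent = @'
Server item: $/The/Path/Goes/Here
   Identity: Identity Number One (TYPE A)
      Allow:
      Deny:
      Allow (Inherited): Read, Write, Checkin, Label
                         Lock, CheckinOther
      Deny (Inherited):
====================================================================

Server item: $/The/Other/other/Path/Goes/Here
   Identity: Identity Number One (TYPE C)
      Allow: Read, Write, Checkin, Label
                         Lock, CheckinOther
      Deny:
      Allow (Inherited):
      Deny (Inherited):
   Identity: Identity Number One (TYPE D)
      Allow: Read, Write, Checkin, Label
                         Lock, CheckinOther
      Deny:
      Allow (Inherited):
      Deny (Inherited):
====================================================================
'@

# Pass 1 (assumed pattern): the path, then everything up to the "=" separator.
$serverItemPattern = '(?s)Server item:[ \t]*(?<path>[^\r\n]*)(?<body>.*?)(?=[\r\n]+=|\z)'
# Pass 2 (assumed pattern): one Identity and its four permission fields,
# ending where the next Identity starts or the block body ends.
$identityPattern = '(?s)Identity:[ \t]*(?<identity>[^\r\n]*)[\r\n]+\s*Allow:(?<allow>.*?)[\r\n]+\s*Deny:(?<deny>.*?)[\r\n]+\s*Allow \(Inherited\):(?<allowinherited>.*?)[\r\n]+\s*Deny \(Inherited\):(?<denyinherited>.*?)(?=[\r\n]+\s*Identity:|\z)'

$tree = [ordered]@{}
foreach ($serverMatch in [Regex]::Matches($theRawContent, $serverItemPattern)) {
    $path = $serverMatch.Groups['path'].Value.Trim()
    $tree[$path] = [ordered]@{}
    foreach ($identityMatch in [Regex]::Matches($serverMatch.Groups['body'].Value, $identityPattern)) {
        $name = $identityMatch.Groups['identity'].Value.Trim()
        $perms = [ordered]@{}
        foreach ($g in 'allow', 'deny', 'allowinherited', 'denyinherited') {
            # Collapse wrapped continuation lines into one line.
            $perms[$g] = ($identityMatch.Groups[$g].Value -replace '\s+', ' ').Trim()
        }
        $tree[$path][$name] = $perms
    }
}

# Walk the hierarchy: Server item -> Identity -> permissions.
foreach ($path in $tree.Keys) {
    "Server item: $path"
    foreach ($name in $tree[$path].Keys) {
        "  Identity: $name (allow: '$($tree[$path][$name]['allow'])')"
    }
}
```

Keeping the two patterns separate means neither has to describe the repetition of Identities, which is the part that made the single long regex hard to extend.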
1

Assuming the (text) input is as consistently formatted as your sample, you can extract the information you need with much simpler regular expressions if you break up the input and iterate in a line-by-line fashion.

For example, given the following input with "1 or more of these Identity items per Server Item":

Server item: $/The/Path/Goes/Here
   Identity: Identity Number One (TYPE A)
      Allow:      
      Deny:
      Allow (Inherited): Read, Write, Checkin, Label
                         Lock, CheckinOther
      Deny (Inherited):  
====================================================================

Server item: $/The/Other/Path/Goes/Here
   Identity: Identity Number One (TYPE B)
      Allow: Read, Write, Checkin, Label
                         Lock, CheckinOther     
      Deny:
      Allow (Inherited): 
      Deny (Inherited):  
====================================================================

Server item: $/The/Other/other/Path/Goes/Here
   Identity: Identity Number One (TYPE C)
      Allow: Read, Write, Checkin, Label
                         Lock, CheckinOther     
      Deny:
      Allow (Inherited): 
      Deny (Inherited):  
   Identity: Identity Number One (TYPE D)
      Allow: Read, Write, Checkin, Label
                         Lock, CheckinOther     
      Deny:
      Allow (Inherited): 
      Deny (Inherited): 

To get the hierarchical information:

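# The example reads the input from a .txt file; $path is the path to that file.
$lines = Get-Content $path
$result = @{}
$serverItem = ''
$identityItem = ''
$currentKey = ''
foreach ($line in $lines) {
    # Split on the first colon only. The -split operator accepts a maximum
    # substring count; the static [Regex]::Split overload would interpret a
    # third argument of 2 as RegexOptions rather than as a count.
    $key, $value = $line.Trim() -split '\s*:\s*', 2
    switch -Regex ($key) {
        '^server item' {
            # New top-level block: start a fresh table of identities.
            $serverItem = $value
            $result.$serverItem = @{}
            continue
        }
        '^identity' {
            # New identity under the current server item.
            $identityItem = $value
            $result.$serverItem.$identityItem = @{}
            continue
        }
        '^[A-Za-z]+' {
            if ($null -ne $value) {
                # "Name: value" line, e.g. Allow or Deny (Inherited).
                $currentKey = $key
                $result.$serverItem.$identityItem.$key = $value
            } else {
                # No colon: a wrapped continuation of the previous permission list.
                $result.$serverItem.$identityItem.$currentKey += ", $key"
            }
        }
    }
}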
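
Once $result has been built, one possible way to prepare it for a database is to flatten the nested hashtables into one row per Server item and Identity. The sketch below is an assumption about how that might look, not part of the original answer: the hard-coded $result only mimics the shape the loop above produces, and the $rows variable and column names are made up for illustration.

```powershell
# Stand-in for the $result produced by the parsing loop (assumed shape:
# server item -> identity -> permission name -> value).
$result = @{
    '$/The/Other/other/Path/Goes/Here' = @{
        'Identity Number One (TYPE C)' = @{
            'Allow'             = 'Read, Write, Checkin, Label, Lock, CheckinOther'
            'Deny'              = ''
            'Allow (Inherited)' = ''
            'Deny (Inherited)'  = ''
        }
        'Identity Number One (TYPE D)' = @{
            'Allow'             = 'Read, Write, Checkin, Label, Lock, CheckinOther'
            'Deny'              = ''
            'Allow (Inherited)' = ''
            'Deny (Inherited)'  = ''
        }
    }
}

# One flat row per (server item, identity) pair.
$rows = @(foreach ($serverItem in $result.Keys) {
    foreach ($identity in $result[$serverItem].Keys) {
        $permissions = $result[$serverItem][$identity]
        [pscustomobject]@{
            ServerItem     = $serverItem
            Identity       = $identity
            Allow          = $permissions['Allow']
            Deny           = $permissions['Deny']
            AllowInherited = $permissions['Allow (Inherited)']
            DenyInherited  = $permissions['Deny (Inherited)']
        }
    }
})

$rows | Format-Table -AutoSize
```

The flat rows can then be piped to whatever export or insert step is used; note that a plain hashtable does not preserve insertion order, so the row order is not guaranteed.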
kuujinbo