0

My apologies if this has already been asked in a Ruby setting. I checked before posting, but to be perfectly honest it has been a very long day, and if I am missing the obvious, I apologize in advance!

I have the following string, which contains a list of software packages installed on a system, and for some reason I am having the hardest time parsing it. I know there has got to be a straightforward means of doing this in Ruby, but I keep coming up short.

I would like to parse the multi-line, tab-delimited string below into an array of arrays, which I can then loop through with each_with_index to spit out the HTML code in my Rails app.

str = 'Product and/or Software Full Name 5242     [version 6.5.24]     [Installed on: 12/31/2015]

 Product and/or Software Full Name 5426     [version 22.4]     [Installed on: 06/11/2013]

 Product and/or Software Full Name 2451     [version 1.63]     [Installed on: 12/17/2015]

 Product and/or Software Full Name 5225     [version 43.22.51]     [Installed on: 11/15/2011]

 Product and/or Software Full Name 2420     [version 43.51-r2]     [Installed on: 12/31/2015]'

The end result would be an array of arrays with 5 elements like so:

[["Product and/or Software Full Name 5242"],["version 6.5.24"], ["Installed on: 12/31/2015"],["Product and/or Software Full Name 5426"],["version 22.4"],["Installed on: 06/11/2013"],["Product and/or Software Full Name 2451"],["version 1.63"],["Installed on: 12/17/2015"]]

Please Note: Only 3 of 5 arrays are shown for brevity

I would prefer to strip out the brackets from both 'version' and 'Installed on' but I can do that with gsub separately if that cannot easily be baked into an answer.

One last thing: there won't always be an 'Installed on' entry for every line in the multiline string, so the answer will need to take that into account.

Kurt W
  • 321
  • 2
  • 15

1 Answer

1

This ought to do:

expr = /(.+?)\s+\[([^\]]+)\](?:\s+\[([^\]]+)\])?/
str.scan(expr)

The expression is actually a lot less complex than it looks. It looks complex because we're matching square brackets, which have to be escaped, and also using character classes, which are enclosed in square brackets in the regular expression language. All together it adds a lot of noise.
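To see what String#scan returns with this expression, here is a minimal sketch. The sample text and the "Product A"/"Product B" names are made up for illustration, and the squiggly heredoc assumes Ruby 2.3 or later.

```ruby
# Hypothetical sample input (not the asker's real data).
sample = <<~TEXT
  Product A 5242     [version 6.5.24]     [Installed on: 12/31/2015]

  Product B 5426     [version 22.4]     [Installed on: 06/11/2013]
TEXT

expr = /(.+?)\s+\[([^\]]+)\](?:\s+\[([^\]]+)\])?/

# With capture groups, scan returns one inner array per match,
# holding the captures in order: name, version, install date.
rows = sample.scan(expr)

rows.each_with_index do |(name, version, installed), i|
  puts "#{i}: #{name} | #{version} | #{installed}"
end
```

Because the brackets sit outside the capture groups, they are already stripped from the results. Note that if a line starts with a space, that space becomes part of the first capture, so you may want to call strip on the name.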

Here it is split up:

expr = /
  (.+?)  # Capture #1: Any characters (non-greedy)

  \s+    # Whitespace
  \[     # Literal '['
    (      # Capture #2:
      [^\]]+   # One or more characters that aren't ']'
    )
  \]     # Literal ']'

  (?:    # Non-capturing group
    \s+    # Whitespace
    \[     # Literal '['
      ([^\]]+) # Capture #3 (same as #2)
    \]     # Literal ']'
  )?     # Preceding group is optional
/x

As you can see, the third part is identical to the second part, except it's in a non-capture group followed by a ? to make it optional.
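When the optional group doesn't match, its capture comes back as nil. A small sketch of that, using made-up input where the second entry has no 'Installed on' part:

```ruby
expr = /(.+?)\s+\[([^\]]+)\](?:\s+\[([^\]]+)\])?/

# Hypothetical input: "Product B" has no "Installed on" field.
sample = "Product A     [version 1.0]     [Installed on: 01/02/2016]\n" \
         "Product B     [version 2.0]\n"

rows = sample.scan(expr)
# rows[1] ends with nil because capture #3 took no part in that match.

# Array#compact drops the nil, leaving only the fields that were present.
cleaned = rows.map(&:compact)
```

Whether you keep the nil (so every row has three slots) or compact it away depends on how the view code wants to treat missing dates.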

It's worth noting that this may fail if, e.g., the product name contains square brackets. If that's a possibility, one potential solution is to include the version and Installed text in the match, e.g.:

expr = /(.+?)\s+\[(version [^\]]+)\](?:\s+\[(Installed [^\]]+)\])?/
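Here is a rough comparison of the two expressions on a product name that contains brackets. The "Tool [beta] 3" name is invented for the example, and the variable names loose and strict are mine:

```ruby
# Hypothetical line whose product name contains square brackets.
sample = "Tool [beta] 3     [version 1.2]     [Installed on: 03/04/2016]"

loose  = /(.+?)\s+\[([^\]]+)\](?:\s+\[([^\]]+)\])?/
strict = /(.+?)\s+\[(version [^\]]+)\](?:\s+\[(Installed [^\]]+)\])?/

# The loose expression stops at the first bracket pair it sees,
# so "[beta]" is mistaken for the version field.
loose_rows = sample.scan(loose)

# The stricter expression only accepts a bracket pair that starts with
# "version ", so the whole name stays in capture #1.
strict_rows = sample.scan(strict)
```

The trade-off is that the stricter expression silently skips any line whose fields don't begin with exactly those words.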

P.S. Here's a solution that uses String#split instead:

expr = /\]?\s+\[|\]$/
res = str.each_line.map {|ln| ln.strip.split(expr) }
        .reject {|arr| arr.empty? }
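For illustration, here is the same approach run on a small made-up input with a blank line and an entry that has no 'Installed on' field:

```ruby
expr = /\]?\s+\[|\]$/

# Hypothetical input: a blank separator line and a missing install date.
sample = "Product A     [version 1.0]     [Installed on: 01/02/2016]\n" \
         "\n" \
         "Product B     [version 2.0]\n"

# Each line is stripped, then split on the separators between fields.
# A blank line strips to "" and splits to [], which reject removes.
res = sample.each_line.map { |ln| ln.strip.split(expr) }
            .reject(&:empty?)
```

Unlike the scan version, a missing field here simply produces a shorter inner array rather than a trailing nil.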

If you have brackets in your product names, a possible workaround here is to specify a minimum number of spaces between parts, e.g.:

expr = /\]?\s{3,}\[|\]$/

...which of course depends on product names never containing three or more consecutive spaces before an opening bracket, and on the fields always being separated by at least three.
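A quick sketch of the difference, again with an invented product name that contains brackets (the names loose and strict are just labels for this example):

```ruby
# Hypothetical line: one space before "[beta]", five before the real fields.
line = "Tool [beta] 3     [version 1.2]     [Installed on: 03/04/2016]"

loose  = /\]?\s+\[|\]$/
strict = /\]?\s{3,}\[|\]$/

# Any whitespace before "[" counts as a separator, so the name is cut apart.
loose_parts = line.split(loose)

# Requiring three or more whitespace characters leaves "[beta]" alone.
strict_parts = line.split(strict)
```

If the real data is tab-delimited, a single tab would not satisfy the three-character minimum, so the count would need adjusting to match the actual separator.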

Jordan Running
  • 102,619
  • 17
  • 182
  • 182
  • What on earth! I am deeply curious how this works. I'll check back for your answer later and thank you so much for following up! – Kurt W Mar 01 '16 at 19:15
  • @KurtW I've edited my answer to include an explanation and an alternative solution. – Jordan Running Mar 01 '16 at 22:29
  • Thanks Jordan for including both examples. Thankfully, I've been reading up on regex so this mostly all makes sense but I do prefer to avoid regex where possible in favor of loops, etc. I appreciate you adding the version that uses `String#split`. Does the `String#split` alternative work if []'s are contained in the product name (for example)? I'll test this now with my data and report back soon. – Kurt W Mar 01 '16 at 23:53
  • You'll have the same problem with `String#split`. I've edited my answer to include possible workarounds for both `String#scan` and `String#split`. – Jordan Running Mar 02 '16 at 00:03
  • Jordan, your answer is perfect for my data, thanks so much! Can you explain what `expr = /\]?\s+\[|\]$/` is checking for explicitly? I would expect it to be looking for `/\[` to catch an opening bracket but it starts from the beginning looking for `/\]`. Want to be able to support this code so need to understand what it is doing. Thanks as always and I hope to not have to lean on your assistance so much in the future! – Kurt W Mar 02 '16 at 00:06
  • What you have to remember about the `split` expression is that it has to match the parts *between* the parts you want to keep. So the first place you want to split is `[`, the second is `] [`, and the last is `]\n` (to get rid of the trailing `]`). I don't really understand the problem you're describing, though. – Jordan Running Mar 02 '16 at 00:17
  • I'm all set Jordan, thank you for clarifying! Just marked the answer as Accepted. Appreciate the help *once again*. – Kurt W Mar 02 '16 at 00:19
  • Very good answer. Kurt, in case you didn't know, you can upvote or downvote any answers to your questions, as well as award the greenie. – Cary Swoveland Mar 03 '16 at 08:11