So I want to search for A,B,C,D in a string in any order, but if C doesn't exist I still want it to give me A,B, and D, etc.

To be more specific, here is the exact problem I'm trying to solve. I have a CSV file with lines that look like this:

Name,(W)5555555,(H)5555555,(M)5555555,(P)5555555

However, the W,H,M,P could be in any order. Plus they don't all exist on every line. So it looks more like this:

Name,(W)5555555,(H)5555555,(M)5555555,(P)5555555
Name,(H)5555555,(P)5555555,(W)5555555,(M)5555555
Name,(M)5555555,(H)5555555,,
Name,(P)5555555,,,

What I need to accomplish is to put all items in the correct order so they line up under the correct columns. So the above should look like this when I'm done:

Name,(W)5555555,(H)5555555,(M)5555555,(P)5555555
Name,(W)5555555,(H)5555555,(M)5555555,(P)5555555
Name,,(H)5555555,(M)5555555,
Name,,,,(P)5555555

Edit: It appears I'm a bad Stack Overflow citizen. I didn't get answers fast enough for when my project needed to be done, and therefore forgot to come back and correct the issues in my post. I ended up writing a Python script to do this instead of just using find/replace in BBEdit or Sublime Text 2 like I was originally trying to do.

So I would like a method to do something like this that works in either BBEdit or Sublime Text. Or Vim for that matter. I'll try to keep a better eye on it this time, and I'll respond to the answers that already exist.

  • To quote the `regex` tag description: *Please also include a tag specifying the programming language or tool you are using.* – Jon Clements Aug 21 '13 at 18:06
  • It would be good if you could include, in your question, the language you're using and your attempt(s) thus far. – Jerry Aug 21 '13 at 18:07
  • Certainly doable. Conceptually, I think you'd need to treat each item as a separate query, then store the results in some structured object whether in a database, or an array/list/dictionary/etc. Then you'll need to re-write the CSV file. I am not certain that Regular Expressions is the best tool for this job, unless you're really looking for *patterns* instead of specific delimiters like `W`, `H`, `M`, etc. – David Zemens Aug 21 '13 at 18:09
  • If you answer Jon Clements's or Jerry's request, we might be able to help you with [lookarounds](http://www.regular-expressions.info/lookaround.html). – Martin Ender Aug 21 '13 at 18:12

2 Answers

If your regex flavor supports lookarounds, this can be done with a simple regex-replace. Since lookaheads do not advance the position of the regex engine's cursor, we can use them to look for multiple patterns somewhere after one particular position. We can capture all these findings and write them back in the replacement string. To make sure that all of them are optional we could simply use `?`, but in this case I'll add an empty alternative to each lookahead instead; this is necessary to trick the engine when it's backtracking. The pattern could then look like this:

^Name,(?=.*([(]W[)]\d+)|)(?=.*([(]H[)]\d+)|)(?=.*([(]M[)]\d+)|)(?=.*([(]P[)]\d+)|).*

The `.*` at the end consumes the rest of the line, so that the whole line gets replaced rather than just the `Name,` prefix.

And the replacement string like this:

Name,$1,$2,$3,$4

Here is a working demo using the ECMAScript flavor. It's a rather limited flavor, so this solution should be adaptable to most environments.
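
For readers who want to try the same pattern outside an editor, here is a minimal sketch in Python (my choice of language, not part of the original answer). The function name `reorder` and the sample data are only for illustration. It relies on `re.sub` substituting an empty string for groups that did not participate in the match, which is the behavior in Python 3.5 and later.

```python
import re

# Same pattern as above. Each lookahead scans the rest of the line for one
# phone type; the empty alternative lets it succeed when that type is absent,
# leaving the corresponding group unmatched.
PATTERN = re.compile(
    r"^Name,"
    r"(?=.*([(]W[)]\d+)|)"
    r"(?=.*([(]H[)]\d+)|)"
    r"(?=.*([(]M[)]\d+)|)"
    r"(?=.*([(]P[)]\d+)|)"
    r".*",
    re.MULTILINE,  # let ^ match at the start of every line
)


def reorder(text: str) -> str:
    """Rewrite every line so the fields appear in W, H, M, P order."""
    # Unmatched groups become empty strings (Python 3.5+).
    return PATTERN.sub(r"Name,\1,\2,\3,\4", text)


sample = "\n".join([
    "Name,(W)5555555,(H)5555555,(M)5555555,(P)5555555",
    "Name,(H)5555555,(P)5555555,(W)5555555,(M)5555555",
    "Name,(M)5555555,(H)5555555,,",
    "Name,(P)5555555,,,",
])

print(reorder(sample))
```

Python writes backreferences as `\1` rather than `$1`; apart from that, the pattern is unchanged.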

Martin Ender
  • This seems to find everything and give me the necessary groups. However, it doesn't allow me to put them in the correct order (which won't be the same for every line, because it depends on the order they're in for each line, which isn't consistent). Would it be possible to make these groups named groups instead, so that they can be positioned in a specific order? – Matthew Fitzsimmons Aug 19 '14 at 16:46
  • Actually, I am incorrect. This does appear to capture everything in the correct order. Kudos. Now I just need to figure out how to add my actual names to the front. The format is actually more like this: `"Last, First",(W)5555555,(H)5555555,(M)5555555,(P)5555555` and `Singleword,(W)5555555,(H)5555555,(M)5555555,(P)5555555`. Notice some of them have the quotation marks with a comma in the middle, some don't. – Matthew Fitzsimmons Aug 19 '14 at 16:56
  • The following addition to your regex appears to work in Sublime Text, although it doesn't work on regexr.com: `^(?(?=\".+\")(\".+\")|([^,]+)),(?=.*([(]W[)]\d+)|)(?=.*([(]H[)]\d+)|)(?=.*([(]M[)]\d+)|)(?=.*([(]P[)]\d+)|).*` The downside is that it treats the two different name options as $1 and $2, so with a replacement like `$1,$2,$3,$4,$5,$6`, names of the format `"Last, First"` will end up in the first field, but names of the format `singleword` will end up in the second field. This should be easy enough to correct with a second find/replace but I would appreciate help fixing it. – Matthew Fitzsimmons Aug 19 '14 at 17:31
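
One possible way around the two-group problem described in the last comment is to put both name formats into a single capturing group with a plain alternation, so the name is always `$1`. The sketch below does this in Python; it is not tested in Sublime Text or BBEdit. The names `NAME_PATTERN` and `reorder_named` are made up for this example, and it assumes quoted names contain no embedded double quotes.

```python
import re

# Group 1 is either a quoted name (which may contain a comma) or a run of
# characters up to the first comma. Groups 2 to 5 are W, H, M, P as before.
NAME_PATTERN = re.compile(
    r'^("[^"\n]*"|[^,\n]+),'
    r"(?=.*([(]W[)]\d+)|)"
    r"(?=.*([(]H[)]\d+)|)"
    r"(?=.*([(]M[)]\d+)|)"
    r"(?=.*([(]P[)]\d+)|)"
    r".*",
    re.MULTILINE,
)


def reorder_named(text: str) -> str:
    """Keep the name in front and emit the fields in W, H, M, P order."""
    # Unmatched groups are replaced by empty strings (Python 3.5+).
    return NAME_PATTERN.sub(r"\1,\2,\3,\4,\5", text)


print(reorder_named('"Last, First",(H)5555555,(P)5555555,(W)5555555,(M)5555555'))
print(reorder_named("Singleword,(M)5555555,(H)5555555,,"))
```

In an editor that uses `$n` replacement syntax, the equivalent replacement would presumably be `$1,$2,$3,$4,$5`.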

Something like this?

^Name,(\((?:W|H|P|M)\)\d+(?:,)?)*[,]*$

This will give you all the matches per row. Then you simply need to allocate each match to the right column.
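
The "allocate each match to the right column" step could look roughly like this in Python. This is only a sketch: `COLUMNS`, `TOKEN` and `allocate` are hypothetical names, and it assumes the name field contains no comma. It scans for individual tokens instead of using the repeated group above, because in most flavors a repeated capturing group keeps only its last match.

```python
import re

COLUMNS = ["W", "H", "M", "P"]          # target column order
TOKEN = re.compile(r"\(([WHMP])\)\d+")  # one "(X)digits" item


def allocate(line: str) -> str:
    """Place each (X)number item of a line under its own column."""
    # Assumption: the name has no comma, so it ends at the first one.
    name, _, rest = line.partition(",")
    # Map the type letter to the full token; a repeated type keeps the last one.
    found = {m.group(1): m.group(0) for m in TOKEN.finditer(rest)}
    return ",".join([name] + [found.get(col, "") for col in COLUMNS])


for row in ["Name,(H)5555555,(P)5555555,(W)5555555,(M)5555555",
            "Name,(M)5555555,(H)5555555,,",
            "Name,(P)5555555,,,"]:
    print(allocate(row))
```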

dognose