-1

I'm very new to this and I've googled about regex and RE2, but little help is found. Seems to be rather niche topic and the material isn't quite comprehensive. Would appreciate any help in this. Thank you!

I have a list with over 30k rows of text and I would like to extract those that match this pattern:

`=whatever`+whatever-whatever`+whatever/whatever.whatever

The data looks like this:

Detail 3
Detail 4
Detail 3
Detail 4
blah
blah
Detail 5
Detail 4
Detail 5
`=P2385`+P2385-M01 `+MC2385/5.0
`=BS2366`+P2366-A1 `+FIELD/14.5
`=P2385`+P2385-M02 `+MC2385/5.4
blah
blah
`=P2385`+P2385-M03 `+MC2385/5.5
`=P2385`+P2385-M04 `+MC2385/5.9
`=P2385`+P2385-M05 `+MC2385/6.0
`=BS2366`+P2366-A1 `+FIELD/14.5
`=P2385`+P2385-M06 `+MC2385/6.4
`=P2385`+P2385-M07 `+MC2385/6.5
`=P2385`+P2385-M08 `+MC2385/6.9
blah
blah
`=P2385`+P2385-M09 `+MC2385/7.0
`=P2385`+P2385-M10 `+MC2385/7.4
`=P2385`+P2385-M11 `+MC2385/7.5
`=P2385`+P2385-M12 `+MC2385/7.9
`=P2381`+P2381-B31 `+MC2381/12.5
blah
blah
`=P2381`+P2381-B32 `+MC2381/12.9
`=P2381`+P2381-B33 `+MC2381/13.0
`=P2370`+P2370-M01
blah
blah
`+MC2370/6.0
`=P2370`+P2370-M02 `+MC2370/6.4
`=P2370`+P2370-M03 `+MC2370/6.5
`=P2370`+P2370-M04 `+MC2370/6.9
`=BS2366`+P2366-A1 `+FIELD/14.5
`=P2368`+P2368-M01 `+MC2370/11.0
`=P2368`+P2368-M05 `+MC2370/12.0
`=BS2366`+P2366-A1 `+FIELD/14.5 `=P2366`+P2366-M01 `+MC2370/10.5
`=P2366`+P2366-M02 `+MC2370/10.9
Detail 3
Detail 4
Detail 3
blah
blah
Detail 4
Detail 5
Detail 5
blah
blah
Detail 4
Detail 5
blah
blah
blah
blah

The output from the regex extract should ideal look like this:

`=P2385`+P2385-M01 `+MC2385/5.0
`=P2385`+P2385-M02 `+MC2385/5.4
`=P2385`+P2385-M03 `+MC2385/5.5
`=P2385`+P2385-M04 `+MC2385/5.9
`=P2385`+P2385-M05 `+MC2385/6.0
`=BS2366`+P2366-A1 `+FIELD/14.5
`=P2385`+P2385-M06 `+MC2385/6.4
`=P2385`+P2385-M07 `+MC2385/6.5
`=P2385`+P2385-M08 `+MC2385/6.9
`=P2385`+P2385-M09 `+MC2385/7.0
`=P2385`+P2385-M10 `+MC2385/7.4
`=P2385`+P2385-M11 `+MC2385/7.5
`=BS2366`+P2366-A1 `+FIELD/14.5
`=P2385`+P2385-M12 `+MC2385/7.9
`=P2381`+P2381-B31 `+MC2381/12.5
`=P2381`+P2381-B32 `+MC2381/12.9
`=P2381`+P2381-B33 `+MC2381/13.0
`=P2370`+P2370-M02 `+MC2370/6.4
`=P2370`+P2370-M03 `+MC2370/6.5
`=P2370`+P2370-M04 `+MC2370/6.9
`=P2368`+P2368-M01 `+MC2370/11.0
`=P2368`+P2368-M05 `+MC2370/12.0
`=BS2366`+P2366-A1 `+FIELD/14.5
`=P2366`+P2366-M01 `+MC2370/10.5
`=P2366`+P2366-M02 `+MC2370/10.9
player0
  • 124,011
  • 12
  • 67
  • 124

2 Answers2

0

To match the specific data in your example, use this pattern:

^`=[A-Z0-9]+`\+[A-Z0-9]+-[A-Z0-9]+\s*`\+[A-Z0-9]+\/\d+\.\d+

Test here. That page also explains all the details of the regex.

Instead of the specific [A-Z0-9]+ you can use a generic .+?.


You do not specify where your list is stored.


If the list is stored in a text file, then you can use grep as the obvious tool for the job:

grep "^`=[A-Z0-9]+`\+[A-Z0-9]+-[A-Z0-9]+\s*`\+[A-Z0-9]+\/\d+\.\d+" input_file >output_file

where:

  • > is the output redirection operator
  • input_file is the name of the input file
  • output_file is the name of the file where you want to store the results.

If it is in a spreadsheet, then @player0 already provided a good answer.

virolino
  • 2,073
  • 5
  • 21
0
=ARRAYFORMULA(TRANSPOSE(SPLIT(SUBSTITUTE("♦"&TEXTJOIN("♦", 1, 
 IFERROR(REGEXEXTRACT(A1:A, "`.+\+.+-.+\+.+/.+"))), " `=", "♦`="), "♦")))

0

player0
  • 124,011
  • 12
  • 67
  • 124
  • Thanks! Would you mind elaborating on your method and syntax? – W.B. Sam Aug 23 '19 at 10:19
  • @W.B.Sam sure, `"'.+\+.+-.+\+.+/.+"` is basically your "`=whatever`+whatever-whatever`+whatever/whatever.whatever" where "whatever" is `.+`. then `IFERROR` eliminates all error rows and makes them blank. next, the `TEXTJOIN` joins all values with a unique symbol ♦. then `SUBSTITUTE` fixes all instances where you have more than one patterns per row (like your row 43 which has two items) and then `SPLIT` splits it on the unique symbol to get you your desired result – player0 Aug 23 '19 at 10:33