
I want to use a regex to parse my data into 3 columns.

Film data:
Marvel Comics Presents (1988) #125
Spider-Man Legends Vol. II: Todd Mcfarlane Book I (Trade Paperback)
Spider-Man Legends Vol. II: Todd Mcfarlane Book I
Spider-Man Legends Vol. II: Todd Mcfarlane Book I (1998)
Marvel Comics Presents #125

Expected output: (image of the expected three-column table, not reproduced here)

I can see how the data should be grouped, but I can't get a regex to do it: (image not reproduced here)

I built this expression: (.*)\((\d{4})\)(.*)

Essentially, I want to use the ? quantifier to say "this group may or may not be there", something like: (.*)\((\d{4})\)?(.*)

However, it's not working.
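A quick check in Python shows why the attempt fails. Python and its standard `re` module are assumptions here, since the question only says the rows sit in a Series. A `?` quantifies only the single token right before it, so in `(.*)\((\d{4})\)?(.*)` only the closing `\)` became optional, while `\(` and the four digits are still required.

```python
import re

# The five sample rows from the question.
rows = [
    "Marvel Comics Presents (1988) #125",
    "Spider-Man Legends Vol. II: Todd Mcfarlane Book I (Trade Paperback)",
    "Spider-Man Legends Vol. II: Todd Mcfarlane Book I",
    "Spider-Man Legends Vol. II: Todd Mcfarlane Book I (1998)",
    "Marvel Comics Presents #125",
]

# The attempted expression: the ? applies to \) alone, so "(" plus four
# digits must still be present for the pattern to match at all.
attempt = re.compile(r"(.*)\((\d{4})\)?(.*)")

matched = [attempt.match(row) is not None for row in rows]
print(matched)  # [True, False, False, True, False]

# What the ? actually made optional: the closing parenthesis.
print(attempt.match("Book I (1998") is not None)  # True
```

Only the two rows containing "(YYYY)" match. To make the whole year optional, the `?` has to follow a group around the entire `\((\d{4})\)` part, and the rest of the pattern must keep the leading `(.*)` from swallowing the year, which is what the answer's anchors and lazy `(.*?)` take care of.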

starball
  • How is your data presented? A list? An entire multiline string? A Series? – Ric Jan 20 '23 at 21:12
  • Series. Each of the five films has a separate cell. Sorry for the misunderstanding! – Benjamin Stringer Jan 20 '23 at 21:20
  • Welcome! Can you please read about [the problems with images of text](//meta.stackoverflow.com/a/285557/11107541) and then edit your question to convert your images of tables into markup tables? See [/editing-help#tables](/editing-help#tables) for how. You might find [tablesgenerator.com](//www.tablesgenerator.com/markdown_tables) useful. – starball Jan 20 '23 at 21:42

1 Answer

You could use 3 capture groups, where the last 2 are optional:

^(.*?)(?:\((\d{4})\))?\s*(#\d+)?$

The pattern matches:

  • ^ Start of string
  • (.*?) Capture group 1, matching as few characters as possible
  • (?:\((\d{4})\))? Optional non-capturing group that captures the 4 digits in group 2
  • \s* Match optional whitespace characters
  • (#\d+)? Optional group 3, matching # followed by 1 or more digits
  • $ End of string
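A minimal sketch of applying this pattern row by row, assuming Python's `re` module and the five sample rows from the question; `split_row` is a hypothetical helper name. Each row becomes a (title, year, issue) tuple, with `None` for an optional part that is absent.

```python
import re

# The five sample rows from the question.
rows = [
    "Marvel Comics Presents (1988) #125",
    "Spider-Man Legends Vol. II: Todd Mcfarlane Book I (Trade Paperback)",
    "Spider-Man Legends Vol. II: Todd Mcfarlane Book I",
    "Spider-Man Legends Vol. II: Todd Mcfarlane Book I (1998)",
    "Marvel Comics Presents #125",
]

# The pattern from the answer: lazy title, optional "(YYYY)" capturing only
# the digits, optional whitespace, optional "#<digits>", end of string.
pattern = re.compile(r"^(.*?)(?:\((\d{4})\))?\s*(#\d+)?$")


def split_row(row):
    """Hypothetical helper: return (title, year, issue) for one row."""
    m = pattern.match(row)
    # Every row matches: group 1 absorbs the whole string when neither
    # optional part is present, and the unused groups come back as None.
    title, year, issue = m.groups()
    # The lazy title stops right before "(YYYY)", so drop any trailing space.
    return title.rstrip(), year, issue


table = [split_row(row) for row in rows]
for record in table:
    print(record)
```

If the rows are in a pandas Series, as the comments say, the same pattern can be passed to `Series.str.extract`, which returns one column per capture group and NaN where an optional group did not participate; the trailing space on the title would still need stripping.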

See a regex101 demo.

The fourth bird
  • Amazing! I haven't started working with optional capture groups yet. Thanks for this – Benjamin Stringer Jan 20 '23 at 21:19
  • I just tried the following: ^(.+?)(\(\d+\))?\s*(#.*)?$ and still found success. So the 'non capture group' isn't essential? We can just make a regular group optional? https://regex101.com/r/BS5fnN/1 – Benjamin Stringer Jan 20 '23 at 22:30
  • @BenjaminStringer That also works, but in that case you also capture the parentheses. – The fourth bird Jan 20 '23 at 22:53
  • Ah! Great point. Thanks for your help. I've got a similar problem that I'm currently trying to solve. I have a feeling I need optional non-capturing groups for this one? If you've got 5 minutes, can you take a peek: https://stackoverflow.com/questions/75190041/regex-getting-the-follow-data-into-groups – Benjamin Stringer Jan 20 '23 at 22:59
  • @BenjaminStringer You can use a single pattern with an alternation, and then check if the values of the capture group exist, as now you have 5 groups `^(.*?)(?:(\(\d{4}\))?\s*(#\d+)?|(#\d+)?\s*(\(\d{4}\))?)$` See https://regex101.com/r/o3SUJX/1 – The fourth bird Jan 21 '23 at 09:07
  • That's really smart, using the pipe operator like that and then matching the groups from the right-hand side with the $. Thanks for that insight! – Benjamin Stringer Jan 21 '23 at 12:45
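A sketch of how the five-group alternation could be consumed, again assuming Python's `re` module. `split_any_order` is a hypothetical helper, and the "#N (YYYY)" ordering in the second call is an assumed input, since the linked question's data is not reproduced here. Only one branch of the alternation takes part in a match, so the year and issue are taken from whichever of their two groups captured something.

```python
import re

# Pattern from the comment: branch 1 expects "(YYYY)" before "#N",
# branch 2 expects "#N" before "(YYYY)". Five capture groups in total.
pattern = re.compile(
    r"^(.*?)(?:(\(\d{4}\))?\s*(#\d+)?|(#\d+)?\s*(\(\d{4}\))?)$"
)


def split_any_order(row):
    """Hypothetical helper: return (title, year, issue) in either order."""
    m = pattern.match(row)
    title, year_a, issue_a, issue_b, year_b = m.groups()
    # Groups from the branch that did not match are None.
    year = year_a or year_b
    issue = issue_a or issue_b
    # These year groups keep their parentheses, so strip them here.
    if year is not None:
        year = year.strip("()")
    return title.rstrip(), year, issue


print(split_any_order("Marvel Comics Presents (1988) #125"))
# ('Marvel Comics Presents', '1988', '#125')
print(split_any_order("Marvel Comics Presents #125 (1988)"))
# ('Marvel Comics Presents', '1988', '#125')
```

Capturing only the digits inside the year groups, as in the answer's `\((\d{4})\)`, would remove the need for the `strip("()")` step.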