
I have the following text:

Action by Toni Kroos, Real Madrid. Rayo Vallecano 2, Real Madrid 0.

where the nouns are Toni Kroos, Real Madrid (team1) and Rayo Vallecano (team2).

I need a regular expression with a named capturing group that returns these results given the following variations:

Action by Toni Kroos, Real Madrid. Rayo Vallecano 2, Real Madrid 0.

Expected result: Rayo Vallecano

Action by Toni Kroos, Real Madrid. Real Madrid 0, Rayo Vallecano 2.

Expected result: Rayo Vallecano

My naive idea was to negate the backreference to team1 and apply it to the second sentence. When the pattern is about to match Real Madrid or Rayo Vallecano, it would discard Real Madrid because it equals team1, so team2 would return Rayo Vallecano. No luck so far; the following only works on the first example:

^Action by .*\, (?<team1>.*)\. (?!\1)(?<team2>.*)( \d+\,| \d+\.).

In plain English, I expect the regex to pick either the first or the second noun in the second sentence (after the first .), so team2 would be either Real Madrid or Rayo Vallecano in the examples, and then discard the one that matches the named capturing group team1 (Real Madrid in the example). The order of the nouns in the second sentence would then not matter.

I'm no expert in regular expressions, so I'm not sure whether this is achievable with a single pattern that fits both examples. Is it possible to write such an expression? If so, I would appreciate a solution with an explanation of the pattern used. Thanks in advance.

EDIT: The language I'll be using is JavaScript

3 Answers


You can write the pattern using \1 as a backreference to the first capture group, so that the named groups team1 and team2 are each defined only once.

^Action by [^,]*, (?<team1>[^.]+)[.,] (?:\1[^,]*, )?(?<team2>[^,]+) \d+[,.]

Explanation

  • ^ Start of string
  • Action by [^,]*, Match Action by followed by optional chars other than a comma and then match ,
  • (?<team1>[^.]+)[.,] Named group team1: match 1+ chars other than . and then match either . or ,
  • (?:\1[^,]*, )? Optionally match what is matched by group 1 using a backreference followed by optional chars other than , followed by matching ,
  • (?<team2>[^,]+) Named group team2 match 1+ chars other than , and then match a space
  • \d+[,.] Match 1+ digits followed by , or .

See a regex101 demo.

const regex = /^Action by [^,]*, (?<team1>[^.]+)[.,] (?:\1[^,]*, )?(?<team2>[^,]+) \d+[,.]/;
[
  `Action by Toni Kroos, Real Madrid. Rayo Vallecano 2, Real Madrid 0.`,
  `Action by Toni Kroos, Real Madrid. Real Madrid 0, Rayo Vallecano 2.`
].forEach(s => {
  const m = s.match(regex);
  if (m) {
    console.log(m.groups);
  }
});
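As a side note, here is a sketch of the same pattern written with a named backreference. JavaScript accepts \k<team1> in place of \1 when the pattern contains named groups, which can read more clearly. The names namedRegex, samples and results are only illustrative.

```javascript
// Same pattern as above, but with the named backreference \k<team1> instead of \1.
const namedRegex = /^Action by [^,]*, (?<team1>[^.]+)[.,] (?:\k<team1>[^,]*, )?(?<team2>[^,]+) \d+[,.]/;

const samples = [
  "Action by Toni Kroos, Real Madrid. Rayo Vallecano 2, Real Madrid 0.",
  "Action by Toni Kroos, Real Madrid. Real Madrid 0, Rayo Vallecano 2."
];

// Collect team2 for each sample; null marks a string that does not match.
const results = samples.map(s => {
  const m = s.match(namedRegex);
  return m ? m.groups.team2 : null;
});

console.log(results);
```

Both samples should yield the same team2 value regardless of the order in the second sentence.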
The fourth bird
  • Hi, thanks. That's exactly what I meant with my poor explanation. However, your proposal yields `Madrid` in the first example, and `Vallecano` in the second one. It should return the value `Rayo Vallecano` in both examples – David Miguel Yusta Feb 05 '23 at 12:13
  • @DavidMiguelYusta Yes I see, let me update it... – The fourth bird Feb 05 '23 at 12:14
  • Mister, you nailed it. Many many thanks. Could I ask for one last thing? If you could add a brief explanation for future readers (and myself) with the same problem, it would be awesome. – David Miguel Yusta Feb 05 '23 at 12:37
  • @DavidMiguelYusta I have slightly updated the pattern and added an explanation to it. – The fourth bird Feb 05 '23 at 13:23

You can use the following regular expression with named capturing groups:

Action by.*, (?<team1>.*)\. (?<team2>.*) (\d+), (?<team1>.*) (\d+)\.

This regex matches the text and captures the values of team1 and team2 using named capturing groups. Note that the named capturing group team1 is used twice in the expression to capture both values of the same team.
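For what it's worth, JavaScript rejects a pattern that declares the same group name twice within one alternative, so the expression above may fail to compile as written. The sketch below checks that, and then tries a variant where the second declaration is swapped for the backreference \k<team1>. That swap is only my assumption about the intent, and tryCompile is a hypothetical helper. Even with the swap, the pattern only handles the ordering of the first example.

```javascript
// Hypothetical helper: returns a RegExp, or null if the source does not compile.
function tryCompile(source) {
  try {
    return new RegExp(source);
  } catch (err) {
    return null;
  }
}

// The pattern as posted: team1 is declared twice in the same alternative.
const duplicate = String.raw`Action by.*, (?<team1>.*)\. (?<team2>.*) (\d+), (?<team1>.*) (\d+)\.`;
// Assumed intent: refer back to team1 instead of declaring it again.
const withBackref = String.raw`Action by.*, (?<team1>.*)\. (?<team2>.*) (\d+), \k<team1> (\d+)\.`;

const first = "Action by Toni Kroos, Real Madrid. Rayo Vallecano 2, Real Madrid 0.";
const second = "Action by Toni Kroos, Real Madrid. Real Madrid 0, Rayo Vallecano 2.";

const duplicateCompiled = tryCompile(duplicate);
const backrefRegex = tryCompile(withBackref);

console.log(duplicateCompiled === null);   // whether the posted pattern was rejected
console.log(first.match(backrefRegex));    // match details for the first ordering
console.log(second.match(backrefRegex));   // result for the second ordering
```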

Adrian Mole
oria kali
  • Hi @oria kali. Thanks for your answer. Unfortunately, I get an error when using the same named capturing group (`team1`) more than once (according to regex101.com). Also, wouldn't your proposal only work with the first example and not on the second? – David Miguel Yusta Feb 05 '23 at 09:20
  • The formatting errors in the original answer obscured what was being proposed. I tried to fix them, but since your question was rather unclear, probably don't get your hopes too high. – tripleee Feb 05 '23 at 10:14
  • Hi @tripleee. Many thanks. `team2` should be the value of the noun that doesn't match `team1`. The second sentence (after the first `.`) will always contain 2 teams: one that matches `team1` and the other (the expected result). The problem is that the order in the second sentence may vary. In the first example `Rayo Vallecano` comes first, and in the other example it comes second. Does that make it clearer? – David Miguel Yusta Feb 05 '23 at 10:27
  • Probably edit your question rather than hide details in a comment to an answer. You want to make sure you include all your requirements, and enough test examples to illustrate the corner cases. – tripleee Feb 05 '23 at 11:23

Take a look at this: (https://regex101.com/r/yYgl5R/1)

Regex: ^Action by .*\, (.*)\. (?<team1>.*)( \d+\,) (?<team2>.*)(\d+\.)

This matches only the teams and the scores.
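Once both teams of the second sentence are captured, the "other" team can be picked in plain JavaScript rather than inside the regex. The following is only a sketch of that idea: the helper name otherTeam, the group names actor, first and second, and the stricter character classes are my assumptions and not part of the pattern above.

```javascript
// Hypothetical helper: returns the team in the score line that is NOT
// the team named in the "Action by ..." sentence, or null if no match.
function otherTeam(text) {
  // actor: team after the player's name; first/second: teams in the score line.
  const re = /^Action by [^,]*, (?<actor>[^.]+)\. (?<first>[^,]+) \d+, (?<second>[^,]+) \d+\.$/;
  const m = text.match(re);
  if (!m) {
    return null;
  }
  const { actor, first, second } = m.groups;
  // Whichever side equals the actor's team is discarded.
  return first === actor ? second : first;
}

console.log(otherTeam("Action by Toni Kroos, Real Madrid. Rayo Vallecano 2, Real Madrid 0."));
console.log(otherTeam("Action by Toni Kroos, Real Madrid. Real Madrid 0, Rayo Vallecano 2."));
```

This trades a single-pattern solution for a simpler regex plus one comparison, which may be easier to maintain if more orderings or formats appear later.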

Luuk