I have the following strings, which come in the format Country: Cup name - Extra info.

Asia: Asian Cup - Play Offs
Asia: Asian Cup
Asia: World Cup - Qualification - First Stage
Australia: A-League
Belgium: Jupiler League - Championship Group
Brazil: Série A

The problem I have is how to separate the information on each line using regex.

More specifically, from the first line I would like to extract the following information:

[ Asia, Asian Cup, Play Offs ]

From the second line, the following:

[ Asia, Asian Cup ]

and so on.

So far I have tried the following pattern:

^([\w]+\:\s+)[^\-]+(?!\-\s)+

It is incomplete and I don't know how to continue from here. My main issue is that I don't know how to negate part of the pattern.

So, how can I solve this?

Abdul Aziz Barkat
KodeFor.Me

4 Answers

You can use explode:

$lines = ['Asia: Asian Cup - Play Offs',
          'Asia: Asian Cup',
          'Asia: World Cup - Qualification - First Stage',
          'Australia: A-League',
          'Belgium: Jupiler League - Championship Group',
          'Brazil: Série A'];

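$results = array_map(function ($i) {
    $ret = [];
    // Split off the country at the first ': ' only
    list($ret[0], $tmp) = explode(': ', $i, 2);
    // Split the remainder at the first ' - ' only, so the extra info stays in one piece
    return array_merge($ret, explode(' - ', $tmp, 2));
}, $lines);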

print_r($results);
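
To make the effect of the limit argument concrete, here is a minimal sketch of the same explode approach wrapped in a function. The name splitLine and its $limit parameter are hypothetical, introduced only for illustration, and the sketch assumes every line contains ': '.

```php
<?php
// Hypothetical helper: same explode-based idea as above, with the
// limit for the ' - ' split exposed as a parameter.
function splitLine(string $line, int $limit = 2): array
{
    // Assumes the line contains ': '; split at the first occurrence only.
    [$country, $rest] = explode(': ', $line, 2);

    // With $limit = 2, everything after the first ' - ' stays together.
    return array_merge([$country], explode(' - ', $rest, $limit));
}

$line = 'Asia: World Cup - Qualification - First Stage';

print_r(splitLine($line));              // 3 parts: extra info kept whole
print_r(splitLine($line, PHP_INT_MAX)); // 4 parts: split at every ' - '
```

With the default limit the third element is 'Qualification - First Stage'; raising the limit splits it further, which may or may not be what you want.
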
Casimir et Hippolyte

You can try this regex in PHP:

/^(\p{Lu}\p{L}*):\h*(.+?)(?:\h-\h(.+))?$/mu
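
As a rough sketch of how this pattern could be applied line by line with preg_match, assuming UTF-8 input: the function name parseLine is hypothetical and not part of the answer. When the optional third group does not take part in the match, preg_match leaves it out of the result, so lines without extra info yield two elements.

```php
<?php
// Hypothetical helper: apply the pattern above to one line and
// return only the captured groups, or null if the line does not match.
function parseLine(string $line): ?array
{
    $pattern = '/^(\p{Lu}\p{L}*):\h*(.+?)(?:\h-\h(.+))?$/mu';

    if (!preg_match($pattern, $line, $m)) {
        return null;
    }

    // $m[0] is the whole match; the captures start at index 1.
    return array_slice($m, 1);
}

$lines = [
    'Asia: Asian Cup - Play Offs',
    'Australia: A-League',
    'Brazil: Série A',
];

print_r(array_map('parseLine', $lines));
```

Because the third group is greedy, a line such as 'Asia: World Cup - Qualification - First Stage' gives 'Qualification - First Stage' as a single third element.
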

anubhava
  • Do you know how this can be modified to allow spaces in the first part? Like `Some Word: Another Word - Extra Word` – KodeFor.Me Feb 06 '16 at 12:16
  • You can use [`^(\p{Lu}[ \p{L}]*):\h*(.+?)(?:\h-\h(.+))?$`](https://regex101.com/r/dG8nN0/2) – anubhava Feb 06 '16 at 13:14

First, explode your string on \n, then you can use the following regex on each line:

([\w\s]+): ([\w ]+)(?:- ?([\w -]+))?

It is explained here: https://regex101.com/r/lV7lT0/1
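
A minimal sketch of how this pattern might be used on a single line with preg_match; the helper name captureParts is hypothetical. The pattern is unanchored and the second group can pick up a trailing space, so the sketch trims each capture. Note also that the class [\w ] excludes hyphens, so a name such as A-League is split at its hyphen.

```php
<?php
// Hypothetical helper: run the pattern above on one line and
// return the trimmed captures (empty array if nothing matches).
function captureParts(string $line): array
{
    $pattern = '/([\w\s]+): ([\w ]+)(?:- ?([\w -]+))?/';

    if (!preg_match($pattern, $line, $m)) {
        return [];
    }

    // Drop the full match and trim the space left before '-'.
    return array_map('trim', array_slice($m, 1));
}

print_r(captureParts('Asia: Asian Cup - Play Offs'));
print_r(captureParts('Australia: A-League')); // 'A' and 'League' end up separate
```
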

Gwendal

This doesn't look terribly complicated to me. You want to split each line on either a colon followed by a space, or a space-hyphen-space. preg_split() will be happy to oblige.

Code:

$lines = [
    'Asia: Asian Cup - Play Offs',
    'Asia: Asian Cup',
    'Asia: World Cup - Qualification - First Stage',
    'Australia: A-League',
    'Belgium: Jupiler League - Championship Group',
    'Brazil: Série A'
];

var_export(
    array_map(
        fn($v) => preg_split('/: | - /', $v),
        $lines
    )
);

Output:

array (
  0 => 
  array (
    0 => 'Asia',
    1 => 'Asian Cup',
    2 => 'Play Offs',
  ),
  1 => 
  array (
    0 => 'Asia',
    1 => 'Asian Cup',
  ),
  2 => 
  array (
    0 => 'Asia',
    1 => 'World Cup',
    2 => 'Qualification',
    3 => 'First Stage',
  ),
  3 => 
  array (
    0 => 'Australia',
    1 => 'A-League',
  ),
  4 => 
  array (
    0 => 'Belgium',
    1 => 'Jupiler League',
    2 => 'Championship Group',
  ),
  5 => 
  array (
    0 => 'Brazil',
    1 => 'Série A',
  ),
)

If your input data is actually a single block of text, first use \R to split it on the newlines:

preg_split('/\R/', $text)
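
Putting the two steps together, a sketch under a few assumptions: the sample $text is made up for illustration, the u modifier is my addition so that \R is evaluated against UTF-8 characters rather than raw bytes, and PREG_SPLIT_NO_EMPTY is used to skip blank lines.

```php
<?php
// Illustrative input with mixed line endings and a trailing newline.
$text = "Asia: Asian Cup - Play Offs\r\nAustralia: A-League\nBrazil: Série A\n";

// Step 1: split the block into lines on any newline sequence.
$lines = preg_split('/\R/u', $text, -1, PREG_SPLIT_NO_EMPTY);

// Step 2: split each line on ': ' or ' - '.
$rows = array_map(
    fn($line) => preg_split('/: | - /', $line),
    $lines
);

var_export($rows);
```
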
mickmackusa