Split line into multiple parts using regex

Question

I have a string like

BK0001 My book (4th Edition) $49.95 (Clearance Price!)

I would like a way to split it into different parts like

[BK0001] 
[My book (4th Edition)]
[$49.95] 
[(Clearance Price!)]

I'm pretty new to regex, and I'm using this to parse a line in a file. I managed to get the first part, BK0001, by using

$parts = preg_split('/\s+/', 'BK0001 My book (4th Edition) $49.95 (Clearance Price!)');

and then reading the $parts[0] value, but I'm not sure how to split the line to get the other values.

Have you used regex101 yet? It's a great resource both for learning regexes and for developing one for a particular need. — erik258, Nov 07 '18 at 20:42
Try spelling out the subpatterns. Say, `preg_match('~^(?<code>\S+)\s+(?<name>.*?)\s+(\$\d[\d.]*)\s*(?<details>.*)$~', $text, $matches)`, see [demo](https://regex101.com/r/EF0I6W/1). — Wiktor Stribiżew, Nov 07 '18 at 20:43
@Dan Farrel I have, but I don't use PHP and regex often. I code mostly in Python and usually use string.split() for tasks such as these. This is one of those rare moments when I need regex, and investing time in learning it fully isn't really a good option right now. — answerSeeker, Nov 07 '18 at 20:48
`learning it fully isn't really a good option right now` It's always good to learn regex; most languages have some flavor of it, and it's incredibly powerful and useful. — ArtisticPhoenix, Nov 07 '18 at 21:07

Wiktor Stribiżew · Accepted Answer · 2018-11-07T20:54:57.210

You may match the specific parts of the input string using a single pattern with capturing groups:

preg_match('~^(?<code>\S+)\s+(?<name>.*?)\s+(?<num>\$\d[\d.]*)\s*(?<details>.*)$~', $text, $matches)

See the regex demo. The final $ is not actually required; it is there only to show that the whole string is matched.

Details

^ - start of a string
(?<code>\S+) - Group "code": one or more non-whitespace chars
\s+ - 1+ whitespaces
(?<name>.*?) - Group "name": any 0+ chars other than line break chars, as few as possible
\s+ - 1+ whitespaces
(?<num>\$\d[\d.]*) - Group "num": a $, then 1 digit and then 0+ digits or .
\s* - 0+ whitespaces
(?<details>.*) - Group "details": any 0+ chars other than line break chars, as many as possible
$ - end of string.

PHP code:

$re = '~^(?<code>\S+)\s+(?<name>.*?)\s+(?<num>\$\d[\d.]*)\s*(?<details>.*)$~';
$str = 'BK0001 My book (4th Edition) $49.95 (Clearance Price!)';
if (preg_match($re, $str, $m)) {
    echo "Code: " . $m["code"] . "\nName: " . $m["name"] . "\nPrice: " .
         $m["num"] . "\nDetails: " . $m["details"]; 
}

Output:

Code: BK0001
Name: My book (4th Edition)
Price: $49.95
Details: (Clearance Price!)
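
Since the question is about parsing lines from a file, the same pattern can be wrapped in a small helper and applied line by line. The following is only a sketch: the function name parse_book_line and the sample lines are made up for illustration, and it assumes every valid line has a code, a name, and a $-prefixed price, with the trailing details being optional.

```php
<?php
// Hypothetical helper (not part of the answer): applies the pattern above to one line.
function parse_book_line(string $line): ?array
{
    $re = '~^(?<code>\S+)\s+(?<name>.*?)\s+(?<num>\$\d[\d.]*)\s*(?<details>.*)$~';
    if (!preg_match($re, trim($line), $m)) {
        return null; // the line does not have the expected shape
    }
    // Keep only the named groups; preg_match also fills in numeric keys.
    return [
        'code'    => $m['code'],
        'name'    => $m['name'],
        'price'   => $m['num'],
        'details' => $m['details'], // empty string when nothing follows the price
    ];
}

// Made-up sample lines standing in for the contents of a file.
$lines = [
    'BK0001 My book (4th Edition) $49.95 (Clearance Price!)',
    'BK0002 Another title $5',
    'not a book line',
];

foreach ($lines as $line) {
    $book = parse_book_line($line);
    echo $book === null ? "skipped: $line\n" : implode(' | ', $book) . "\n";
}
```

One caveat with this approach: because the name group is lazy, a title that itself contains a price-like token (a $ followed by a digit) would be cut short at that token, so such lines would need a stricter pattern.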

Ravi Rajendra · Answer 2 · 2018-11-07T21:01:49.870

Try using preg_match

$book_text = 'BK0001 My book (4th Edition) $49.95 (Clearance Price!)';
// Groups: 1 = book number, 2 = name, 3 = edition, 4 = price, 5 = special text
if (preg_match('/(\w+)\s+(.*?)\s+\((.*?)\)\s+(\$[\d.]+)\s+\((.*?)\)$/', $book_text, $matches)) {
    // Write code here
    print_r($matches);
}

$matches[0] holds the full matched string. The split parts start at $matches[1]:

Array
(
    [0] => BK0001 My book (4th Edition) $49.95 (Clearance Price!)
    [1] => BK0001
    [2] => My book
    [3] => 4th Edition
    [4] => $49.95
    [5] => Clearance Price!
)

$matches[1] is "book number"
$matches[2] is "book name"
$matches[3] is "edition"
$matches[4] is "price"
$matches[5] is "special text"
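
As a small convenience, the numbered groups can be unpacked into variables with list(). This is a sketch that uses a single-quoted, equivalent spelling of the pattern above; the variable names and the second sample line are illustrative only. It also shows that this pattern requires both parenthesized parts, so a line without an edition and a trailing note does not match.

```php
<?php
$book_text = 'BK0001 My book (4th Edition) $49.95 (Clearance Price!)';
$re = '/(\w+)\s+(.*?)\s+\((.*?)\)\s+(\$[\d.]+)\s+\((.*?)\)$/';

if (preg_match($re, $book_text, $matches)) {
    // Skip $matches[0] (the full match) and unpack groups 1 to 5.
    list(, $number, $name, $edition, $price, $special) = $matches;
    echo "$number / $name / $edition / $price / $special\n";
}

// A made-up line with no parenthesized parts: the pattern does not match it.
var_dump(preg_match($re, 'BK0002 Plain title $5.00')); // int(0)
```

If some lines may lack the edition or the trailing note, those groups would have to be made optional, or a looser pattern used instead.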
