1

I have a string containing multiple products orders which have been joined together without a delimiter.

I need to parse the input string and convert sets of three substrings into separate rows of data.

I tried splitting the string using split() and strstr() function, but could not generate the desired result.

How can I convert this statement into different columns?

RM is Malaysian Ringgit

From this statement:

"2 x Brew Coffeee Panas: RM7.42 x Tongkat Ali Ais: RM8.6"

Into seperate row:

  • 2 x Brew Coffeee Panas: RM7.4
  • 2 x Tongkat Ali Ais: RM8.6

And this 2 row into this table in DB:

Table: Products

Product Name Quantity Total Amount (RM)
Brew Coffeee Panas 2 7.4
Tongkat Ali Ais 2 8.6

*Note: the "total amount" substrings will reliably have a numeric value with precision to one decimal place.

mickmackusa
  • 43,625
  • 12
  • 83
  • 136
Afif Zuhair
  • 53
  • 1
  • 10
  • 4
    How can you tell where to split? What if it was `... RM7.125 x ...`, is that `RM7.1` and `25 x` or `RM7.12` and `5 x`? – Nick Apr 14 '21 at 07:12
  • can you control the format used to generate this statement or is that fixed by a third party? – Professor Abronsius Apr 14 '21 at 07:14
  • @Nick yes it only take one decimal place for the amount, if its RM7.125, it should be RM7.1 and 25 x – Afif Zuhair Apr 14 '21 at 07:15
  • @ProfessorAbronsius the data was given to me generated by third party system, and my task is to seperate the strings. Initially, I expected this row to be seperated by comma but it doesn't. – Afif Zuhair Apr 14 '21 at 07:18

4 Answers4

3

You could use regex if your string format is consistent. Here's an expression that could do that:

(\d) x (.+?): RM(\d+\.\d)

Basic usage

$re = '/(\d) x (.+?): RM(\d+\.\d)/';
$str = '2 x Brew Coffeee Panas: RM7.42 x Tongkat Ali Ais: RM8.6';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
var_export($matches);

Which gives

array (
  0 => 
  array (
    0 => '2 x Brew Coffeee Panas: RM7.4',
    1 => '2',
    2 => 'Brew Coffeee Panas',
    3 => '7.4',
  ),
  1 => 
  array (
    0 => '2 x Tongkat Ali Ais: RM8.6',
    1 => '2',
    2 => 'Tongkat Ali Ais',
    3 => '8.6',
  ),
)

Group 0 will always be the full match, after that the groups will be quantity, product and price.

Try it online

Lucan
  • 2,907
  • 2
  • 16
  • 30
  • @l how if the price I got is inconsistent? Let say second product is RM19.5, but the first product comes with price RM31 (without decimal point). I tried run by using "(\d+)\sx\s([a-zA-Z\d\s]+):\sRM(\d+?.\d{1})" for this statement "2 x B026 Kopi Hainan Kecil: RM312 x B006 Kopi Hainan Besar: RM19.5" the pattern can only read the second product and as you can see there are no decimal point in the price for the first product..inconsistent is hard.. – Afif Zuhair Apr 17 '21 at 02:33
  • @AfifZuhair If your requirements have changed, you should select an answer here and then ask another question with your new requirements and link back to this question. – Lucan Apr 18 '21 at 02:52
1
  • Capture one or more digits
  • Match the space, x, space
  • Capture one or more non-colon characters until the first occuring colon
  • Match the colon, space, then RM
  • Capture the float value that has a max decimal length of 1
    OP says in comment under question: it only take one decimal place for the amount

There are no "lazy quantifiers" in my pattern, so the regex can move most swiftly.

This regex pattern is as Accurate as the sample data and requirement explanation allows, as Efficient as it can be because it only contains greedy quantifiers, as Concise as it can be thanks to the negated character class, and as Readable as the pattern can be made because there are no superfluous characters.

Code: (Demo)

var_export(
    preg_match_all('~(\d+) x ([^:]+): RM(\d+\.\d)~', $string, $m)
        ? array_slice($m, 1)  // omit the fullstring matches
        : []                  // if there are no matches
);

Output:

array (
  0 => 
  array (
    0 => '2',
    1 => '2',
  ),
  1 => 
  array (
    0 => 'Brew Coffeee Panas',
    1 => 'Tongkat Ali Ais',
  ),
  2 => 
  array (
    0 => '7.4',
    1 => '8.6',
  ),
)

You can add the PREG_SET_ORDER argument to the preg_match_all() call to aid in iterating the matches as rows.

preg_match_all('~(\d+) x ([^:]+): RM(\d+\.\d)~', $string, $matches, PREG_SET_ORDER);
foreach ($matches as $match) {
    echo '<tr><td>' . implode('</td><td>', array_slice($match, 1)) . '</td></tr>';
}
mickmackusa
  • 43,625
  • 12
  • 83
  • 136
  • 2
    I found that the flag `PREG_SET_ORDER` will group the results nicely. Might be a good idea to add that to your answer. – Ivan86 Apr 14 '21 at 09:43
0

The first thing I would do would be to perform a simple replacement using preg_replace to insert, with the aid of a a back-reference to the captured item, based upon the known format of a single decimal point. Anything beyond that single decimal point forms part of the next item - the quantity in this case.

$str="2 x Brew Coffeee Panas: RM7.42 x Tongkat Ali Ais: RM8.625 x Koala Kebabs: RM15.23 x Fried Squirrel Fritters: RM32.4";

#   qty price
#   2   7.4
#   2   8.6
#   25  15.2
#   3   32.4



/*
    Our RegEx to find the decimal precision, 
    to split the string apart and the quantity
*/
$pttns=(object)array(
    'repchar'   =>  '@(RM\d{1,}\.\d{1})@',
    'splitter'  =>  '@(\|)@',
    'combo' =>  '@^((\d{1,}) x)(.*): RM(\d{1,}\.\d{1})$@'
);
# create a new version of the string with our specified delimiter - the PIPE
$str = preg_replace( $pttns->repchar, '$1|', $str );

# split the string intp pieces - discard empty items
$a=array_filter( preg_split( $pttns->splitter, $str, null ) );

#iterate through matches - find the quantity,item & price
foreach($a as $str){
    preg_match($pttns->combo,$str,$matches);
    $qty=$matches[2];
    $item=$matches[3];
    $price=$matches[4];
    
    printf('%s %d %d<br />',$item,$qty,$price);
}

Which yields:

Brew Coffeee Panas 2 7
Tongkat Ali Ais 2 8
Koala Kebabs 25 15
Fried Squirrel Fritters 3 32
Professor Abronsius
  • 33,063
  • 5
  • 32
  • 46
0

You can use a regex like this:

/(\d+)\sx\s([^:]+):\sRM(\d+\.?\d?)(?=\d|$)/

Explanation:

  • (\d+) captures one or more digits
  • \s matches a whitespace character
  • ([^:]+): captures one or more non : characters that come before a : character (you can also use something like [a-zA-Z0-9\s]+): if you know exactly which characters can exist before the : character - in this case lower case and upper case letters, digits 0 through 9 and whitespace characters)
  • (\d+\.?\d?) captures one or more digits, followed by a . and another digit if they exist
  • (?=\d|$) is a positive lookahead which matches a digit after the main expression without including it in the result, or the end of the string

You can also add the PREG_SET_ORDER flag to preg_match_all() to group the results:

PREG_SET_ORDER

Orders results so that $matches[0] is an array of first set of matches, $matches[1] is an array of second set of matches, and so on.

Code example:

<?php
    $txt = "2 x Brew Coffeee Panas: RM7.42 x Tongkat Ali Ais: RM8.62 x B026 Kopi Hainan Kecil: RM312 x B006 Kopi Hainan Besar: RM19.5";
    $pattern = "/(\d+)\sx\s([^:]+):\sRM(\d+\.?\d?)(?=\d|$)/";

    if(preg_match_all($pattern, $txt, $matches, PREG_SET_ORDER)) {
       print_r($matches);
    }
?>

Output:

Array
(
    [0] => Array
        (
            [0] => 2 x Brew Coffeee Panas: RM7.4
            [1] => 2
            [2] => Brew Coffeee Panas
            [3] => 7.4
        )
    [1] => Array
        (
            [0] => 2 x Tongkat Ali Ais: RM8.6
            [1] => 2
            [2] => Tongkat Ali Ais
            [3] => 8.6
        )
    [2] => Array
        (
            [0] => 2 x B026 Kopi Hainan Kecil: RM31
            [1] => 2
            [2] => B026 Kopi Hainan Kecil
            [3] => 31
        )
    [3] => Array
        (
            [0] => 2 x B006 Kopi Hainan Besar: RM19.5
            [1] => 2
            [2] => B006 Kopi Hainan Besar
            [3] => 19.5
        )
)

See it live here php live editor and here regex tester.

Ivan86
  • 5,695
  • 2
  • 14
  • 30
  • it doesn't work when I test with 11 quantity for the second items for example in "2 x B026 Kopi Hainan Kecil: RM3911 x B006 Kopi Hainan Besar: RM19.5". the first item will catch it's price as RM391. – Afif Zuhair Apr 19 '21 at 03:05
  • @AfifZuhair How would you be able to determine if it's supposed to be RM391, RM39 or RM3? I already downvoted your question and if I could I'd downvote it again. All of these requirements should have been mentioned in your question. – Ivan86 Apr 19 '21 at 03:24
  • You're right. Thanks for the clarifications. I keep receiving the unformatted/standard from the third party apps which was not their fault. Anyway, this answer thread helps me so much, and I use your answer above answer as my finalised solution. =) – Afif Zuhair Apr 19 '21 at 06:04
  • @AfifZuhair That's alright, glad you learned something from this thread. In the end that's what it's all about. For future questions, please try to nail down all the constraints and make them a part of your question. It will benefit you and all the people that are trying to help you. BTW, accept an answer on the page that you consider answers your question, so the question can be closed. You can do it like [this](https://meta.stackexchange.com/questions/5234/how-does-accepting-an-answer-work). – Ivan86 Apr 19 '21 at 16:19