I am trying to parse 1 line that is constructed in this format:

Files("textfile1.txt", 7268474425, "textfile2.txt", 661204928, "textfile3.txt", 121034)

I have this working perfectly in C# using named capture groups, but this needs to be done in PHP, and I have no idea how to separate each field and build an associative array that I can iterate over.

I can retrieve the first item in double-quotes "textfile1.txt" using

$string = 'Files("textfile1.txt", 7268474425, "textfile2.txt", 661204928, "textfile3.txt", 121034)';
preg_match("/(?:(?:\"(?:\\\\\"|[^\"])+\")|(?:'(?:\\\'|[^'])+'))/is", $string, $match);
print_r($match);
Array
(
    [0] => "textfile1.txt"
)

I can't figure out the rest. I have tried different expressions that account for both the string and the long (size) fields, but with no luck.

Is there something I am missing?

The end result should be each filename/size pair added to an array that I can access later.

Any help is appreciated.

https://regex101.com/r/naSdng/1

My C# implementation looks like this:

MatchCollection result = Regex.Matches(file, @"(?:\G(?!\A)\s*,\s*|\w+\()(?:""(?<filename>.*?)""|'(?<filename>.*?)')\s*,\s*(?<filesize>\d+)");
matchCol = result;
foreach (Match match in result)
{
    ListViewItem ItemArray = new(new string[] {
        match.Groups["filename"].Value.Trim(), BytesToReadableString(Convert.ToInt64(match.Groups["filesize"].Value)), "Ready"
    });
    fileList.Items.Add(ItemArray);
}
mickmackusa
Jonny
  • In PHP you need to use `preg_match_all`. What are your expected matches from this string? – anubhava Oct 20 '22 at 04:10
  • Oh, no kidding. So it's like string.Matches(...) as opposed to string.Match(...) (C#, same idea). That gives me a lot to go on, thank you. Getting further with `foreach ($match[0] as $test)` – Jonny Oct 20 '22 at 04:30
  • Can you tell me what you want to do? – Thanh Khánh Oct 20 '22 at 05:14
  • I just want to parse every filename (inside double quotes) followed by its associated size, repeated for each occurrence across the string, and add all of it to an array. With my current changes I only get the filenames. I need a regex pattern that matches each field in that string, excluding the leading `Files(` and the trailing `)` – Jonny Oct 20 '22 at 05:19
  • This would be the complete solution, if it were not PHP: https://pastebin.com/xzD79Tse – Jonny Oct 20 '22 at 05:25
  • I wish I would have seen this commented question detail before answering. Please always put all question details into the question's body. – mickmackusa Oct 20 '22 at 06:18

2 Answers

Convert the input string into a valid JSON string and decode it so that the numeric values are cast as integers. Chunk the flat array into pairs and assign each pair as an associative element in the result array.

Code:

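var_export(
    array_reduce(
        // Pair up the flat [name, size, name, size, ...] list.
        array_chunk(
            // "Files(" is 6 characters long; -1 drops the trailing ")".
            json_decode('[' . substr($string, 6, -1) . ']'),
            2
        ),
        function ($result, $row) {
            $result[$row[0]] = $row[1];
            return $result;
        },
        [] // start from an empty array so empty input yields [] rather than null
    )
);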

or split the inner text on every second comma-space and parse the comma-separated strings with sscanf().

Code:

var_export(
    array_reduce(
        preg_split('/[^,]+,[^,]+\K, /', substr($string, 6, -1)),
        function ($result, $string) {
            [$key, $result[$key]] = sscanf($string, '"%[^"]", %d');
            return $result;
        }
    )
);

or use preg_match_all() with the \G (continue metacharacter) then pair up the results in a foreach() so that you can explicitly cast the numbers as int-type values.

Code:

$result = [];
preg_match_all('/(?:^\w+\(|\G, )"([^"]+)", (\d+)/', $string, $matches, PREG_SET_ORDER);
foreach ($matches as [1 => $key, 2 => $val]) {
    $result[$key] = (int) $val;
}
var_export($result);
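
To illustrate why the \G anchor matters, here is a small sketch that runs the pattern above and an unanchored variant against a deliberately malformed string. The `$broken` input and the unanchored pattern are made up for this comparison and are not part of the original question.

```php
<?php
// Hypothetical malformed input: "oops" breaks the chain of name/size pairs.
$broken = 'Files("a.txt", 1, oops, "b.txt", 2)';

// With \G, each pair must start exactly where the previous match ended.
$anchored = preg_match_all('/(?:^\w+\(|\G, )"([^"]+)", (\d+)/', $broken, $m1, PREG_SET_ORDER);

// Without \G, the engine may skip over the unexpected text and keep matching.
$unanchored = preg_match_all('/"([^"]+)", (\d+)/', $broken, $m2, PREG_SET_ORDER);

printf("anchored: %d match(es), unanchored: %d match(es)\n", $anchored, $unanchored);
```

The anchored pattern should stop after the first pair, while the unanchored one also picks up the pair after the stray text. Which behavior is preferable depends on how strictly the input needs to be validated.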

or iterate over each individual value after exploding the content inside of the parentheses. Then toggle the usage of the given string to determine keys and values.

Code:

$result = [];
foreach (explode(', ', substr($string, 6, -1)) as $val) {
    if (!isset($key)) {
        $key = trim($val, '"');
    } else {
        $result[$key] = (int) $val;
        unset($key);
    }
}
var_export($result);
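
All of the snippets above assume well-formed input. As a rough sketch only, the JSON approach could be wrapped in a function with some basic validation. The name `parseFilesList` is hypothetical, `array_column()` is swapped in for `array_reduce()`, and the code assumes a 64-bit PHP build so that large sizes decode as integers.

```php
<?php
// Hypothetical helper (not from the answer): parse 'Files("name", size, ...)' into [name => size].
function parseFilesList(string $input): array
{
    // Keep only the text between the first "(" and the last ")".
    $start = strpos($input, '(');
    $end = strrpos($input, ')');
    if ($start === false || $end === false || $end < $start) {
        return [];
    }
    $inner = substr($input, $start + 1, $end - $start - 1);

    // Wrapping the inner text in [] yields a JSON array, provided names are double-quoted.
    $flat = json_decode('[' . $inner . ']');
    if (!is_array($flat) || count($flat) % 2 !== 0) {
        return [];
    }

    // Pair up [name, size], then use column 0 as the key and column 1 as the value.
    return array_column(array_chunk($flat, 2), 1, 0);
}

$string = 'Files("textfile1.txt", 7268474425, "textfile2.txt", 661204928, "textfile3.txt", 121034)';
$files = parseFilesList($string);
foreach ($files as $name => $size) {
    echo $name, ': ', $size, " bytes\n";
}
```

Returning an empty array on malformed input is just one possible design choice; throwing an exception may suit stricter callers better.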
mickmackusa

The regex you have shown in C# can be easily adapted to work in PHP as well.

You may use:

(?:\w+\(\h*|(?<!\A)\G\h*,\h*)"(?<filename>[^"]+)"\h*,\h*(?<filesize>\d+)

Note that I have refactored your regex a bit to make it more efficient.

Code:

<?php
$s = 'Files("textfile1.txt", 7268474425, "textfile2.txt", 661204928, "textfile3.txt", 121034)';

if (preg_match_all('/(?:\w+\(\h*|(?<!\A)\G\h*,\h*)"(?<filename>[^"]+)"\h*,\h*(?<filesize>\d+)/', $s, $m)) {
   $out = array_combine ( $m['filename'], $m['filesize'] );
   print_r($out);
}
?>

Output:

Array
(
    [textfile1.txt] => 7268474425
    [textfile2.txt] => 661204928
    [textfile3.txt] => 121034
)

RegEx Details:

  • (?:: Start a non-capture group
    • \w+\(\h*: Match 1+ word characters followed by ( and 0 or more horizontal whitespace characters
    • |: OR
    • (?<!\A)\G: Assert the position at the end of the previous match; the (?<!\A) lookbehind stops \G from also matching at the very start of the string
    • \h*,\h*: Match a comma surrounded by 0 or more horizontal whitespace characters
  • ): End non-capture group
  • "(?<filename>[^"]+)": Match a double-quoted string, with the named capture group filename matching 1+ of any character that is not a "
  • \h*,\h*: Match a comma surrounded by 0 or more horizontal whitespace characters
  • (?<filesize>\d+): Named capture group filesize matching 1+ digits
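
For a loop that reads closer to the C# `foreach (Match match in result)` version, the same pattern can be used with `PREG_SET_ORDER`. This is a minimal sketch, assuming a 64-bit PHP build so that the `(int)` cast holds the larger sizes; the `$files` variable name is only illustrative.

```php
<?php
$s = 'Files("textfile1.txt", 7268474425, "textfile2.txt", 661204928, "textfile3.txt", 121034)';
$pattern = '/(?:\w+\(\h*|(?<!\A)\G\h*,\h*)"(?<filename>[^"]+)"\h*,\h*(?<filesize>\d+)/';

$files = [];
// PREG_SET_ORDER groups the results per match, similar to iterating a C# MatchCollection.
if (preg_match_all($pattern, $s, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $match) {
        // Named groups are available as string keys on each match.
        $files[$match['filename']] = (int) $match['filesize'];
    }
}
var_export($files);
```

Compared with `array_combine()`, this keeps the sizes as integers instead of strings, at the cost of an explicit loop.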
anubhava
  • Wow, this is even more relevant and in line with our C# codebase. I appreciate that so much. – Jonny Oct 20 '22 at 06:04
  • Related: [PHP : simple string to array](https://stackoverflow.com/a/7381590/2943403) and [put string in array, split by every second line](https://stackoverflow.com/a/53451850/2943403) and [In mixed string, make the chars key of an array and numbers it's values in PHP](https://stackoverflow.com/a/67091367/2943403) and [help with looping in php](https://stackoverflow.com/a/6494525/2943403) – mickmackusa Oct 20 '22 at 06:09