Generate unique ids for all column values in a multi-dimensional array

Question

I have a file which looks like this:

Papiers peints > 3D et Perspective > 3D
Papiers peints > Carte du monde
Papiers peints > Fleurs
Papiers peints > Fleurs > Coquelicots
Tableaux > Cartes du monde
Tableaux > Fleurs
Tableaux > Fleurs > Coquelicots

and which I then transform into a properly formatted csv like this:

"Papiers peints","3D et Perspective","3D"
"Papiers peints","Carte du monde",
"Papiers peints","Fleurs",
"Papiers peints","Fleurs","Coquelicots"
"Tableaux","Cartes du monde",
"Tableaux","Fleurs",
"Tableaux","Fleurs","Coquelicots"

What I need is for each of these fields to have its own unique ID, which has to be an integer. This is how it should look like

"Papiers peints",101,"3D et Perspective",1001,"3D",10001
"Papiers peints",101,"Carte du monde",1002,,
"Papiers peints",101,"Fleurs",1003,,
"Papiers peints",101,"Fleurs",1003,"Coquelicots",10002
"Tableaux",102,"Cartes du monde",1004,,
"Tableaux",102,"Fleurs",1005,,
"Tableaux",102,"Fleurs",1005,"Coquelicots",10003

The names themselves don't matter at all and there will always be duplicates. I can solve this easily by uploading to a database. Then I do:

select distinct COL1, give them their respective IDs
select COL2, group by COL1, COL2, give them their respective IDs
select COL3, group by COL1, COL2,COL3, give them their respective IDs
repeat for as many times as needed, which can be quite a lot

How do I do this in PHP without having to use a database? A straight answer would be nice, but even a design idea would help a lot.

There could be up to 10 columns in my file, but here is a simplified input array to work with:

$new=[0=>['a0','a1','a2','a3'],1=>['b0','b1','b2','b3'],2=>['c0','c1','c2','c3'],3=>['d0','d1','d2','d3'],4=>['e0','e1','e2','e3']];

Expected Result:

[
    ['a0','101','a1','1001','a2','10001','a3','100001'],
    ['b0','102','b1','1002','b2','10002','b3','100002'],
    ['c0','103','c1','1003','c2','10003','c3','100003'],
    ['d0','104','d1','1004','d2','10004','d3','100004'],
    ['e0','105','e1','1005','e2','10005','e3','100005']
]

I find your question to be rather incomprehensible. Please show your best attempt to self solve. Please provide all necessary input to generate the desired output. Please edit your question so that it is not flagged as "unclear". — mickmackusa, May 10 '17 at 03:28
Best just delete and start again, probably. And sorry for being incomprehensible. It's a rather complicated problem. — flish, May 10 '17 at 05:15
do so if you wish, but there is no shame in editing your answer. Your question has no flags or downvotes. It would be better to keep it all here. I'll keep an eye out for an updated question. — mickmackusa, May 10 '17 at 05:16
and done. I updated the request and hopefully now it's clear. Although it looks like I didn't do any work on it and I usually try to avoid too broad questions. I still think arrays will solve the issue, I just don't know what I should do with them. — flish, May 12 '17 at 15:04
Your update is massively better/clearer now. I am certain that I can help you to find an efficient solution. May I just ask about your second code snippet (with the double quoting but no incremented ids)? Is this an array? or is it a string with linebreaks? What is the name of the variable that holds this data? These answers will help me to write a method for you that will be immediately useful in your project. (It is mother's day weekend, so I don't know how available I will be -- I have a family -- but I will provide an answer asap.) — mickmackusa, May 12 '17 at 21:17
The second code snippet is the actual csv - with fields enclosed in double quotes and with complete lines - so you could say it's an array. The original file might have 1 field on a row and 10 fields on the next. This is what I have for the moment: http://sandbox.onlinephpfunctions.com/code/84ff6b02218cea1626a099a8428a61789708e54c . I'll continue to work on it, but please don't feel like you should too. It's weekend and this is family time. I have 2 young children myself, so I won't make too much of an effort — flish, May 12 '17 at 22:31

mickmackusa · Accepted Answer · 2017-05-13T07:02:52.033

If this were my project, I'd probably look into preparing the necessary values while pulling the data from the file. I'll use the $new array as input (I've modified a few of the values while testing).

Code (Demo):

$new=[0=>['a0','a1','a2','a3'],1=>['a0','b1','b2','b3'],2=>['c0','b1','c2','c3'],3=>['d0','d1','c2','d3'],4=>['e0','e1','e2','f3']];

// OP says could be up to 10 columns...
for($x=0; $x<10; ++$x){
    $index_base=pow(10,2+$x)+1;         // or replace pow() with ** if php >=5.6
    // echo $x," : ",$index_base,"\n";  // uncomment if you want to see the values created
    $unique_col_val_keys[$x]=
        array_map(
            function(&$v)use($index_base){  // &$v means modify the input array's values
                return $v+$index_base;    // update the incremented ids (values) using $index_base
            },
            // the following 4 functions generate the input array to map
            array_flip(                   // swap keys and values within the array
                array_values(             // reset keys
                    array_unique(         // remove duplicate column values
                        array_column(
                            $new,$x       // get set of values from each column of multi-dim array
                        )
                    )
                )
            )
        );
}
//var_export($unique_col_val_keys);  // uncomment if you want to see the generated indexes

foreach($new as $row=>$a){
    if(!isset($result[$row])){$result[$row]=[];}
    foreach($a as $col=>$v){
        // echo "$row $col $v : {$unique_col_val_keys[$col][$v]}\n";
        array_push($result[$row],$v,"{$unique_col_val_keys[$col][$v]}");
    }
}
var_export($result);

Output:

array (
  0 => 
  array (
    0 => 'a0',
    1 => '101',
    2 => 'a1',
    3 => '1001',
    4 => 'a2',
    5 => '10001',
    6 => 'a3',
    7 => '100001',
  ),
  1 => 
  array (
    0 => 'a0',
    1 => '101',
    2 => 'b1',
    3 => '1002',
    4 => 'b2',
    5 => '10002',
    6 => 'b3',
    7 => '100002',
  ),
  2 => 
  array (
    0 => 'c0',
    1 => '102',
    2 => 'b1',
    3 => '1002',
    4 => 'c2',
    5 => '10003',
    6 => 'c3',
    7 => '100003',
  ),
  3 => 
  array (
    0 => 'd0',
    1 => '103',
    2 => 'd1',
    3 => '1003',
    4 => 'c2',
    5 => '10003',
    6 => 'd3',
    7 => '100004',
  ),
  4 => 
  array (
    0 => 'e0',
    1 => '104',
    2 => 'e1',
    3 => '1004',
    4 => 'e2',
    5 => '10004',
    6 => 'f3',
    7 => '100005',
  ),
)

@flish I've slapped together this working solution (I think it does what you need). If there are any unnecessary or missing components please explain what should be changed so that I can provide the most efficient method for you. For instance, if you don't need auto-incremented ids all the way to the 10th column, tell me the last column that requires automated id generation. ...happy to help -- this was a fun one. — mickmackusa, May 13 '17 at 03:38
Thank you very much. I'm glad it was fun for you :) it was a 'casse-tête' for me. It looks very well too, it's an elegant solution. I should work more with arrays rather than sql the life out of any file which comes my way - I know I don't do enough. — flish, May 13 '17 at 05:55

Generate unique ids for all column values in a multi-dimensional array

1 Answers1