
I have two input tables. Input Table 1 is the source data and input Table 2 is a criteria table.

+-------------------------------------+  +----------------------------------+
|       TABLE 1 (Source data)         |  |       TABLE 2 (Criterias)        |
+--------------------------+----------+  +--------------------------+-------+
| DESCRIPTION              | VALUE    |  | PREFIX                   | CODE  |
+--------------------------+----------+  +--------------------------+-------+
| ID                       | 0        |  | 7235                     | ABX1  |
| NAME                     | JFMSC    |  | 3553                     | POWQ  |
| TYPE                     | UHELQ    |  | 7459                     | UWEER |
| DFRUL                    | F4       |  | 10012                    | ABX1  |
| ADDR                     | 10012002 |  | 430                      | ABX1  |
| RRUL                     | P1       |  +--------------------------+-------+ 
| ADDR                     | 723      |  
| RRUL                     | P1       |  
| ID                       | 2        |  
| NAME                     | PLLSJS   |  
| TYPE                     | UHELQ    |  
| DFRUL                    | P3       |  
| ID                       | 4        |  
| NAME                     | AAAARR   |  
| TYPE                     | UHELQ    |  
| DFRUL                    | T7       |  
| ADDR                     | 35531156 |  
| RRUL                     | P1       |  
| ADDR                     | 72358    |  
| RRUL                     | P1       |  
| ADDR                     | 86401    |  
| RRUL                     | K9       |  
| ID                       | 0        |  
| NAME                     | PPROOA   |  
| TYPE                     | RRHN     |  
| DFRUL                    | P1       |  
| ADDR                     | 43001    |  
| RRUL                     | T8       |  
| ADDR                     | 7459001  |  
| RRUL                     | D4       |  
| ADDR                     | 430457   |  
| RRUL                     | W2       |  
| ADDR                     | 745913   |  
| RRUL                     | P1       |  
| ADDR                     | 74598001 |  
| RRUL                     | Y5       |  
+--------------------------+----------+

My goal is to get an output table like the one below (it would be Table #4) that shows, for each value of the field "ADDR", the CODE that is the most similar to it based on the criteria in "TABLE 2". If a CODE is repeated within one ID, I only want to show it once (a list of unique codes).

I explain this in more detail in the attached sample file, SampleV1.xlsx.

I want to transform the data in input Tables 1 and 2 to get an output table like this (Desired OUTPUT TABLE #2 in the attached file):

+----+--------+-------+-------+-------+------+
| ID | NAME   | TYPE  | DFRUL | CODE  | RRUL |
+----+--------+-------+-------+-------+------+  
| 0  | JFMSC  | UHELQ | P1    | ABX1  | P1   |
| 2  | PLLSJS | UHELQ | P3    |       |      |
| 4  | AAAARR | UHELQ | T7    | POWQ  | P1   |
|    |        |       |       | ABX1  | P1   |
|    |        |       |       | 86401 | K9   |
| 0  | PPROOA | RRHN  | P1    | ABX1  | P1   |
|    |        |       |       | UWEER | P1   |
+----+--------+-------+-------+-------+------+      

I hope someone could help me with this. Thanks in advance.

Ger Cas
  • There's too much going on in this question. It seems to me your main issue is attempting to match a full ADDR to a prefix, so I'd recommend posting a more specific question targeted at just that. An [mcve](https://stackoverflow.com/help/mcve) is ideal. You've got plenty of detail, but it's more likely to attract a solution and be useful for future readers if it's a bit more focused and minimal. – Alexis Olson Mar 16 '19 at 15:30
  • I understand, but if I post nothing other than the input and the desired output, people normally ask "what have you done?" Hehe – Ger Cas Mar 16 '19 at 17:18
  • @Alexis Olson I've edited the question; I hope it is better this way. Thanks – Ger Cas Mar 16 '19 at 17:27

1 Answer


Below is the UPDATED solution.

In general, I wrote the solution to be as robust as possible against problems in the data.

The only constraints on the data are:

  1. Each field set must have an ID field, which must be the first field of the set.

  2. All RRUL and ADDR fields must come in pairs.

  3. RRUL/ADDR pairs within one ID may be duplicated or absent altogether.

I also wrote the solution so that it correctly finds the closest value across all possible variants of ADDR and PREFIX. By the way, there is one case not covered in your bigsample: when PREFIX is shorter than ADDR but not equal to it. If such cases exist, my solution handles them correctly, at the cost of some performance overhead for that particular situation.
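
To make the matching rule easier to follow, here is a minimal sketch of it in isolation. It assumes that "closest" means the smallest difference in length between ADDR and a PREFIX where the shorter string starts the longer one. The helper name FindCode, the miniature Criterias table and the Distance column are hypothetical, introduced only for illustration, and the sketch returns null instead of a placeholder text when nothing matches.

```powerquery
let
    // Hypothetical miniature of the Criterias table (Table 2 in the question)
    Criterias = #table(
        type table [PREFIX = text, CODE = text],
        {{"7235", "ABX1"}, {"3553", "POWQ"}, {"7459", "UWEER"}, {"10012", "ABX1"}, {"430", "ABX1"}}
    ),

    // Hypothetical helper: CODE of the closest PREFIX, or null when nothing matches
    FindCode = (addr as text) as nullable text =>
        let
            // A row is a candidate when the shorter of the two strings starts the longer one
            Candidates = Table.SelectRows(
                Criterias,
                (row) =>
                    if Text.Length(row[PREFIX]) <= Text.Length(addr)
                    then Text.StartsWith(addr, row[PREFIX])
                    else Text.StartsWith(row[PREFIX], addr)
            ),
            // Assumption: "closest" means the smallest difference in length
            WithDistance = Table.AddColumn(
                Candidates,
                "Distance",
                (row) => Number.Abs(Text.Length(addr) - Text.Length(row[PREFIX])),
                Int64.Type
            ),
            Sorted = Table.Sort(WithDistance, {{"Distance", Order.Ascending}})
        in
            if Table.IsEmpty(Sorted) then null else Sorted{0}[CODE],

    // Try the helper on a few ADDR values taken from Table 1
    Result = List.Transform({"10012002", "723", "35531156", "86401"}, FindCode)
in
    Result
```

With the sample criteria, these four values should resolve to ABX1, ABX1, POWQ and null respectively. Unlike the full query, this sketch does not break ties by the original row order of Criterias, so it is only meant to show the rule, not to replace the step in the query.
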

let
    Source = #"Source data",
    // Number every row so that row positions can be used to build grouping keys
    #"Added Index1" = Table.AddIndexColumn(Source, "Index", 0, 1),

    // Each record starts at an "ID" row; remember the index of that row
    #"Added Custom" = Table.AddColumn(#"Added Index1", "Main Key", each if [DESCRIPTION] = "ID" then [Index] else null, type number),

    // Index of the last row that is neither ADDR nor RRUL
    #"Added Custom10" = Table.AddColumn(#"Added Custom", "Last notADDR", each 
        if [DESCRIPTION] <> "ADDR" and [DESCRIPTION] <> "RRUL" then [Index] else null),

    #"Filled Down" = Table.FillDown(#"Added Custom10",{"Main Key", "Last notADDR"}),

    // Give each ADDR/RRUL pair its own key; the first pair shares the key of the ID row
    #"Added Custom2" = Table.AddColumn(#"Filled Down", "Key", each [Main Key] + (
        if [DESCRIPTION] = "RRUL" then [Index] - [Last notADDR] - 2 
            else if [DESCRIPTION] = "ADDR" then [Index] - [Last notADDR] - 1 else 0)),

    #"Removed Columns" = Table.RemoveColumns(#"Added Custom2",{"Index", "Main Key", "Last notADDR"}),

    // Turn the DESCRIPTION values into columns, one output row per Key
    #"Pivoted Column1" = Table.Pivot(#"Removed Columns", 
        List.Distinct(#"Removed Columns"[DESCRIPTION]), "DESCRIPTION", "VALUE"),

    // For each ADDR, keep the Criterias rows where the shorter of PREFIX and ADDR starts the longer one,
    // then take the row with the smallest length difference (ties go to the earliest Criterias row)
    #"Added Custom3" = Table.AddColumn(#"Pivoted Column1", "CODE", each
        if [ADDR] = null then null
        else
            let
                t = Table.AddIndexColumn(
                    Table.SelectRows(Criterias, (x) =>
                        let s = List.Sort({x[PREFIX], [ADDR]}, each Text.Length(_))
                        in Text.StartsWith(s{1}, s{0})),
                    "Index")
            in
                if Table.RowCount(t) > 0
                then Table.First(Table.Sort(t, (y) =>
                    Number.BitwiseShiftLeft(Number.Abs(Text.Length([ADDR]) - Text.Length(y[PREFIX])), 16) + y[Index]))[CODE]
                else "Not Found"),

    #"Removed Columns1" = Table.RemoveColumns(#"Added Custom3",{"Key", "ADDR"}),
    // Repeat the record-level fields on the rows that hold additional ADDR/RRUL pairs
    #"Filled Down1" = Table.FillDown(#"Removed Columns1",{"ID", "NAME", "TYPE", "DFRUL"})
in
    #"Filled Down1"
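
The question asks for each CODE to appear only once per ID, while the query above returns one row per ADDR. One possible follow-up step, sketched here under assumptions, is Table.Distinct restricted to the record-level columns plus CODE. The small Result table is a hypothetical stand-in for the output of the last step, and the sketch assumes that ID together with NAME is enough to tell records apart (the sample data repeats ID 0 with different NAMEs).

```powerquery
let
    // Hypothetical stand-in for the output of the final step of the query
    Result = #table(
        type table [ID = text, NAME = text, CODE = text, RRUL = text],
        {
            {"0", "JFMSC", "ABX1", "P1"},
            {"0", "JFMSC", "ABX1", "P1"},
            {"4", "AAAARR", "POWQ", "P1"},
            {"4", "AAAARR", "ABX1", "P1"}
        }
    ),
    // Keep a single row per (ID, NAME, CODE) combination; RRUL is not part of the comparison
    Deduplicated = Table.Distinct(Result, {"ID", "NAME", "CODE"})
in
    Deduplicated
```

Because RRUL is left out of the comparison, which RRUL value survives for a repeated CODE depends on which duplicate row is kept, so this step would need adjusting if a particular RRUL has to be shown.
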
Andrey Minakov
  • Hi Andrey, thanks so much for the answer, for taking the time, and for the detailed, clear code that is easier to understand. Your solution produces a closer output. I'll try to show you the differences asap. – Ger Cas Mar 22 '19 at 00:10
  • Hi Andrey, your output is similar to my desired output, but in my output each CODE is shown only once. In your solution, a CODE appears as many times as there are ADDR values related to it. Thanks for the help. Regards – Ger Cas Mar 22 '19 at 07:29