Follow-up Dax Drop Duplicates

Question

Similiar to a question asked here,

Given, this table, I want to only keep the records where the email appears first.

email	firstname	Lastname	Address	City	Zip
ABC@XYZ.com	Scott	Johnson	A	Z	1111
ABC@XYZ.com	Bill	Johnson	B	Y	2222
ABC@XYZ.com	Ted	Smith	C	X	3333
DEF@QRP.com	Steve	Williams	D	W	4444
XYZ@LMN.com	Sam	Samford	E	U	5555
XYZ@LMN.com	David	Beals	F	V	6666
DEF@QRP.com	Stephen	Jackson	G	T	7777
TUV@DEF.com	Seven	Alberts	H	S	8888

<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th>email</th>
      <th>firstname</th>
      <th>Lastname</th>
      <th>Address</th>
      <th>City</th>
      <th>Zip</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>ABC@XYZ.com</td>
      <td>Scott</td>
      <td>Johnson</td>
      <td>A</td>
      <td>Z</td>
      <td>1111</td>
    </tr>
    <tr>
      <td>ABC@XYZ.com</td>
      <td>Bill</td>
      <td>Johnson</td>
      <td>B</td>
      <td>Y</td>
      <td>2222</td>
    </tr>
    <tr>
      <td>ABC@XYZ.com</td>
      <td>Ted</td>
      <td>Smith</td>
      <td>C</td>
      <td>X</td>
      <td>3333</td>
    </tr>
    <tr>
      <td>DEF@QRP.com</td>
      <td>Steve</td>
      <td>Williams</td>
      <td>D</td>
      <td>W</td>
      <td>4444</td>
    </tr>
    <tr>
      <td>XYZ@LMN.com</td>
      <td>Sam</td>
      <td>Samford</td>
      <td>E</td>
      <td>U</td>
      <td>5555</td>
    </tr>
    <tr>
      <td>XYZ@LMN.com</td>
      <td>David</td>
      <td>Beals</td>
      <td>F</td>
      <td>V</td>
      <td>6666</td>
    </tr>
    <tr>
      <td>DEF@QRP.com</td>
      <td>Stephen</td>
      <td>Jackson</td>
      <td>G</td>
      <td>T</td>
      <td>7777</td>
    </tr>
    <tr>
      <td>TUV@DEF.com</td>
      <td>Seven</td>
      <td>Alberts</td>
      <td>H</td>
      <td>S</td>
      <td>8888</td>
    </tr>
  </tbody>
</table>

Expected output table:

email	firstname	Lastname	Address	City	Zip
ABC@XYZ.com	Scott	Johnson	A	Z	1111
DEF@QRP.com	Steve	Williams	D	W	4444
XYZ@LMN.com	Sam	Samford	E	U	5555
TUV@DEF.com	Seven	Alberts	H	S	8888

<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th>email</th>
      <th>firstname</th>
      <th>Lastname</th>
      <th>Address</th>
      <th>City</th>
      <th>Zip</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>ABC@XYZ.com</td>
      <td>Scott</td>
      <td>Johnson</td>
      <td>A</td>
      <td>Z</td>
      <td>1111</td>
    </tr>
    <tr>
      <td>DEF@QRP.com</td>
      <td>Steve</td>
      <td>Williams</td>
      <td>D</td>
      <td>W</td>
      <td>4444</td>
    </tr>
    <tr>
      <td>XYZ@LMN.com</td>
      <td>Sam</td>
      <td>Samford</td>
      <td>E</td>
      <td>U</td>
      <td>5555</td>
    </tr>
    <tr>
      <td>TUV@DEF.com</td>
      <td>Seven</td>
      <td>Alberts</td>
      <td>H</td>
      <td>S</td>
      <td>8888</td>
    </tr>
  </tbody>
</table>

score 2 · Accepted Answer · answered Dec 21 '20 at 17:58

There is no inherent ordering of a table in DAX, so in order to take the first row you need to add an index column or define an ordering on the table somehow.

For this answer, I'll assume that you've added an index column somehow (in the query editor or with a DAX calculated column).

You can create a filtered table as follows:

FilteredTable1 =
FILTER (
    Table1,
    Table1[Index]
        = CALCULATE ( MIN ( Table1[Index] ), ALLEXCEPT ( Table1, Table1[email] ) )
)

For each row in Table1, this checks if the index is minimal over all the rows with the same email.

score 1 · Answer 2 · answered Dec 22 '20 at 09:14

Assuming that we added an Index column with non duplicate values, it's possible to reduce the number of context transitions to only one per Email by preparing an Indexes table containing the indexes to be selected, and then apply this Indexes table as a filter using TREATAS.

T Index Unique = 
VAR Indexes =
    SELECTCOLUMNS(
        ALL( 'T Index'[Email] ),
        "MinIndex", CALCULATE( MIN( 'T Index'[Index] ) )
    )
RETURN
    CALCULATETABLE( 'T Index', TREATAS( Indexes, 'T Index'[Index] ) )

If instead we have non-unique column across the different Emails but unique per each email, like a timestamp, we can prepare a filter table containing the email and the timestamp

For instance with a T Date table like the following

The calculated table becomes

T Date Unique = 
VAR EmailDate =
    ADDCOLUMNS(
        ALL( 'T Date'[Email] ),
        "MinDate", CALCULATE( MIN( 'T Date'[Date] ) )
    )
RETURN
    CALCULATETABLE( 'T Date', TREATAS( EmailDate, 'T Date'[Email], 'T Date'[Date] ) )

Follow-up Dax Drop Duplicates

2 Answers2

Linked