1

Similiar to a question asked here,

Given, this table, I want to only keep the records where the email appears first.

email firstname Lastname Address City Zip
ABC@XYZ.com Scott Johnson A Z 1111
ABC@XYZ.com Bill Johnson B Y 2222
ABC@XYZ.com Ted Smith C X 3333
DEF@QRP.com Steve Williams D W 4444
XYZ@LMN.com Sam Samford E U 5555
XYZ@LMN.com David Beals F V 6666
DEF@QRP.com Stephen Jackson G T 7777
TUV@DEF.com Seven Alberts H S 8888

<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th>email</th>
      <th>firstname</th>
      <th>Lastname</th>
      <th>Address</th>
      <th>City</th>
      <th>Zip</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>ABC@XYZ.com</td>
      <td>Scott</td>
      <td>Johnson</td>
      <td>A</td>
      <td>Z</td>
      <td>1111</td>
    </tr>
    <tr>
      <td>ABC@XYZ.com</td>
      <td>Bill</td>
      <td>Johnson</td>
      <td>B</td>
      <td>Y</td>
      <td>2222</td>
    </tr>
    <tr>
      <td>ABC@XYZ.com</td>
      <td>Ted</td>
      <td>Smith</td>
      <td>C</td>
      <td>X</td>
      <td>3333</td>
    </tr>
    <tr>
      <td>DEF@QRP.com</td>
      <td>Steve</td>
      <td>Williams</td>
      <td>D</td>
      <td>W</td>
      <td>4444</td>
    </tr>
    <tr>
      <td>XYZ@LMN.com</td>
      <td>Sam</td>
      <td>Samford</td>
      <td>E</td>
      <td>U</td>
      <td>5555</td>
    </tr>
    <tr>
      <td>XYZ@LMN.com</td>
      <td>David</td>
      <td>Beals</td>
      <td>F</td>
      <td>V</td>
      <td>6666</td>
    </tr>
    <tr>
      <td>DEF@QRP.com</td>
      <td>Stephen</td>
      <td>Jackson</td>
      <td>G</td>
      <td>T</td>
      <td>7777</td>
    </tr>
    <tr>
      <td>TUV@DEF.com</td>
      <td>Seven</td>
      <td>Alberts</td>
      <td>H</td>
      <td>S</td>
      <td>8888</td>
    </tr>
  </tbody>
</table>

Expected output table:

email firstname Lastname Address City Zip
ABC@XYZ.com Scott Johnson A Z 1111
DEF@QRP.com Steve Williams D W 4444
XYZ@LMN.com Sam Samford E U 5555
TUV@DEF.com Seven Alberts H S 8888

<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th>email</th>
      <th>firstname</th>
      <th>Lastname</th>
      <th>Address</th>
      <th>City</th>
      <th>Zip</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>ABC@XYZ.com</td>
      <td>Scott</td>
      <td>Johnson</td>
      <td>A</td>
      <td>Z</td>
      <td>1111</td>
    </tr>
    <tr>
      <td>DEF@QRP.com</td>
      <td>Steve</td>
      <td>Williams</td>
      <td>D</td>
      <td>W</td>
      <td>4444</td>
    </tr>
    <tr>
      <td>XYZ@LMN.com</td>
      <td>Sam</td>
      <td>Samford</td>
      <td>E</td>
      <td>U</td>
      <td>5555</td>
    </tr>
    <tr>
      <td>TUV@DEF.com</td>
      <td>Seven</td>
      <td>Alberts</td>
      <td>H</td>
      <td>S</td>
      <td>8888</td>
    </tr>
  </tbody>
</table>
Alexis Olson
  • 38,724
  • 7
  • 42
  • 64
Scott Boston
  • 147,308
  • 15
  • 139
  • 187

2 Answers2

2

There is no inherent ordering of a table in DAX, so in order to take the first row you need to add an index column or define an ordering on the table somehow.

For this answer, I'll assume that you've added an index column somehow (in the query editor or with a DAX calculated column).

You can create a filtered table as follows:

FilteredTable1 =
FILTER (
    Table1,
    Table1[Index]
        = CALCULATE ( MIN ( Table1[Index] ), ALLEXCEPT ( Table1, Table1[email] ) )
)

For each row in Table1, this checks if the index is minimal over all the rows with the same email.

Result

Alexis Olson
  • 38,724
  • 7
  • 42
  • 64
1

Assuming that we added an Index column with non duplicate values, it's possible to reduce the number of context transitions to only one per Email by preparing an Indexes table containing the indexes to be selected, and then apply this Indexes table as a filter using TREATAS.

T Index Unique = 
VAR Indexes =
    SELECTCOLUMNS(
        ALL( 'T Index'[Email] ),
        "MinIndex", CALCULATE( MIN( 'T Index'[Index] ) )
    )
RETURN
    CALCULATETABLE( 'T Index', TREATAS( Indexes, 'T Index'[Index] ) )

Result

If instead we have non-unique column across the different Emails but unique per each email, like a timestamp, we can prepare a filter table containing the email and the timestamp

For instance with a T Date table like the following

T Date table

The calculated table becomes

T Date Unique = 
VAR EmailDate =
    ADDCOLUMNS(
        ALL( 'T Date'[Email] ),
        "MinDate", CALCULATE( MIN( 'T Date'[Date] ) )
    )
RETURN
    CALCULATETABLE( 'T Date', TREATAS( EmailDate, 'T Date'[Email], 'T Date'[Date] ) )

T Date Unique

sergiom
  • 4,791
  • 3
  • 24
  • 32