0

Context:

  • Tableau and Power BI Report Server on prem against SQL RDBMS and SSAS Tabular (Sql Server 2016)
  • Denormalized table of data. Currently about 100m records.

Challenge:

Analyst does calculations which require MAX(datecol)

This doesn't seem possible in DAX in directquery mode, so I've pushed this back to the ETL and added a MaxDate column.

Now this is a big table and getting bigger by the day. DTA suggested a columnstore index, and that makes sense to me. But that index recommendation was based just on SELECT queries. I'll also have inserts and deletes (daily batch), and they will have to be followed by a:

UPDATE table SET MaxDate = MAX(Datecol)

My intuition is that this won't play nice with the columnstore index but I've not tested it. Perhaps depends on if is clustered or non-clustered?

Am I correct in saying that MaxDate has to be handled in ETL? Trying to do in DAX in DirectQuery mode results in errors.

Am I correct that care has to be taken with the columnstore index? Or perhaps it doesn't make sense at all because an index rebuild will be necessary?

I'm looking for a path forward that makes sense and avoids issues.

Chris Harrington
  • 1,238
  • 2
  • 15
  • 28

1 Answers1

0

Define your measure using the MAXA() function. I am able to work this using Tabular 2016 in DirectQuery mode.

Max Date:=MAXA(table[datecol])

The query issued to the SQL Server is like this:

SELECT MAX([t0].[DateCol]) AS [a0] FROM ((SELECT * FROM table)) AS [t0]

If you need the value to be in a column, then it is impossible to use MAXA() in direct query mode (at the moment) so I suggest you modify the table to use a query as source instead of a direct table reference.

SELECT table.*,
       max_date= (SELECT MAX(datecol) FROM dbo.table)
  FROM dbo.table

You can then create a calculated column, if you wish:

days_offset = DATEDIFF(table[datecol], table[max_date], DAY)

*Note that you cannot terminate the statement with a semicolon as it will be used in a subquery whenever the engine constructs queries to your database.

The subquery to get the MAX(datecol) should be very efficient if the table has a columnstore index, or if the column has a non-clustered, rowstore index. Far more efficient than updating every row in your table each day with a new maximum date.

mendosi
  • 2,001
  • 1
  • 12
  • 18
  • Interesting. But I'm still not sure now defining a measure helps me. MaxDate is first stage. Then I have DaysOffset (DateCol - MaxDate) and users are going to filter on DaysOffset in PowerBI. Are you saying that I can create a calculated column with MAXA()? – Chris Harrington Nov 10 '17 at 03:38