1

I have a Power BI report pulling from SQL Server that needs to be set up for incremental refresh due to the large data pull. As the load is fairly complex (and PQuery editor is tedious and often breaks folding), I need to use a SQL query (aka a "native query" in PBI speak) while retaining query folding (so that incremental refresh works).

I've been using the nice...

Value.NativeQuery( Source, query, null, [EnableFolding = true])

... trick found here to get that working.

BUT it only seems to work if the native query finishes fairly quickly. When my WHERE clause only pulls data for this year, it's no problem. When I remove the date filter in the WHERE clause (so as to not conflict with the incremental refresh filter), or simply push the year farther back, it takes longer seemingly causing PBI to determine that:

"We cannot fold on top of this native query. Please modify the native query or remove the 'EnableFolding' option."

The error above comes up after a few minutes both in the PQuery editor or if I try to "bypass" it by quickly quickly clicking Close & Apply. And unfortunately, the underlying SQL is probably about as good as it gets due to our not-so-great data structures. I've tried tricking PBI's seeming time-out via an OPTION (FAST 1) in the script, but it just can't pull anything quick enough.

I'm now stuck. This seems like a silly barrier as all I need to do is get that first import to complete as obviously it can query fold for the shorter loads. How do I work past this?

Josh
  • 167
  • 1
  • 13
  • You can put your native sql query directly in Sql.Databases and don't have to wait till Value.NativeQuery. You can reduce one query step and processing with that. The most important part is to optimize Sql Query by creating index on predicate columns. Stack has tons of support for Sql query optimization. – smpa01 Sep 25 '21 at 04:18
  • @smpa01 What do you mean put it directly in Sql.Databases? Do you mean as a view? Either way the view is a good thought. Index-wise, we're not likely to get much farther. Our Engineering team ultimately decided that a clustered columnstore was probably the best bang for our buck here (huge tables, lots of predicates). – Josh Sep 25 '21 at 13:00
  • If you look at Chris's article, you can see the first step is Sql.Databases and the 2nd one is Value.NativeQuery. You are putting your sql query in the 2nd step and letting the first step process worhout any query when you can simply utilize your query in the 1st step. Also, given your scenario, you can create table variable (in TSQL) and create index in the table variable, have the initial query result inserted there and keep on utilizing the table from that table variable which had index. I experimented with tjis technique for a slow DF situation, drastically improved my query execution. – smpa01 Sep 25 '21 at 13:06
  • @smpa01 I'm not sure I follow. The whole point of using the the NativeQuery() step (which is a separate step in Chris's blog) is to be able to use the [EnableFolding=true] option. That isn't allowable in Sql.Database() even though the [Query=...] is. Simply running a native query as a source breaks query folding (and thus incremental refresh). I don't see where a table variable helps either. Table variables are local, so I'd have to first insert into it in my query, so the initial pull from the source wouldn't be faster. Now storing my query as a view on the server might work though I think. – Josh Sep 25 '21 at 15:15
  • If you look at the Chris's query again he is doing the filtering (Day=Friday) in M and since the query folding is enabled in the step before, that filtering is getting folded to source. To my knowledge, if you are not writing any server-side query for the transformation and you want all the transformation to happen through M, enabling query folding helps in succeeding steps, as equivalent syntaxes get translated to foldable source. But in your case, you are doing the whole transformation server side. So I don't know if there is any value for you to wait till the step#2 to enable query. – smpa01 Sep 25 '21 at 16:06
  • Cause you are paying for step#1 anyway, but that step is getting wasted. Also, there is an article somwhere in Chris's blog about Table.View which says that when SQL query executes in PQ, it evaluates twice. PQ first runs it to get the column schema and secondly to return execution result. Which is why the execution time for any server-side query takes longer in PQ compared to directly on the source (SQL query in PQ vs SSMS).If you use a table variable there you specify the column schema by default. I have experimented with this and PQ execution time improved due to the usage of table – smpa01 Sep 25 '21 at 16:12
  • Variable and I like to think it is due to usage of default schema in the first line of query which PQ can run quickly determine the column schema rather than to execute the whole query internally to know the schema. It might not help your cause,but it is very much worth giving this a spin rather than discussing it on the theoratical level. – smpa01 Sep 25 '21 at 16:16
  • Asking this out of curiosity, what is SQL execution time in SSMS and what are the size (rowcount) of your SQL tables ? – smpa01 Sep 25 '21 at 16:20
  • @smpa01 Execution time in SSMS is ~30min for ~32mill rows. I *think* I get what you mean with the table var now as I do notice how Pquery tends to run things twice - would need to play with it. I don't get the paying for the first step though as in Chris's blog, the first step is just defining the source. I ultimately figured out a solution though which I'll post below - in retrospect, it's silly I didn't think to do it initially. :) – Josh Sep 25 '21 at 16:46

1 Answers1

0

In retrospect, it's silly that I didn't try this initially, but even though the Value.NativeQuery() M step doesn't allow a command time-out option, you can still put it in a preceding Sql.Database() step manually and it carries forward. I also removed some common table expressions from my query which also were breaking query folding (not sure why, but easy fix by saving my complex logic as a view in sql server itself and just joining to that). Takes a while to run now, but doesn't time-out!

Josh
  • 167
  • 1
  • 13