We are facing slow performance with Power BI DirectQuery reading data from Delta Lake in Azure Databricks.
We are trying to refresh some Power BI dashboards that read data from Delta tables. The models are created in Power BI Desktop, and DirectQuery is used on the Power BI side so that the desired partitions are queried on demand. The underlying Delta tables are partitioned on date keys, and the generated plan shows broadcast hash joins and proper use of the partitions (a simplified sketch of the layout is below).
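For context, a minimal sketch of the table layout and of how we verify partition pruning from a Databricks notebook. The table and column names here are simplified stand-ins for the real silver tables, and `spark` is the notebook session:

    # Sketch only: simplified DDL mirroring our layout (the real fact table is
    # silver.factqueuemanagementperuser and has many more columns).
    spark.sql("""
        CREATE TABLE IF NOT EXISTS silver.fact_demo (
            dimMeasureDatekey BIGINT,   -- date key, e.g. 20230822
            dimLocationKey    BIGINT,
            PersonCount       INT
        )
        USING DELTA
        PARTITIONED BY (dimMeasureDatekey)
    """)

    # Checking that a date-key predicate prunes partitions: the physical plan
    # should list PartitionFilters on dimMeasureDatekey.
    spark.sql("""
        SELECT * FROM silver.fact_demo
        WHERE dimMeasureDatekey >= 20230822 AND dimMeasureDatekey < 20230823
    """).explain(mode="formatted")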
The Spark UI's JDBC/ODBC tab shows the sessions fired from Power BI when a visual is refreshed. After 2 or 3 back-to-back refreshes, performance improves,
but once there is a gap of 5+ minutes, performance degrades again and the session executions show something like "Listing Cross References".
It seems some sort of cached plan is purged from memory, which makes the next query slower; a sketch of how we try to reproduce this outside Power BI follows below.
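To test this hypothesis outside Power BI, we can time the same query over the same SQL endpoint with and without an idle gap, e.g. using the databricks-sql-connector package. The connection details below are placeholders:

    # Sketch: reproduce the warm-vs-cold behaviour outside Power BI by firing
    # the same query twice back to back, then again after a 6-minute gap.
    # server_hostname / http_path / access_token are placeholders.
    import time
    from databricks import sql

    QUERY = """
        SELECT * FROM silver.viewqueuemanagementperuserarriving
        WHERE `Measured Date Key` >= 20230822 AND `Measured Date Key` < 20230823
    """

    with sql.connect(server_hostname="<workspace-host>",
                     http_path="<endpoint-http-path>",
                     access_token="<token>") as conn:
        with conn.cursor() as cur:
            for gap_seconds in (0, 0, 360):  # two warm runs, then a 6-minute idle gap
                time.sleep(gap_seconds)
                start = time.time()
                cur.execute(QUERY)
                cur.fetchall()
                print(f"gap={gap_seconds}s -> {time.time() - start:.1f}s")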
Can someone provide insights into what is happening here?
EDIT 1: Added a screenshot from the Spark UI (JDBC/ODBC tab).
EDIT 2: Added the query plan.
== Parsed Logical Plan ==
Project [col_0#407547 AS User ID#407618, tohivestring(col_1)#407567 AS Measured Time Timestamp#407619, col_2#407549L AS Measured Date Key#407620L, col_3#407550L AS Measured Time Key#407621L, col_4#407551L AS Location Key#407622L, col_5#407552L AS Measured Time Banding Key#407623L, col_6#407565 AS Measured Time in Secs#407624, col_7#407566 AS Measured Time Duration In Secs#407625, col_8#407555 AS Person Count#407626]
+- Project [col_0#407547, tohivestring(col_1#407548, Some(Etc/UTC)) AS tohivestring(col_1)#407567, col_2#407549L, col_3#407550L, col_4#407551L, col_5#407552L, cast(col_6#407553 as string) AS col_6#407565, cast(col_7#407554 as string) AS col_7#407566, col_8#407555]
   +- Project [User ID#407422 AS col_0#407547, Measured Time Timestamp#407423 AS col_1#407548, Measured Date Key#407424L AS col_2#407549L, Measured Time Key#407425L AS col_3#407550L, Location Key#407426L AS col_4#407551L, Measured Time Banding Key#407427L AS col_5#407552L, Measured Time in Secs#407428 AS col_6#407553, Measured Time Duration In Secs#407429 AS col_7#407554, Person Count#407430 AS col_8#407555]
      +- Project [User ID#407422, Measured Time Timestamp#407423, Measured Date Key#407424L, Measured Time Key#407425L, Location Key#407426L, Measured Time Banding Key#407427L, Measured Time in Secs#407428, Measured Time Duration In Secs#407429, Person Count#407430]
         +- Filter ((cast(Measured Date Key#407424L as double) >= 2.0230822E7) AND (cast(Measured Date Key#407424L as double) < 2.0230823E7))
            +- SubqueryAlias spark_catalog.silver.viewqueuemanagementperuserarriving
               +- View (hive_metastore.silver.viewqueuemanagementperuserarriving, [User ID#407422,Measured Time Timestamp#407423,Measured Date Key#407424L,Measured Time Key#407425L,Location Key#407426L,Measured Time Banding Key#407427L,Measured Time in Secs#407428,Measured Time Duration In Secs#407429,Person Count#407430])
                  +- Project [cast(User ID#407413 as string) AS User ID#407422, cast(Measured Time Timestamp#407414 as timestamp) AS Measured Time Timestamp#407423, cast(Measured Date Key#407415L as bigint) AS Measured Date Key#407424L, cast(Measured Time Key#407416L as bigint) AS Measured Time Key#407425L, cast(Location Key#407417L as bigint) AS Location Key#407426L, cast(Measured Time Banding Key#407418L as bigint) AS Measured Time Banding Key#407427L, cast(Measured Time in Secs#407419 as decimal(10,0)) AS Measured Time in Secs#407428, cast(Measured Time Duration In Secs#407420 as decimal(10,0)) AS Measured Time Duration In Secs#407429, cast(Person Count#407421 as int) AS Person Count#407430]
                     +- Project [UserId#407488 AS User ID#407413, cast(from_unixtime(unix_timestamp(MeasuredTimeDateTime#407489, yyyy-MM-dd HH:mm:ss.SSS, Some(Etc/UTC), false), yyyy-MM-dd HH:mm, Some(Etc/UTC)) as timestamp) AS Measured Time Timestamp#407414, dimMeasureDatekey#407490L AS Measured Date Key#407415L, dimMeasureTimekey#407491L AS Measured Time Key#407416L, dimLocationKey#407493L AS Location Key#407417L, dimMeasureTimeBandingKey#407494L AS Measured Time Banding Key#407418L, MeasureTimeInSecs#407495 AS Measured Time in Secs#407419, MeasureTimeDurationInSecs#407497 AS Measured Time Duration In Secs#407420, PersonCount#407500 AS Person Count#407421]
                        +- Filter (Function#407513 = Immigration)
                           +- Join Inner, (dimLocationKey#407493L = dimLocationKey#407508L)
                              :- SubqueryAlias Q
                              :  +- SubqueryAlias spark_catalog.silver.factqueuemanagementperuser
                              :     +- Relation spark_catalog.silver.factqueuemanagementperuser[DateTimeCreated#407480,CreatedByUser#407481,DateTimeLastModified#407482,ModifiedByUser#407483,dimAirportKey#407484L,factQPerUserKey#407485L,QAreaId#407486L,QAreaName#407487,UserId#407488,MeasuredTimeDateTime#407489,dimMeasureDatekey#407490L,dimMeasureTimekey#407491L,RouteStartDateTime#407492,dimLocationKey#407493L,dimMeasureTimeBandingKey#407494L,MeasureTimeInSecs#407495,MeasureTimeInMins#407496,MeasureTimeDurationInSecs#407497,MeasureTimeNoFilterInSecs#407498,MeasureTimeNoFilterInMins#407499,PersonCount#407500,Outcome#407501,dimTerminalKey#407502L,BatchId#407503L] parquet
                              +- SubqueryAlias L
                                 +- SubqueryAlias spark_catalog.silver.dimlocation
                                    +- Relation spark_catalog.silver.dimlocation[DateTimeCreated#407504,CreatedByUser#407505,DateTimeLastModified#407506,ModifiedByUser#407507,dimLocationKey#407508L,Airport#407509,Terminal#407510,Side#407511,A/D#407512,Function#407513,SubArea#407514,LocationName#407515,ProcessName#407516,SystemName#407517,SystemDeviceId#407518,BatchId#407519L] parquet
== Analyzed Logical Plan ==
User ID: string, Measured Time Timestamp: string, Measured Date Key: bigint, Measured Time Key: bigint, Location Key: bigint, Measured Time Banding Key: bigint, Measured Time in Secs: string, Measured Time Duration In Secs: string, Person Count: int
Project [col_0#407547 AS User ID#407618, tohivestring(col_1)#407567 AS Measured Time Timestamp#407619, col_2#407549L AS Measured Date Key#407620L, col_3#407550L AS Measured Time Key#407621L, col_4#407551L AS Location Key#407622L, col_5#407552L AS Measured Time Banding Key#407623L, col_6#407565 AS Measured Time in Secs#407624, col_7#407566 AS Measured Time Duration In Secs#407625, col_8#407555 AS Person Count#407626]
+- Project [col_0#407547, tohivestring(col_1#407548, Some(Etc/UTC)) AS tohivestring(col_1)#407567, col_2#407549L, col_3#407550L, col_4#407551L, col_5#407552L, cast(col_6#407553 as string) AS col_6#407565, cast(col_7#407554 as string) AS col_7#407566, col_8#407555]
   +- Project [User ID#407422 AS col_0#407547, Measured Time Timestamp#407423 AS col_1#407548, Measured Date Key#407424L AS col_2#407549L, Measured Time Key#407425L AS col_3#407550L, Location Key#407426L AS col_4#407551L, Measured Time Banding Key#407427L AS col_5#407552L, Measured Time in Secs#407428 AS col_6#407553, Measured Time Duration In Secs#407429 AS col_7#407554, Person Count#407430 AS col_8#407555]
      +- Project [User ID#407422, Measured Time Timestamp#407423, Measured Date Key#407424L, Measured Time Key#407425L, Location Key#407426L, Measured Time Banding Key#407427L, Measured Time in Secs#407428, Measured Time Duration In Secs#407429, Person Count#407430]
         +- Filter ((cast(Measured Date Key#407424L as double) >= 2.0230822E7) AND (cast(Measured Date Key#407424L as double) < 2.0230823E7))
            +- SubqueryAlias spark_catalog.silver.viewqueuemanagementperuserarriving
               +- View (hive_metastore.silver.viewqueuemanagementperuserarriving, [User ID#407422,Measured Time Timestamp#407423,Measured Date Key#407424L,Measured Time Key#407425L,Location Key#407426L,Measured Time Banding Key#407427L,Measured Time in Secs#407428,Measured Time Duration In Secs#407429,Person Count#407430])
                  +- Project [cast(User ID#407413 as string) AS User ID#407422, cast(Measured Time Timestamp#407414 as timestamp) AS Measured Time Timestamp#407423, cast(Measured Date Key#407415L as bigint) AS Measured Date Key#407424L, cast(Measured Time Key#407416L as bigint) AS Measured Time Key#407425L, cast(Location Key#407417L as bigint) AS Location Key#407426L, cast(Measured Time Banding Key#407418L as bigint) AS Measured Time Banding Key#407427L, cast(Measured Time in Secs#407419 as decimal(10,0)) AS Measured Time in Secs#407428, cast(Measured Time Duration In Secs#407420 as decimal(10,0)) AS Measured Time Duration In Secs#407429, cast(Person Count#407421 as int) AS Person Count#407430]
                     +- Project [UserId#407488 AS User ID#407413, cast(from_unixtime(unix_timestamp(MeasuredTimeDateTime#407489, yyyy-MM-dd HH:mm:ss.SSS, Some(Etc/UTC), false), yyyy-MM-dd HH:mm, Some(Etc/UTC)) as timestamp) AS Measured Time Timestamp#407414, dimMeasureDatekey#407490L AS Measured Date Key#407415L, dimMeasureTimekey#407491L AS Measured Time Key#407416L, dimLocationKey#407493L AS Location Key#407417L, dimMeasureTimeBandingKey#407494L AS Measured Time Banding Key#407418L, MeasureTimeInSecs#407495 AS Measured Time in Secs#407419, MeasureTimeDurationInSecs#407497 AS Measured Time Duration In Secs#407420, PersonCount#407500 AS Person Count#407421]
                        +- Filter (Function#407513 = Immigration)
                           +- Join Inner, (dimLocationKey#407493L = dimLocationKey#407508L)
                              :- SubqueryAlias Q
                              :  +- SubqueryAlias spark_catalog.silver.factqueuemanagementperuser
                              :     +- Relation spark_catalog.silver.factqueuemanagementperuser[DateTimeCreated#407480,CreatedByUser#407481,DateTimeLastModified#407482,ModifiedByUser#407483,dimAirportKey#407484L,factQPerUserKey#407485L,QAreaId#407486L,QAreaName#407487,UserId#407488,MeasuredTimeDateTime#407489,dimMeasureDatekey#407490L,dimMeasureTimekey#407491L,RouteStartDateTime#407492,dimLocationKey#407493L,dimMeasureTimeBandingKey#407494L,MeasureTimeInSecs#407495,MeasureTimeInMins#407496,MeasureTimeDurationInSecs#407497,MeasureTimeNoFilterInSecs#407498,MeasureTimeNoFilterInMins#407499,PersonCount#407500,Outcome#407501,dimTerminalKey#407502L,BatchId#407503L] parquet
                              +- SubqueryAlias L
                                 +- SubqueryAlias spark_catalog.silver.dimlocation
                                    +- Relation spark_catalog.silver.dimlocation[DateTimeCreated#407504,CreatedByUser#407505,DateTimeLastModified#407506,ModifiedByUser#407507,dimLocationKey#407508L,Airport#407509,Terminal#407510,Side#407511,A/D#407512,Function#407513,SubArea#407514,LocationName#407515,ProcessName#407516,SystemName#407517,SystemDeviceId#407518,BatchId#407519L] parquet
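For reference, the plans above can be reproduced from a notebook with EXPLAIN EXTENDED against the view that Power BI queries; roughly (date literals taken from the Power BI filter):

    # Sketch: capture the same logical/physical plans that the Thrift server
    # logs, using EXPLAIN EXTENDED against the queried view.
    plan = spark.sql("""
        EXPLAIN EXTENDED
        SELECT * FROM silver.viewqueuemanagementperuserarriving
        WHERE `Measured Date Key` >= 20230822 AND `Measured Date Key` < 20230823
    """).first().plan
    print(plan)

One detail visible in the captured plan: the date-key filter arrives as cast(Measured Date Key as double) >= 2.0230822E7, i.e. the bigint key is compared as a double.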