5

Scenario: Data in kusto table is updated after 5 hours. Task: Call query from .net API In query , create a subquery and use that subquery to perform join on a bigger table

let table1=materialize(
Customer|where CustomerId=="cust-reg-aabb-cc232"|distinct CustomerId,City);
CustomerPurchase
|where CustomerId=="cust-reg-aabb-cc232"
//perform join with table1 and other things

or

let table1=view(){
Customer|where CustomerId=="cust-reg-aabb-cc232"|distinct CustomerId,City};
CustomerPurchase
|where CustomerId=="cust-reg-aabb-cc232"
//perform join with table1 and CustomerPurchase

CustomerPurchase and Customer data is being updated after every 5 hours(new rows being added). What is more optimized : create a view or use the materialize method. I went through the documentation but could not understand the different between both.

Also since I am implementing an API, is it possible to use materialized view instead of table1?

jaime_
  • 71
  • 1
  • 7

1 Answers1

4

The documentation is quite clear:

materialize

Allows caching a subquery result during the time of query execution in a way that other subqueries can reference the partial result.

views

Views are virtual tables based on the result-set of a Kusto Query Language query. Just like a real table, a view contains rows and columns. Unlike a real table, a view doesn't hold its own data storage.

Views are defined through user-defined functions with the following requirements:

  • The result of the function must be tabular (for example, it cannot be a scalar value).
  • The function must take no arguments.

The view keyword

By default, operators that support a wildcard syntax to specify table names will not reference views, even if the view's name matches the wildcard. An example of this type of operator is the union operator. In this case, use the view keyword to have the view included as well.

Materialized views

Materialized views expose an aggregation query over a source table, or over another materialized view.

As the name suggests, the aggregation results are materialized, meaning - stored.
The results are being updated constantly, while the data keeps being ingested.


In your case, there seems to be no reason to use materialize(), nor the view keyword.

A materialized view such as the following (with filter on CustomerId) might improve performance when used instead of table1.

.create materialized-view Customer_mv on table Customer
{
    Customer
    | summarize by CustomerId, City
}

Update

Here are couple of examples that demonstrate the benefits of materialize()

1.

let t = print x = rand(1000);
union t, t, t
x
337
998
242

Fiddle

vs.

let t = materialize(print x = rand(1000));
union t, t, t
x
722
722
722

Fiddle

2.

let t1 = range i from 1 to 100000000 step 1 | summarize count() by i =  i%2;
let t2 = t1;
t1
| join kind=inner t2 on i
i count_ i1 count_1
1 50000000 1 50000000
0 50000000 0 50000000

Execution Time: 4.4375515

Fiddle

vs.

let t1 = materialize(range i from 1 to 100000000 step 1 | summarize count() by i =  i%2);
let t2 = t1;
t1
| join kind=inner t2 on i
i count_ i1 count_1
1 50000000 1 50000000
0 50000000 0 50000000

Execution Time: 2.5313002


Here is an example that demonstrates the benefits of a view

.create-or-alter function StormEvents_top_5_deaths_v ()
{
    cluster("help").database("Samples").StormEvents
    |   project  Duration       = EndTime - StartTime
                ,Distance       = round(geo_distance_2points(BeginLon, BeginLat, EndLon, EndLat))
                ,TotalDeath     = DeathsDirect + DeathsIndirect
                ,TotalInjuries  = InjuriesIndirect + InjuriesIndirect
                ,TotalDamage    = DamageCrops + DamageProperty            
    |   where    TotalDeath > 0
    |   top 5 by TotalDeath
}

StormEvents_top_5_deaths_v
Duration Distance TotalDeath TotalInjuries TotalDamage
20.07:00:00 0 14 0 0
00:18:00 20609 13 0 46000000
01:02:00 26080 11 0 250000000
2.20:00:00 0 10 0 0
18.09:00:00 0 10 0 0

Fiddle

David דודו Markovitz
  • 42,900
  • 6
  • 64
  • 88