Views vs Materialize Vs Materialized View in Kusto

Question

Scenario: Data in kusto table is updated after 5 hours. Task: Call query from .net API In query , create a subquery and use that subquery to perform join on a bigger table

let table1=materialize(
Customer|where CustomerId=="cust-reg-aabb-cc232"|distinct CustomerId,City);
CustomerPurchase
|where CustomerId=="cust-reg-aabb-cc232"
//perform join with table1 and other things

or

let table1=view(){
Customer|where CustomerId=="cust-reg-aabb-cc232"|distinct CustomerId,City};
CustomerPurchase
|where CustomerId=="cust-reg-aabb-cc232"
//perform join with table1 and CustomerPurchase

CustomerPurchase and Customer data is being updated after every 5 hours(new rows being added). What is more optimized : create a view or use the materialize method. I went through the documentation but could not understand the different between both.

Also since I am implementing an API, is it possible to use materialized view instead of table1?

David דודו Markovitz · Accepted Answer · 2022-06-23T22:35:22.593

The documentation is quite clear:

materialize

Allows caching a subquery result during the time of query execution in a way that other subqueries can reference the partial result.

views

Views are virtual tables based on the result-set of a Kusto Query Language query. Just like a real table, a view contains rows and columns. Unlike a real table, a view doesn't hold its own data storage.

Views are defined through user-defined functions with the following requirements:

The result of the function must be tabular (for example, it cannot be a scalar value).

The function must take no arguments.

The view keyword

By default, operators that support a wildcard syntax to specify table names will not reference views, even if the view's name matches the wildcard. An example of this type of operator is the union operator. In this case, use the view keyword to have the view included as well.

Materialized views

Materialized views expose an aggregation query over a source table, or over another materialized view.

As the name suggests, the aggregation results are materialized, meaning - stored.
The results are being updated constantly, while the data keeps being ingested.

In your case, there seems to be no reason to use materialize(), nor the view keyword.

A materialized view such as the following (with filter on CustomerId) might improve performance when used instead of table1.

.create materialized-view Customer_mv on table Customer
{
    Customer
    | summarize by CustomerId, City
}

Update

Here are couple of examples that demonstrate the benefits of materialize()

1.

let t = print x = rand(1000);
union t, t, t

x
337
998
242

Fiddle

vs.

let t = materialize(print x = rand(1000));
union t, t, t

x
722
722
722

Fiddle

2.

let t1 = range i from 1 to 100000000 step 1 | summarize count() by i =  i%2;
let t2 = t1;
t1
| join kind=inner t2 on i

i	count_	i1	count_1
1	50000000	1	50000000
0	50000000	0	50000000

Execution Time: 4.4375515

Fiddle

vs.

let t1 = materialize(range i from 1 to 100000000 step 1 | summarize count() by i =  i%2);
let t2 = t1;
t1
| join kind=inner t2 on i

i	count_	i1	count_1
1	50000000	1	50000000
0	50000000	0	50000000

Execution Time: 2.5313002

Here is an example that demonstrates the benefits of a view

.create-or-alter function StormEvents_top_5_deaths_v ()
{
    cluster("help").database("Samples").StormEvents
    |   project  Duration       = EndTime - StartTime
                ,Distance       = round(geo_distance_2points(BeginLon, BeginLat, EndLon, EndLat))
                ,TotalDeath     = DeathsDirect + DeathsIndirect
                ,TotalInjuries  = InjuriesIndirect + InjuriesIndirect
                ,TotalDamage    = DamageCrops + DamageProperty            
    |   where    TotalDeath > 0
    |   top 5 by TotalDeath
}

StormEvents_top_5_deaths_v

Duration	Distance	TotalDeath	TotalDamage
20.07:00:00	0	14	0
00:18:00	20609	13	46000000
01:02:00	26080	11	250000000
2.20:00:00	0	10	0
18.09:00:00	0	10	0

Fiddle

I am not allowed to create new tables or something in the kusto cluster . Between views and materialize(), which would be a better choice? — jaime_, Jun 23 '22 at 03:00
Also in case of views also we must be storing them somewhere, while in materialize() we store it in cache. So I think materialize() would be faster? — jaime_, Jun 23 '22 at 03:05
"Between views and materialize()" - Neither. Your code does not seem to have anything to benefit from using any of them. — David דודו Markovitz, Jun 23 '22 at 22:38
"in case of views also we must be storing them somewhere" - No. We store the code, not the results. — David דודו Markovitz, Jun 23 '22 at 22:39
I still don't see the benefit of using a view. What is the benefit in the example you show? — amr ras, Oct 19 '22 at 04:06
@DavidדודוMarkovitz The last one is the one that shows benefits of using a view. Just trying to understand why use view over a subquery. Should I just post another question and let you answer that? — amr ras, Oct 19 '22 at 05:08
I think I get it. Views allow user-defined functions. Is that all? — amr ras, Oct 19 '22 at 05:14
KQL UDF without parameters, is the equivalent for RDBMS view. There is no KQL object of type view (there is materialize view though) — David דודו Markovitz, Oct 19 '22 at 06:34

Views vs Materialize Vs Materialized View in Kusto

1 Answers1

Update

1.

vs.

2.

vs.