Using the sample data from the AGE documentation as an example:
SELECT create_graph('graph_name');
SELECT * FROM cypher('graph_name', $$
CREATE (a:Person {name: 'A', age: 13}),
(b:Person {name: 'B', age: 33, eyes: "blue"}),
(c:Person {name: 'C', age: 44, eyes: "blue"}),
(d1:Person {name: 'D', eyes: "brown"}),
(d2:Person {name: 'D'}),
(a)-[:KNOWS]->(b),
(a)-[:KNOWS]->(c),
(a)-[:KNOWS]->(d1),
(b)-[:KNOWS]->(d2),
(c)-[:KNOWS]->(d2)
$$) as (a agtype);
percentileCont
Running the percentileCont() function produces the following output:
SELECT *
FROM cypher('graph_name', $$
MATCH (n:Person)
RETURN percentileCont(n.age, 0.4)
$$) as (percentile_cont_age agtype);
percentile_cont_age
---------------------
29.0
(1 row)
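Looking at how percentileCont() is calculated in the 'agtype.c' file, the result is a linear interpolation between the two nearest rows of the sorted values, where
result = y1 + [(x - x1) * (y2 - y1)] / (x2 - x1)
x = percentile * (number_of_rows - 1)
x1 = floor(percentile * (number_of_rows - 1))
x2 = ceil(percentile * (number_of_rows - 1))
y1 = value_of_x1
y2 = value_of_x2
Here x is the fractional (0-based) row position, and value_of_x1 and value_of_x2 are the sorted values at rows x1 and x2. If x1 = x2, the position falls exactly on a row and the result is simply y1.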
In this example, with percentile = 0.4 and number_of_rows = 3 (with ages 13, 33, and 44; the two people without an age are not counted), this gives:
x = 0.4 * (3 - 1) = 0.8
x1 = floor(0.4 * (3 - 1)) = floor(0.8) = 0
x2 = ceil(0.4 * (3 - 1)) = ceil(0.8) = 1
y1 = value_of_x1 = 13
y2 = value_of_x2 = 33
result = 13 + [(0.8 - 0) * (33 - 13)] / (1 - 0) = 29
This is exactly what the percentileCont() function returned.
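As a cross-check, here is a minimal Python sketch of the interpolation described above. It is not AGE's C code: the function name percentile_cont, the handling of missing ages as None, and returning None for an empty input are all assumptions made for illustration.

```python
import math

def percentile_cont(values, percentile):
    """Linear-interpolation percentile, following the formula in this answer.

    Hypothetical helper: missing values (None) are skipped, and an empty
    input returns None. Both choices are assumptions, not AGE behaviour.
    """
    ordered = sorted(v for v in values if v is not None)
    if not ordered:
        return None
    x = percentile * (len(ordered) - 1)  # fractional 0-based row position
    x1 = math.floor(x)                   # row just below the position
    x2 = math.ceil(x)                    # row just above the position
    y1, y2 = ordered[x1], ordered[x2]
    if x1 == x2:
        # Position falls exactly on a row, so there is nothing to interpolate.
        return float(y1)
    return y1 + (x - x1) * (y2 - y1) / (x2 - x1)

# Ages from the sample graph; the two 'D' nodes have no age.
ages = [13, 33, 44, None, None]
print(percentile_cont(ages, 0.4))
```

With the sample ages and a percentile of 0.4 this lands on the same value of 29 as the query.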
percentileDisc
Running the percentileDisc() function produces the following output:
SELECT *
FROM cypher('graph_name', $$
MATCH (n:Person)
RETURN percentileDisc(n.age, 0.5)
$$) as (percentile_disc_age agtype);
percentile_disc_age
---------------------
33.0
(1 row)
This function uses a simpler calculation: it computes a target value from the percentile and then picks the input value nearest to that target.
result = round_to_nearest_val(percentile * (max_val - min_val) + min_val)
In this example, with percentile = 0.5, max_val = 44, and min_val = 13 (with ages 13, 33, and 44), this gives the following, since 33 is the age closest to 28.5:
result = round_to_nearest_val(0.5 * (44 - 13) + 13) = round_to_nearest_val(28.5) = 33
This is exactly what the percentileDisc() function returned.
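The same kind of check can be done for the nearest-value formula. The Python sketch below only mirrors the formula as written in this answer; it is not taken from AGE's C code, and the function name, the skipping of None values, and the tie-breaking (first value in input order wins) are assumptions.

```python
def percentile_disc_as_described(values, percentile):
    """Nearest-value percentile, read literally from the formula above.

    Hypothetical helper: skips None values and returns None for an empty
    input. Ties are broken by input order, which is an assumption.
    """
    present = [v for v in values if v is not None]
    if not present:
        return None
    min_val, max_val = min(present), max(present)
    # Target value between the minimum and maximum.
    target = percentile * (max_val - min_val) + min_val
    # round_to_nearest_val: pick the input value closest to the target.
    return min(present, key=lambda v: abs(v - target))

# Ages from the sample graph; the two 'D' nodes have no age.
ages = [13, 33, 44, None, None]
print(percentile_disc_as_described(ages, 0.5))
```

For the sample ages and a percentile of 0.5 the target is 28.5, and the closest age is 33, matching the query result.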
Hope this helps!