Please help compare the performance of these two methods. How many times faster is one than the other, and what causes the difference?
Please clarify your specific problem or provide additional details to highlight exactly what you need. As it's currently written, it's hard to tell exactly what you're asking. – Community Dec 02 '21 at 15:09
1 Answer
Suppose we use the following data to perform the calculations:
login(`admin, `123456)
pnodeRun(clearAllCache)
undef all
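// 3000 symbols, SH000001 through SH003000
syms = format(1..3000, "SH000000")
// 10000 rows per symbol
N = 10000
// cross join: 3000 * 10000 = 30,000,000 rows in total
t = cj(table(syms as symbol), table(rand(100.0, N) as price, rand(10000, N) as volume))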
Method 1: using context by takes about 3.3 seconds.
timer result1 = select mwavg(price, volume, 4) from t context by symbol
Method 2: using a for loop takes about 25 minutes.
arr = array(ANY, syms.size())
timer {
    for(i in 0 : syms.size()) {
        price_vec = exec price from t where symbol = syms[i]
        volume_vec = exec volume from t where symbol = syms[i]
        arr[i] = mwavg(price_vec, volume_vec, 4)
    }
    res = reduce(join, arr)
}
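The performance difference between these two methods is roughly 400 to 450 times (about 25 minutes, or 1,500 seconds, versus 3.3 seconds). The context by clause groups the rows of all stocks in a single pass and then calculates mwavg on each group separately. With the for loop, every iteration runs two queries (one for price, one for volume), and each query has to filter the entire 30,000,000-row table to retrieve the 10,000 records of a single stock. Over 3000 stocks that amounts to 6000 full-table filters, which is why it takes so much longer.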
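To make the cost argument concrete, here is a rough sketch in plain Python (not DolphinDB script) that counts how many rows each approach has to look at on a toy table. Everything in it is an assumption for illustration: the helper names `mwavg`, `per_symbol_scans` and `group_once` are hypothetical, the pure-Python `mwavg` only mimics a moving weighted average over a full window, and the row counts are a simple model of work, not a prediction of DolphinDB's actual timings.

```python
from collections import defaultdict


def mwavg(prices, volumes, window):
    """Moving volume-weighted average; None until the window is full."""
    out = []
    for i in range(len(prices)):
        if i + 1 < window:
            out.append(None)
            continue
        p = prices[i + 1 - window : i + 1]
        v = volumes[i + 1 - window : i + 1]
        total = sum(v)
        out.append(sum(a * b for a, b in zip(p, v)) / total if total else None)
    return out


def per_symbol_scans(rows, symbols, window):
    """Analogue of the for loop: two full filters of the table per symbol."""
    touched = 0
    result = {}
    for s in symbols:
        prices = [p for sym, p, _ in rows if sym == s]   # full scan 1
        volumes = [v for sym, _, v in rows if sym == s]  # full scan 2
        touched += 2 * len(rows)
        result[s] = mwavg(prices, volumes, window)
    return result, touched


def group_once(rows, window):
    """Analogue of context by: one pass to group, then compute per group."""
    touched = 0
    groups = defaultdict(lambda: ([], []))
    for sym, p, v in rows:
        groups[sym][0].append(p)
        groups[sym][1].append(v)
        touched += 1
    result = {s: mwavg(p, v, window) for s, (p, v) in groups.items()}
    return result, touched


# Toy data: 30 symbols with 100 rows each (much smaller than the original table).
symbols = ["SH%06d" % i for i in range(1, 31)]
n = 100
rows = [
    (s, float((i * 7 + k) % 100 + 1), (i * 13 + k) % 50 + 1)
    for k, s in enumerate(symbols)
    for i in range(n)
]

loop_result, loop_touched = per_symbol_scans(rows, symbols, 4)
group_result, group_touched = group_once(rows, 4)

print("rows touched by the loop:  ", loop_touched)
print("rows touched by grouping:  ", group_touched)
print("same results:              ", loop_result == group_result)
```

In this model the loop touches 2 x (number of symbols) times as many rows as the single grouping pass. The measured gap in DolphinDB is smaller than that ratio would suggest, presumably because the filters are vectorized and other fixed costs are involved, but the direction of the difference is the same.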

Polly