Update 9/11/2019
Note updated version
WITH
rand()/4294967295*100 AS s,
60 AS t1,
50 AS t10,
30 AS t20,
20 AS t2,
if(s < t10, 0, if(s > t1, 1, NULL)) as signal1,
if(s > t20, 0, if(s < t2, 1, NULL)) as signal2
SELECT
arrayFill(x -> (x != -2), groupArray(toInt8(ifNull(signal1 - signal2, -2)))) as k
FROM numbers_mt(10000000);
Updated based on feedback from Summer.H. Following timings best of 5 runs on each on my system (Core i7-7820X). Very little material difference in the timings between each.
10 million signals
Random generation + computation
- 1 thread - DolphinDB 337ms, ClickHouse 291ms
- 2 threads (1 core) - DolphinDB 226ms, ClickHouse 189ms
Computation only
- 1 thread - DolphinDB 302ms, ClickHouse 233ms
- 2 threads (1 core) - DolphinDB 179ms, ClickHouse 165ms
250 million signals
Random generation + computation
- 1 thread - DolphinDB 7.901s, ClickHouse 7.103s
- 2 threads (1 core) - DolphinDB 4.786s, ClickHouse 4.297s
- 4 threads (2 cores) - ClickHouse 2.965s
Computation only
- 1 thread - DolphinDB 7.106s, ClickHouse 5.564s
- 2 threads (1 core) - DolphinDB 3.966s, ClickHouse 3.668s
- 4 threads (2 cores) - ClickHouse 2.573s
Original
Note this doesn't answer the specific question as it relates to DolphinDB - but here's a version using ClickHouse as well.
WITH
60 AS t1,
50 AS t10,
30 AS t20,
20 AS t2,
([if(s < t10, 0, NULL), if(s > t20, 0, NULL)], [if(s > t1, 1, NULL), if(s < t2, 1, NULL)]) AS signal
SELECT arrayFill(x -> (x != -2), groupArray(ifNull(coalesce((signal.1)[1], (signal.2)[1]) - coalesce((signal.1)[2], (signal.2)[2]), -2))) AS k
FROM
(
SELECT arrayJoin([10, 20, 70, 59, 42, 49, 19, 25, 26, 35]) AS s
)
FORMAT TSV
[-1,-1,1,1,0,0,-1,-1,-1,0]
Benchmarking 10 million random samples. Fastest DolphinDB answer by Summer.H on my system, best run out of 5:
DolphinDB (4 threads)
./dolphindb
DolphinDB Systems 0.99.0 64 bit Copyright (c) 2011~2019 DolphinDB, Inc. Licensed to Trial User. Expires on 2019.12.31 (Build:2019.10.25)
>timer t1= 60
timer t10 = 50
timer t20 = 30
timer t2 = 20
timer signal = rand(100.0, 10000000)
timer direction = (iif(signal >t1, 1h, iif(signal < t10, 0h, 00h)) - iif(signal <t2, 1h, iif(signal > t20, 0h, 00h))).ffill().nullFill(0h)
;>>>>>>
Time elapsed: 0.01 ms
Time elapsed: 0.001 ms
Time elapsed: 0.001 ms
Time elapsed: 0.001 ms
Time elapsed: 72.675 ms
Time elapsed: 305.442 ms
Total time: 378 ms
ClickHouse (restricted 4 threads like DolphinDB), best run out of 5:
CREATE TEMPORARY TABLE dtest2 AS
WITH
rand()%100 + rand()/4294967295 AS s,
60 AS t1,
50 AS t10,
30 AS t20,
20 AS t2,
([if(s < t10, 0, NULL), if(s > t20, 0, NULL)], [if(s > t1, 1, NULL), if(s < t2, 1, NULL)]) AS signal
SELECT arrayFill(x -> (x != -2), groupArray(ifNull(coalesce((signal.1)[1], (signal.2)[1]) - coalesce((signal.1)[2], (signal.2)[2]), -2))) AS k
FROM numbers_mt(10000000)
Ok.
0 rows in set. Elapsed: 0.300 sec. Processed 10.00 million rows, 80.00 MB (33.37 million rows/s., 266.94 MB/s.)
Total time 300 ms
ClickHouse, default configuration/no thread limit, best run out of 5:
CREATE TEMPORARY TABLE dtest2 AS
WITH
rand()%100 + rand()/4294967295 AS s,
60 AS t1,
50 AS t10,
30 AS t20,
20 AS t2,
([if(s < t10, 0, NULL), if(s > t20, 0, NULL)], [if(s > t1, 1, NULL), if(s < t2, 1, NULL)]) AS signal
SELECT arrayFill(x -> (x != -2), groupArray(ifNull(coalesce((signal.1)[1], (signal.2)[1]) - coalesce((signal.1)[2], (signal.2)[2]), -2))) AS k
FROM numbers_mt(10000000)
Ok.
0 rows in set. Elapsed: 0.191 sec. Processed 10.00 million rows, 80.00 MB (52.22 million rows/s., 417.74 MB/s.)
Final timings from each:
kdb (800ms), DolphinDB (480ms, 378ms, 330ms?), ClickHouse (191ms)
I further benchmarked here with 250 million random signals - DolphinDB took 9214.99ms (1420ms random signal generation + 7794.24ms calculation). ClickHouse took 4272 ms total for random generation and calculation.
Was limited with my DolphinDB licence but ClickHouse managed 1 billion signals in 17.4 seconds (memory table) or 20.1 seconds (to disk).