
According to the DolphinDB documentation, the sortKeyMappingFunction parameter of createPartitionedTable is used to reduce the dimensionality of sort keys. When and why is this dimensionality reduction needed?

Polly

1 Answer


For example, suppose there are 10 million devices, each generating one record per day, and the table is partitioned by date.

If the database is created with the TSDB storage engine and deviceId is used as the sort column, each partition contains 10 million sort keys, and each sort key corresponds to only one record.

As a time-series database, DolphinDB implements indexing in the TSDB engine, but the index is tightly coupled with how the data is stored.

Data files in DolphinDB are divided into blocks by sort key, so the example above yields 10 million blocks within a single partition. Because every block carries a fixed overhead, querying the entire partition is very inefficient: all 10 million blocks have to be read.
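To make the block count concrete, here is a small Python sketch (not DolphinDB code) that groups the records of one partition by sort key, following the description above that each sort key gets its own block. The device count is scaled down to 100,000 so it runs quickly, and `count_blocks` is a hypothetical helper, not a DolphinDB function.

```python
from collections import Counter

def count_blocks(sort_keys):
    """Return (number of blocks, average records per block),
    assuming one block per distinct sort key."""
    sizes = Counter(sort_keys)
    n_blocks = len(sizes)
    avg_records = sum(sizes.values()) / n_blocks
    return n_blocks, avg_records

# One record per device per day, so one partition holds one record per device.
# Scaled down from 10 million devices (assumption made for speed only).
n_devices = 100_000
device_ids = [f"d{i}" for i in range(1, n_devices + 1)]

n_blocks, avg_records = count_blocks(device_ids)
print(n_blocks, avg_records)  # 100000 1.0
```

Every record ends up in its own block, so whatever fixed cost a block carries is paid once per record.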

With sortKeyMappingFunction, you can regroup the sort keys to reduce their dimensionality. For example:

login(`admin,`123456)

// Simulate one day of data: 10 million devices, one record each
n = 10000000
id = "d" + string(1..n)
TradeDate = take(2022.01.01, n)
val = rand(1000..3000, n)

schemaTable = table(id, TradeDate, val)

// Recreate the TSDB database, value-partitioned by date
dbPath = "dfs://TSDB_DEMO"
if(existsDatabase(dbPath)){dropDatabase(dbPath)}
db_demo = database(dbPath, VALUE, 2022.01.01..2022.01.05, engine='TSDB')

// hashBucket{,1000} maps each id into one of 1000 buckets before it is used as a sort key
demo = createPartitionedTable(dbHandle=db_demo, table=schemaTable, tableName="demo", partitionColumns=`TradeDate, sortColumns=`id`TradeDate, keepDuplicates=ALL, sortKeyMappingFunction=[hashBucket{,1000}]).append!(schemaTable)

The example applies sortKeyMappingFunction to id, which reduces the number of sort keys from 10 million to 1,000. Roughly 10,000 records then share each sort key.
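The effect of the mapping can be sketched outside DolphinDB as well. The Python snippet below is only a conceptual model: `map_sort_key` is a hypothetical stand-in for `hashBucket{,1000}`, it uses CRC32 as the hash (an assumption, not DolphinDB's actual hash function), and the device count is scaled down to 100,000.

```python
import zlib
from collections import Counter

def map_sort_key(device_id, buckets=1000):
    """Hypothetical stand-in for hashBucket{,1000}: hash the id, then take
    the remainder. CRC32 is an assumption, not DolphinDB's hash."""
    return zlib.crc32(device_id.encode("utf-8")) % buckets

# Scaled down from 10 million devices (assumption made for speed only).
n_devices = 100_000
device_ids = [f"d{i}" for i in range(1, n_devices + 1)]

# Number of records that land on each mapped sort key
bucket_sizes = Counter(map_sort_key(d) for d in device_ids)

print("distinct sort keys before mapping:", len(set(device_ids)))
print("distinct sort keys after mapping: ", len(bucket_sizes))
print("average records per mapped key:   ", n_devices / len(bucket_sizes))
```

The number of distinct keys after mapping can never exceed the bucket count, so the number of blocks per partition is bounded by 1,000 no matter how many devices report.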

Shena