
I need help improving data writing performance in DolphinDB.

The client receives stock quote data one record at a time. Taking both latency and throughput into account, how can I write this data efficiently to a stream table or a DFS table? Any suggestions for improving write efficiency would be appreciated. Many thanks.

pjs
Polly

1 Answer


Here are some ways to improve DolphinDB data import performance:

  1. Increase network bandwidth. Importing data into a distributed system involves many network transfers. Deploying at least 10 Gigabit Ethernet is recommended to avoid high latency.
  2. Use bulk import instead of single-record inserts. Inserting one record at a time is not recommended because every insert goes through the full write path, which leads to high latency. Writing to the log, opening a transaction, and making multiple network round trips are unavoidable costs of each write; only the insert step itself takes slightly less time for a single record than for a batch. That is why bulk import is strongly recommended. Bulk import is currently supported in the C++, Python, and C# APIs.
  3. Increase the number of remoteexecutor. The default value is 1. When a node has to forward received data to other nodes, remoteexecutor = 1 means that only one thread sends the data. Increasing the value lets several threads send data to other nodes at the same time.
  4. Use data compression. DolphinDB automatically compresses relatively large volumes of data: it compresses the data blocks to conserve network bandwidth.
  5. Partition data in advance on the client. Before importing, a data node in a DFS database groups the incoming data according to the partitioning scheme. Doing this grouping in advance on the client reduces that overhead on the server.
  6. Import data in asynchronous batches. Use multi-threaded batch processing with at least two threads: one thread receives data and maintains a queue, and the other runs in a loop, taking data from the queue and writing it.
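
Points 2 and 6 can be combined on the client side. The following is only a minimal sketch in Python using the standard library, not DolphinDB's own implementation. The class name `BatchWriter`, the parameters `max_batch` and `max_wait`, and the `write_batch` callback are all hypothetical names introduced for illustration; in practice `write_batch` would wrap whichever bulk-write call of the DolphinDB API you use. `max_batch` is the throughput knob and `max_wait` bounds the extra latency a record can spend in the buffer.

```python
import queue
import threading
import time


class BatchWriter:
    """Hypothetical helper: buffer single records and write them in batches
    from a background thread."""

    def __init__(self, write_batch, max_batch=1000, max_wait=0.05):
        # write_batch: assumed callable taking a list of records; it stands in
        # for a real bulk-write call and is not part of any DolphinDB API.
        self._write_batch = write_batch
        self._max_batch = max_batch      # flush when this many records are buffered
        self._max_wait = max_wait        # or when the oldest record is this old (seconds)
        self._queue = queue.Queue()
        self._stop = object()            # sentinel that tells the writer thread to exit
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def put(self, record):
        """Called by the receiving thread for each incoming record."""
        self._queue.put(record)

    def close(self):
        """Flush whatever is still buffered and stop the writer thread."""
        self._queue.put(self._stop)
        self._thread.join()

    def _run(self):
        batch = []
        deadline = None
        while True:
            # Block indefinitely while the buffer is empty; otherwise wait
            # only until the current batch reaches its deadline.
            timeout = None if deadline is None else max(0.0, deadline - time.monotonic())
            try:
                item = self._queue.get(timeout=timeout)
            except queue.Empty:
                pass  # deadline reached with no new record
            else:
                if item is self._stop:
                    if batch:
                        self._write_batch(batch)
                    return
                if not batch:
                    deadline = time.monotonic() + self._max_wait
                batch.append(item)
            if batch and (len(batch) >= self._max_batch or time.monotonic() >= deadline):
                self._write_batch(batch)
                batch = []
                deadline = None
```

The receiving thread only calls `put()` for each quote, so it never waits on the network; the writer thread decides when a batch is large enough or old enough to send. Call `close()` on shutdown so the last partial batch is written.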
Polly