  • nebula version: 3.5
  • Deployment method: stand-alone
  • Installation method: RPM
  • Production environment: N
  • Hardware information: 64 GB memory


Specific description of the problem: there are about 4 million records per day. After importing three days of data, the query returns results, but after importing one month of data the query no longer returns anything and the memory usage of storaged keeps climbing. I then tried reducing write_buffer_size and block_size, but memory usage is still very high.

Questions:

1. When the data volume is very large, is there any way to speed up aggregation and filtering queries like this one? Is Nebula suitable for this kind of query, or is there a problem with my usage? I hope to get advice from you.
2. In the future I want to import and analyze half a year of data. With such a large data volume, can statistical statements like this still be computed? How should the resource configuration of the cluster be estimated?
3. What is the reason for the high memory usage of storaged? Is there any way to bring it down?
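For question 3, this is the kind of memory-related section in nebula-storaged.conf I am referring to (a sketch with illustrative values; write_buffer_size and block_size are the two knobs I already reduced, and I believe --enable_partitioned_index_filter and --memory_tracker_limit_ratio exist in recent 3.x releases, but please correct me if the flag names differ in 3.5):

# RocksDB block cache, in MB
--rocksdb_block_cache=4096
# memtable size per column family, in bytes (the write_buffer_size I reduced)
--rocksdb_column_family_options={"write_buffer_size":"67108864","max_write_buffer_number":"4"}
# data block size, in bytes (the block_size I reduced)
--rocksdb_block_based_table_options={"block_size":"8192"}
# partition index/filter blocks so they can be evicted instead of pinned in memory
--enable_partitioned_index_filter=true
# memory tracker limit as a fraction of available memory (added in 3.4, I think)
--memory_tracker_limit_ratio=0.8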

The query statement (built as a Python string):

# min_over_count: threshold on the number of refund edges (defined elsewhere)
match_pid_tid = (
    "match (s:securityop)-[ov:overring]->(o:`order`)-[re:refund]-(t:tender) "
    "with id(s) as sid, collect(id(o)) as list_order_number, "
    "o.`order`.store_code as store_code, t, count(re) as over_count, "
    "id(t) as tid, collect(distinct o.`order`.business_day) as business_day "
    f"where over_count > {min_over_count} "
    "return sid, list_order_number, store_code, over_count, tid, business_day "
    "order by over_count desc"
)

Data volume for the current month: (screenshot)

Memory usage: (screenshot)

graphd log: (screenshot)

Storage configuration: (screenshot)
