We have a modest ClickHouse cluster (~30 nodes) and want to collect usage stats on it. We are hoping to do this with scheduled queries against the system tables, but a normal query only returns information from the one node you happen to be connected to, and creating a Distributed table only works with the *log system tables. We could loop over the nodes, but we don't want to do that. Is there a way to get all the instances of a system table, such as system.parts, in one query?
3 Answers
Distributed tables work with any type of table, and clusterAllReplicas does as well.
create table test on cluster replicated as system.processes
Engine = Distributed(replicated, system, processes);
SELECT
    FQDN(),
    elapsed
FROM test

┌─FQDN()─────────┬────elapsed─┐
│ hos.mycmdb.net │ 0.00063795 │
└────────────────┴────────────┘
SELECT
    FQDN(),
    elapsed
FROM clusterAllReplicas(replicated, system, processes);
SELECT elapsed
FROM clusterAllReplicas(replicated, system, processes)
┌─────elapsed─┐
│ 0.005636027 │
└─────────────┘
┌─────elapsed─┐
│ 0.000228303 │
└─────────────┘
┌─────elapsed─┐
│ 0.000275745 │
└─────────────┘
┌─────elapsed─┐
│ 0.000311621 │
└─────────────┘
┌─────elapsed─┐
│ 0.000270791 │
└─────────────┘
┌─────elapsed─┐
│ 0.000288045 │
└─────────────┘
┌─────elapsed─┐
│ 0.001048277 │
└─────────────┘
┌─────elapsed─┐
│ 0.000256203 │
└─────────────┘
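Applied to the question's system.parts, the same clusterAllReplicas approach collapses to a single query. A minimal sketch, assuming the cluster is named replicated as in the examples above (the aggregated columns are illustrative):

```sql
-- Fan out to every replica's local system.parts in one query.
-- 'replicated' is the cluster name used in the examples above.
SELECT
    hostName() AS host,
    count() AS part_count,
    formatReadableSize(sum(bytes_on_disk)) AS size_on_disk
FROM clusterAllReplicas(replicated, system, parts)
WHERE active
GROUP BY host
ORDER BY host
```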

Denny Crane
When we tried this solution, we started getting errors like this one:
default.parts.DirectoryMonitor: Code: 48, e.displayText() = DB::Exception: Received from … DB::Exception: Method write is not supported by storage SystemParts. Any thoughts on what we might be doing wrong? – Richard Rymer May 10 '21 at 18:49
@RichardRymer you cannot create a materialized view which inserts into or reads from system.parts. It seems you can achieve your goal simply by enabling system.part_log – Denny Crane May 10 '21 at 20:04
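For context on that suggestion: part_log is not enabled by default; it is switched on in the server configuration. A minimal sketch, assuming a config.d override file (the file path and flush interval are illustrative, and older servers use a `<yandex>` root element instead of `<clickhouse>`):

```xml
<!-- /etc/clickhouse-server/config.d/part_log.xml (hypothetical path) -->
<clickhouse>
    <part_log>
        <database>system</database>
        <table>part_log</table>
        <flush_interval_milliseconds>7500</flush_interval_milliseconds>
    </part_log>
</clickhouse>
```

Since system.part_log is one of the *log tables, the Distributed-table approach mentioned in the question then works against it directly.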
The remote or remoteSecure table functions can be used; they support multiple addresses:
SELECT
hostName() AS host,
any(partition),
count()
FROM remote('node{01..30}-west.contoso.com', system, parts)
GROUP BY host
/*
┌─host────────┬─any(partition)─┬─count()─┐
│ node01-west │         202012 │     733 │
│ ...         │            ... │     ... │
│ node30-west │         202012 │     687 │
└─────────────┴────────────────┴─────────┘
*/
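When the cluster requires TLS, remoteSecure takes the same address pattern plus optional credentials. A minimal sketch (the port, user name, and password placeholder are illustrative, not from the original answer):

```sql
-- 9440 is the conventional secure native-protocol port.
SELECT
    hostName() AS host,
    count()
FROM remoteSecure('node{01..30}-west.contoso.com:9440', system.parts, 'monitor', '<password>')
GROUP BY host
```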

vladimir
Interesting. I will have to experiment with this to see if it works on Superset. – Richard Rymer May 06 '21 at 16:43
So this ended up being the most stable solution for us, though it involves a little more maintenance for scheduled queries. – Richard Rymer May 10 '21 at 18:43
For the record, we ended up using materialized views:
CREATE MATERIALIZED VIEW _tmp.parts ON CLUSTER main_cluster
ENGINE = Distributed('main_cluster', 'system', 'parts', rand())
AS SELECT * FROM system.parts
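With that view created on every node, a single query against it fans out to the whole cluster, since its Distributed engine points at each node's local system.parts. A hedged sketch (the aggregated columns are illustrative; _tmp.parts is the view from the answer above):

```sql
SELECT
    hostName() AS host,
    count() AS part_count,
    sum(bytes_on_disk) AS bytes
FROM _tmp.parts
WHERE active
GROUP BY host
```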

Richard Rymer
Though this worked, we are pretty sure it caused a bunch of performance issues, namely insert latency went through the roof. – Richard Rymer May 10 '21 at 18:42