ClickHouse cluster returns multiple records after inserting a single row

Question

I use ReplicatedMergeTree and Distributed tables in ClickHouse to build an HA cluster. I expect the cluster to store two replicas of the data, so that it keeps working when one of the nodes has problems. This is part of my configuration (config.xml): ...

        <logs>
        <shard>
            <weight>1</weight>
            <internal_replication>true</internal_replication>
            <replica>
                <host>node1</host>
                <port>9000</port>
            </replica>
            <replica>
                <host>node2</host>
                <port>9000</port>
            </replica>
        </shard>
        <shard>
            <weight>1</weight>
            <internal_replication>true</internal_replication>
            <replica>
                <host>node2</host>
                <port>9000</port>
            </replica>
            <replica>
                <host>node3</host>
                <port>9000</port>
            </replica>
        </shard>
        <shard>
            <weight>1</weight>
            <internal_replication>true</internal_replication>
            <replica>
                <host>node3</host>
                <port>9000</port>
            </replica>
            <replica>
                <host>node1</host>
                <port>9000</port>
            </replica>
        </shard>
        </logs>
...
<!-- each node is different -->
<macros>
    <layer>01</layer>
    <shard>01</shard>
    <replica>node1</replica>
</macros>
<!-- below is node2 and node3 configuration 

<macros>
    <layer>02</layer>
    <shard>02</shard>
    <replica>node2</replica>
</macros>

<macros>
    <layer>03</layer>
    <shard>03</shard>
    <replica>node3</replica>
</macros>
-->
...

Then I create the table on each node using the clickhouse-client --host command:

create table if not exists game(uid Int32,kid Int32,level Int8,datetime Date) 
ENGINE = ReplicatedMergeTree('/clickhouse/data/{shard}/game','{replica}') 
PARTITION BY  toYYYYMMDD(datetime)  
ORDER BY (uid,datetime);

After creating the ReplicatedMergeTree table, I create the Distributed table on each node (the intent is for every node to have this table; in fact it is only created on one node):

CREATE TABLE game_all AS game  
ENGINE = Distributed(logs, default, game ,rand())

Everything looks fine up to this point, and I assumed that inserting data into game_all would work too. But when I query the game and game_all tables, something is clearly wrong. I inserted one record into game_all, but querying it returns 3 records where there should be 1, and when I query the game table on each node, only one of them holds 1 record. Finally, I checked each node's disk, and the table does not seem to be replicated: only one node uses more than 4KB of disk for it, while the others use just 4KB.

I have resolved this problem by changing the hosts listed under the replica elements, which were (node1,node2), (node2,node3), (node3,node1). After the change they are (node1,node2), (node3,node4), so I had to add a node to the cluster to get 2 shards with 2 replicas each. Finally, the macros also have to be modified: node1's layer is the same as node2's, but node1's shard should be different from node2's. The same applies to node3/node4. - DreamHeaven, Nov 05 '18 at 08:45
You can answer your own question and accept it, but the problem here is that you can't have two replicas of a table in the same database. Since each node in your cluster holds 2 replicas, both of them would be on the same node and therefore in the same database. That's why adding node4 resolved your problem. - ramazan polat, Jul 29 '19 at 19:54
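
The fix described in the comments can be sketched as a config fragment. This is only an illustration under stated assumptions, not the poster's actual final configuration: node4 is the host name mentioned in the comment, the port and cluster name are copied from the question, and the macro values are hypothetical. One point worth checking against your own setup: the table in the question uses the ZooKeeper path '/clickhouse/data/{shard}/game', so two nodes replicate each other only when their {shard} macro expands to the same value while their {replica} values differ. The sketch follows that rule.

```xml
<!-- Sketch: 2 shards x 2 replicas; every host appears in exactly one shard -->
<logs>
    <shard>
        <weight>1</weight>
        <internal_replication>true</internal_replication>
        <replica>
            <host>node1</host>
            <port>9000</port>
        </replica>
        <replica>
            <host>node2</host>
            <port>9000</port>
        </replica>
    </shard>
    <shard>
        <weight>1</weight>
        <internal_replication>true</internal_replication>
        <replica>
            <host>node3</host>
            <port>9000</port>
        </replica>
        <replica>
            <host>node4</host>
            <port>9000</port>
        </replica>
    </shard>
</logs>

<!-- Macros on node1 (assumed values). {layer} is left out because the
     table path in the question does not reference it. -->
<macros>
    <shard>01</shard>
    <replica>node1</replica>
</macros>
<!-- Other nodes, same structure (assumed values):
     node2: shard 01, replica node2
     node3: shard 02, replica node3
     node4: shard 02, replica node4
-->
```

With this layout, node1 and node2 register under the same ZooKeeper path and hold copies of the same data, and likewise node3 and node4, so a row inserted through game_all should be stored on both replicas of one shard and counted once when queried.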
