Can I be sure that the most recently inserted data is visible through a Distributed table (built on tables from the *ReplicatedMergeTree family) as soon as ClickHouse returns control after an INSERT request, or not?

Say I have a local table like the following:

CREATE TABLE IF NOT EXISTS my_namespace.local_table ON CLUSTER my_cluster
(
    `field1` UInt32,
    `field2` String,
    `timestamp` DateTime,
    `array_field` Array(String)
)
ENGINE = ReplicatedMergeTree
PARTITION BY toYYYYMM(timestamp)
ORDER BY field1
SETTINGS index_granularity = 128

And a distributed table over the local table above, which is used as the source for SELECT queries:

CREATE TABLE IF NOT EXISTS
    my_namespace.distributed_table
ON CLUSTER
    my_cluster
AS
    my_namespace.local_table
ENGINE = Distributed('my_cluster', 'my_namespace', 'local_table', sipHash64(field1))

After inserting some data into the distributed table (my_namespace.distributed_table), I'd like to delete some rows with ALTER ... DELETE and a SELECT subquery:

ALTER TABLE
    my_namespace.local_table ON CLUSTER my_cluster
DELETE WHERE
    concat(toString(field1), toString(timestamp)) NOT IN
    (
        SELECT
            concat(toString(field1), toString(max(timestamp)))
        FROM
            my_namespace.distributed_table
        GROUP BY
            field1
    )

I assume the inserted rows will be visible when the DELETE runs.

Or can the data be written to only some of the replicas at first, so that it is not immediately visible to SELECT queries like the subquery above?

sergzach