Can I be sure that the most recently inserted data is available in a distributed table (built on tables from the *ReplicatedMergeTree family) once ClickHouse returns control after an INSERT request, or not?
Say I have a local table like this one:
CREATE TABLE IF NOT EXISTS my_namespace.local_table ON CLUSTER my_cluster
(
`field1` UInt32,
`field2` String,
`timestamp` DateTime,
`array_field` Array(String)
)
ENGINE = ReplicatedMergeTree
PARTITION BY toYYYYMM(timestamp)
ORDER BY field1
SETTINGS index_granularity = 128
And a distributed table on top of the local table above, used as the source for SELECT queries:
CREATE TABLE IF NOT EXISTS
my_namespace.distributed_table
ON CLUSTER
my_cluster
AS
my_namespace.local_table
ENGINE = Distributed('my_cluster', 'my_namespace', 'local_table', sipHash64(field1))
After inserting some data into the distributed table (my_namespace.distributed_table), I'd like to remove some rows with an ALTER that uses a SELECT subquery:
ALTER TABLE
my_namespace.distributed_table
DELETE WHERE
concat(toString(field1), toString(timestamp)) not in
(
SELECT
concat(toString(field1), toString(max(timestamp)))
FROM
my_namespace.distributed_table
GROUP BY
field1
)
I suppose the inserted rows will already be available when the DELETE runs.
Can the data end up on only some of the replicas, and therefore not be immediately available to SELECT queries like the one above?