UPDATE mutation hangs

Question

How can I ensure the mutations I submit to CH actually run?

I have a facts table lets call it Table A. It contains 8'904'390 records.

I also have a metadata table, Table B. It contains 14'790'739 records.

I am interested in joining them to make a faster table. Therefore I create a new table, A + B. It has selected columns from A and selected columns from B.

When I populate the new table with historical data, I run out of memory. Executing a simple INSERT INTO statement similar to this one:

INSERT INTO A+B SELECT
       id,
       time,
       price,
       buyer.gender as buyer_gender
FROM
    (
      SELECT
            id,
            time,
            buyer_ref,
            price
        FROM A
        )
        LEFT JOIN
    (
    SELECT
        ref,
        gender
        FROM B
        ) AS buyer ON buyer_ref = ref

So instead of manually partition the data to insert, I found this clever solution:
Updating rows with values from another table in ClickHouse

The idea is to first populate the A+B table with only data from A, then update it with data from B.

I tried running

ALTER TABLE A+B UPDATE buyer_gender=joinGet('buyer_join', 'gender', buyer_ref) where buyer_gender=''

The query executed and a record was added to the system.mutations table. I went home and came back 24 hours later and it had not finished, is_done was still 0 and the A+B table did not have any data from B in the buyer_gender column.

The clickhouse.logs are pretty quiet:

2021.02.12 11:16:03.631674 [ 153 ] {0e304bff-8563-4219-baf9-27294a84feac} <Debug> executeQuery: (from 172.22.0.1:52862) ALTER TABLE A+B UPDATE buyer_gender=joinGet('buyer_join', 'gender', buyer_id) where gender=''
2021.02.12 11:16:03.641182 [ 153 ] {0e304bff-8563-4219-baf9-27294a84feac} <Information> db_name.A+B: Added mutation: mutation_216.txt
2021.02.12 11:16:03.641210 [ 153 ] {0e304bff-8563-4219-baf9-27294a84feac} <Information> db_name.A+B: Waiting mutation: mutation_216.txt
2021.02.12 11:17:35.873119 [ 154 ] {14b270a0-281a-4d0a-b3be-4ce5059d853d} <Debug> executeQuery: (from 172.22.0.1:36140) select * from system.mutations
2021.02.12 11:17:35.873626 [ 154 ] {14b270a0-281a-4d0a-b3be-4ce5059d853d} <Trace> ContextAccess (default): Access granted: SELECT(database, table, mutation_id, command, create_time, `block_numbers.partition_id`, `block_numbers.number`, parts_to_do_names, parts_to_do, is_done, latest_failed_part, latest_fail_time, latest_fail_reason) ON system.mutations
2021.02.12 11:17:39.557393 [ 154 ] {fdabc8d1-488c-4b47-a35e-afd80f288fa6} <Debug> executeQuery: (from 172.22.0.1:36140) select * from system.mutations Format Vertical
2021.02.12 11:17:39.557868 [ 154 ] {fdabc8d1-488c-4b47-a35e-afd80f288fa6} <Trace> ContextAccess (default): Access granted: SELECT(database, table, mutation_id, command, create_time, `block_numbers.partition_id`, `block_numbers.number`, parts_to_do_names, parts_to_do, is_done, latest_failed_part, latest_fail_time, latest_fail_reason) ON system.mutations

And the clickhouse.error.logs contain nothing related to this mutation.

I tried only selecting from a query on table A with one column from the join table using the joinGet above. It takes around 10s. I tried an ALTER TABLE UPDATE statement where I assign the buyer_gender column value to a constant string 'foo'. It also executes very quickly. Neither operation seems to use significant memory. But combining the joinGet with the ALTER TABLE UPDATE seems to never finish and I have no idea if it is even started at all.

So my question is how does one trigger mutations? If that is not possible, is over 24 hours normal for this operation and volume of data?

ClickHouse server version 20.6.5 revision 54436

This: `SELECT ... GROUP BY id` looks wrong. An ID in a table should uniquely identify a row. So `GROUP BY id` should be completely superfluous. This: `SELECT id,... FROM A group by toYYYYMM(time)` looks even wronger :-) You put several rows (each with an ID) into a group, so how could you select one ID? Which one do you want to select for a group? The highest? The lowest? As a rule of thumb: When you have `GROUP BY`, but no aggregation function (`MIN`, `COUNT`, `SUM`...), then you are doing it wrong. — Thorsten Kettner, Feb 12 '21 at 13:08
A `RIGHT JOIN` is something we don't use. We use `LEFT JOIN` instead (with reversed tables) in order to keep our queries readable. — Thorsten Kettner, Feb 12 '21 at 13:11
`ON buyer_id = id`? Both derived tables (subqueries) return an ID. Which do you mean? Qualify all columns with their table when working with more than one table. — Thorsten Kettner, Feb 12 '21 at 13:12
I know all the above comments don't answer your question. This is why they are comments and not an answer :-) I don't know clickhouse and I don't know what mutations in clickhouse are. But I think you should first get your query straight before doing the next step. — Thorsten Kettner, Feb 12 '21 at 13:25
Your first 2 comments are valid points. I omitted the aggregates for brevity. group by time was an experiment I did at later stages. The group by id was because the table is a snapshot table & we are interested in the latest record. However, I don't agree with the join keys being ambiguous, so I kept them as is. I've updated the question accordingly. — Jarvis, Feb 12 '21 at 13:30
Well, there is an ID in A and an ID in B. There is no alias for your first subquery. If you aliased it AA, `ON buyer_id = id` could mean `ON AA.buyer_id = AA.id` or `ON AA.buyer_id = buyer.id`. Or maybe you have just mutilated your query when trying to simplify it for us. As mentioned, I cannot answer your question. Hope someone else can. Good luck. — Thorsten Kettner, Feb 12 '21 at 13:44

score 0 · Answer 1 · answered Feb 17 '21 at 11:53

I ended up stepping back and using the Join table engine directly into the insert A + B table. That way we never need to run the update statement that resulted in a "slow" mutation.

For some reason, CH manages its memory a lot better when using the Join table engine instead of a good old LEFT OUTER JOIN.

Does not answer why the Mutation hangs though.

UPDATE mutation hangs

1 Answers1