
I am using Pentaho Data Integration (PDI) Spoon to build ETLs, and I care a lot about performance. I developed an ETL that copies 2,500,000 rows (each row has 104 columns) from MySQL 8 to a ClickHouse database, and it takes 30 minutes. The destination table has no indexes or constraints, and ClickHouse is a columnar database.

I am on Ubuntu 22.04 Linux, and the transformation runs on the Pentaho server through spoon.sh.

How can I increase the transformation's input/output speed?

I am using only 4 steps: truncate the table with EXECUTE SQL SCRIPT --> fetch the data with TABLE INPUT --> change the date formats with SELECT VALUES --> insert the data into the destination table with TABLE OUTPUT.

I want to increase the I/O speed of the PDI-Spoon transformation.

sai

1 Answer


If it's a point-in-time migration, you can try using the MySQL table engine in ClickHouse together with the INSERT INTO ... SELECT syntax to pull the data directly from your ClickHouse instance. It should be faster.

https://clickhouse.cloud/integrations/mysql
https://clickhouse.com/docs/en/engines/table-engines/integrations/mysql
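
As a rough illustration of the idea, here is a minimal Python sketch that only renders the SQL you would run on the ClickHouse side; it does not connect to anything. It assumes the ClickHouse `mysql(...)` table function (the ad hoc counterpart of the MySQL table engine). The host, database, table, user, and password values are placeholders I made up, and the helper names `mysql_source` and `build_reload_statements` are hypothetical, not part of any library.

```python
def mysql_source(host_port, database, table, user, password):
    """Render a ClickHouse mysql(...) table function call as SQL text."""
    args = [host_port, database, table, user, password]
    # Quote each argument as a SQL string literal, escaping \ and '.
    quoted = ", ".join(
        "'" + a.replace("\\", "\\\\").replace("'", "\\'") + "'" for a in args
    )
    return "mysql(" + quoted + ")"


def build_reload_statements(target, source, columns="*"):
    """Return the two statements for a full reload: truncate, then bulk insert.

    `columns` can carry expressions (for example date conversions) that
    would replace the SELECT VALUES step; "*" copies the columns as-is.
    """
    return [
        f"TRUNCATE TABLE {target}",
        f"INSERT INTO {target} SELECT {columns} FROM {source}",
    ]


if __name__ == "__main__":
    # All connection values below are placeholders, not real settings.
    src = mysql_source(
        "mysql-host:3306", "source_db", "source_table", "etl_user", "secret"
    )
    for stmt in build_reload_statements("target_db.target_table", src):
        print(stmt + ";")
```

The printed statements could then be run with whatever ClickHouse client you already use. Whether this beats the Spoon transformation in your environment is something you would need to measure.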

  • Agreed. You could probably use ClickHouse functions to perform the ETL and I would guess it would still be faster than this PDI-spoon tool. – Rich Raposa Apr 06 '23 at 13:48
  • Thanks, Ryadh Dahimene, for responding. Actually, the above ETL runs on a daily schedule: it truncates the ClickHouse table, fetches the fresh data from the MySQL database, and inserts it into the ClickHouse table. – sai Apr 07 '23 at 10:12