I am using the Impala JDBC driver to batch insert data into Impala. I currently use a batch size of 1000 and an INSERT INTO ... VALUES statement, executed as a batch through a PreparedStatement. The Impala daemon is running on 3 machines, and the Impala catalog server and statestore are running on a 4th machine.
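For reference, here is a minimal sketch of the kind of batched insert described above, using only the standard java.sql API. The class name, method names, table name and column count are placeholders for illustration, not the actual schema or code, and obtaining the Connection from the Impala JDBC driver is left out.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: names and structure are assumptions for illustration.
class BatchInsertSketch {
    // Batch size mentioned in the question.
    static final int BATCH_SIZE = 1000;

    // Builds "INSERT INTO <table> VALUES (?, ?, ...)" with one placeholder per column.
    static String buildInsertSql(String table, int columnCount) {
        String placeholders = String.join(", ", Collections.nCopies(columnCount, "?"));
        return "INSERT INTO " + table + " VALUES (" + placeholders + ")";
    }

    // Adds each row to the batch and executes it every BATCH_SIZE rows.
    static void insertRows(Connection conn, String table, int columnCount,
                           List<Object[]> rows) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(buildInsertSql(table, columnCount))) {
            int pending = 0;
            for (Object[] row : rows) {
                for (int i = 0; i < columnCount; i++) {
                    ps.setObject(i + 1, row[i]); // JDBC parameters are 1-indexed
                }
                ps.addBatch();
                pending++;
                if (pending == BATCH_SIZE) {
                    ps.executeBatch();
                    pending = 0;
                }
            }
            if (pending > 0) {
                ps.executeBatch(); // flush the final partial batch
            }
        }
    }
}
```

With a placeholder table named events that has three columns, buildInsertSql("events", 3) produces the statement text that is prepared once and then reused for every row in the batch.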
The query timeline from the Impala profile for one batch insert looks like this:
Query Timeline: 35s398ms
- Query submitted: 0.000ns (0.000ns)
- Planning finished: 34s822ms (34s822ms)
- Submit for admission: 34s886ms (63.997ms)
- Completed admission: 34s886ms (0.000ns)
- Ready to start 1 fragment instances: 34s886ms (0.000ns)
- All 1 fragment instances started: 34s958ms (71.997ms)
- DML data written: 35s082ms (123.996ms)
- DML Metastore update finished: 35s286ms (203.993ms)
- Request finished: 35s298ms (11.999ms)
- Unregister query: 35s374ms (75.997ms)
- ComputeScanRangeAssignmentTimer: 0.000ns
As we can see, the "Planning finished" step accounts for almost all of the time: about 34.8 s of the 35.4 s total. We have tried creating the table in both formats, PARQUET as well as normal, but every time the "Planning finished" step takes far too long.
Is there a configuration change I need to make? Or am I doing something wrong?