
I have a simple transformation consisting of 2 steps. Step 1 (Table input) makes a query to the DB and step 2 (Java class) processes the results. Step 2 takes a lot of time (which is normal in my case), but after 1 hour I get a closed-result-set error:

Server has closed the connection. If result set contain huge amount of data, Server expects client to read off the result set relatively fast. In this case, please consider increasing net_wait_timeout session variable. / processing your result set faster (check Streaming result sets documentation for more information)
2017/10/02 13:12:06 - Getting of data cells .0 -

I think there should be some intermediate step (or some other option) to read all the results from step 1 relatively fast. Could you help me with that?
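As a stopgap, the timeout the error refers to can be raised for the session that runs the long read. This is a sketch, assuming MySQL; the variable appears in the MySQL manual as `net_write_timeout`, and the one-hour value here is only an example (PDI database connections have an option to run SQL statements right after connecting, which is one place to put this):

```sql
-- Sketch, assuming MySQL: run on the same session as the Table input query,
-- e.g. via the connection's "SQL statements to execute after connecting".
-- Gives the slow consumer one hour to read each block off the result set.
SET SESSION net_write_timeout = 3600;
```

This only buys time; if the consumer is slow for structural reasons (such as the lock discussed below), the underlying problem remains.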

palandlom
  • I have a (not so) silly question: is it really due to the Java class step? I mean, the `Table input` is often locked for other reasons. Can you replace step 2 with a `Dummy` step and see if it still locks? – AlainD Oct 10 '17 at 07:36
  • Another (not so) silly question: could your Java class be locking the database? Does it use `JDBC`? – AlainD Oct 10 '17 at 07:38
  • Yes, it does (the Java class can in some cases send UPDATE queries to the DB). So could that lead to the connection (and the corresponding result set) of step 1 being closed? – palandlom Oct 10 '17 at 07:54

1 Answer


I guess your step 2 is locking the same table as the one in step 1.

That's one of the drawbacks of the otherwise efficient architecture of PDI. All the steps start up at the same time, and the quickest to produce results hand them over to the next steps. With this "do the quickest first" strategy, you sometimes beat the SQL optimizer itself when there are lots of joins on sums or averages (pro rata).

The main pitfall in this respect is to read a table, do some transformation, and rewrite the result to the same table with the "Truncate table" option checked. In that case, the truncate is done a few milliseconds before the select of the Table input, which starts an infinite deadlock. After a long time you decide to kill the ETL, but by then the data has been lost.

Solutions:

  • The best practice is to rewrite step 2 using PDI steps rather than a ready-made Java class. That is the way I strongly recommend in the long run, but you may have reasons not to follow it.

  • If your table is small, you can put a Blocking step between the input and the output.

  • If your table is big, you can use a Sort rows step instead of the Blocking step. You do not really want to sort, but PDI needs to look at the last row to be sure the sort is complete before giving results to the next step. The sort will cut the data into temporary chunks on the hard disk, and you have some control over where and how the tmp data is stored.

  • You can copy your table into a tmp table (or file), process it, and delete it afterwards. Use a job to do that, because in a job, unlike in a transformation, the processing is sequential.
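The Blocking step and Sort rows workarounds share one idea: read everything off the source before the slow work starts, so the server-side result set is released quickly. A minimal Java sketch of that pattern (hypothetical names, no PDI or JDBC API involved):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class DrainThenProcess {
    // Drain an upstream iterator completely before any slow processing,
    // so the source (e.g. a server-side result set) can be released early.
    static <T> List<T> drain(Iterator<T> source) {
        List<T> buffer = new ArrayList<>();
        while (source.hasNext()) {
            buffer.add(source.next());
        }
        return buffer;
    }

    public static void main(String[] args) {
        List<String> rows = List.of("r1", "r2", "r3");
        List<String> buffered = drain(rows.iterator()); // fast read-off
        // ...slow per-row work would happen here, after the source is done
        System.out.println(buffered.size()); // prints 3
    }
}
```

The trade-off is memory (or disk, in the Sort rows case): the whole row set is held between the fast read and the slow processing.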

AlainD
  • Thanks for the detailed explanation! I hope I understood it right: I removed the UPDATE-query execution code from the Java class and added a step 3 (Insert/Update step) which receives some fields from step 2 and performs the UPDATE query. – palandlom Oct 12 '17 at 11:54
  • Congrats. Sincerely. Dead locks are never easy bugs. – AlainD Oct 12 '17 at 14:00