1

I have a simple task of copying Excel data to SQL tables. I am executing one stored procedure initially to delete tables entries. Then I have Excel input from which I am copying data to the SQL tables using tMap.

I have 20 tables to copy data to. I have relatively small number of table entries (10-100) to copy. Still when I am executing my task, it takes a very long time (5-10 mins) and after copying 12 tables entries its running out of memory.

My work flow is.. (stored procedure ->(on subjob ok) -> excel input -> tmap -> tMSSqlOutput -> (on component ok) -> excel input -> tmap -> tMSSqlOutput (on component ok) - > ...... -> excel input -> tmap -> tMSSqlOutput)

My Excel sheet is on my local machine where as I am copying data to SQL tables on a server. I have kept my run/debug settings as Xms 1024M, Xmx 8192m. But still its not working.

May I know what can I do to solve this issue?

I am running my talend on a VM (Virtual Machine). I have attached the screenshot of my job.

enter image description here

Quick-gun Morgan
  • 338
  • 12
  • 31

3 Answers3

2

Use onSubJobOK on the excelInput to connect to the next ExcelInput. This would change the whole codegeneration.

The Generated code is a function for every subjob. The difference in code generation between onSubJob and onComponentOk is that OnComponent ok will call the next function, while OnSubJobOk waits for the current subjob/function to finish. The latter let the Garbage Collerctor function better.

If that doesn't solve the problem create subjobs which contain 1 excel-DBoutput. Then link these jobs with OnSubjobOK in a master job.

Balazs Gunics
  • 2,017
  • 2
  • 17
  • 24
  • 1
    I was just typing my answer to this too :) The main point holds true, that these subjobs should be connected on subjob ok but there's also a "nuclear option" to use completely different processes which will spawn new JVM instances. – ydaetskcoR Jun 02 '14 at 14:17
  • Using on subjob ok have fixed the running time issues. Been able to copy 1500 rows in 2 minutes. But I wanted to know whether it will fix my memory issues as well or not? – Quick-gun Morgan Jun 02 '14 at 14:25
2

You should be running all of these separate steps in separate subjobs, using "on subjob ok" to link them, so that the Java garbage collector can better reallocate memory between steps.

If this still doesn't work you could separate them into completely separate jobs and link them all using tRunJob components and make sure to select to tick "Use an independent process to run subjob":

Use an independent process to run subjob option in tRunJob

This will spawn a completely new JVM instance for the process and thus not be memory tied by the JVM. That said, you should be careful not to spawn too many JVM instances as there will be some overhead in the start up of the JVM and obviously you are still limited by any physical memory constraints.

It belongs in a separate question really but you may also find some benefit to using parallelisation in your job to improve performance.

ydaetskcoR
  • 53,225
  • 8
  • 158
  • 177
  • Using on subjob ok have fixed the running time issues. Been able to copy 1500 rows in 2 minutes. But I wanted to know whether it will fix my memory issues as well or not? – Quick-gun Morgan Jun 02 '14 at 14:23
  • The only way to know is to wait and see if it runs out of memory but I'd expect it to handle it fine. The problem you had before was that the JVM was holding all of the data for all of your Excel tables in memory and not removing them because it was expecting you to go back and access them at a later point. As this was not the case you should be explicitly using an on subjob OK link. You should only use an on component OK link when you explicitly want to follow up something such as with a database commit. – ydaetskcoR Jun 02 '14 at 14:37
  • All Right. Thank you for your valuable suggestion. – Quick-gun Morgan Jun 02 '14 at 15:15
-1

To avoid consuming too much memory by the job (outOfMemory), you can store large transformed data in your tmap in a temporary directory on the disk.

This printscreen shows how to do that.

enter image description here

amgohan
  • 1,358
  • 3
  • 14
  • 24
  • This wasn't a tMap problem. Also its not enough to set a path there but you need to enable the storage for each of the lookups. – Balazs Gunics Jun 04 '14 at 14:59