3

(Talend OS for Data Integration)

Is it possible to use a dynamic table name? I've researched all of the resources within the user manual and, whilst none of the them seem to expressly rule it out, I can't seem to get it to work.

Scenario

Say you had a huge list of 1,000,000 names - Dave Smith, Dave Jones, Dave Bloggs etc. These are all stored in one table.

You need to split these into tables which reflect the first name, so you would have a DB with tables names such as 'Andy', 'Adrian', 'Adam' etc.

Where I'm at

What I can do, without issue, is batch export a full dataset into a table. I can also iterate through the field to create my individual tables using a tMysqlRow (this is much quicker on its own).

I then attempted to select and spit out the data into its respective fields, but came across these issues:

  • tMySqlRow can be used with variables, so I can insert the data into a relevant table, but it's so slow it is redundant. It operates at around 2000 rows/sec, and there are some 20 million to do.
  • tMySqlOutput is much quicker, and can do this quite well, but needs to have the table name clearly defined. Using a context variable as the table name doesn't work (reads as 'Null').

It was suggested to look at the ETL components, but their definitions clearly suggest that the table name has to be defined in quotation marks.

It seems really odd for Talend to provide so much use of context variables, but seemingly won't allow you to split one data set into multiple tables dynamically. Is it a product limitation, or just me?

Thanks in advance.

Edit - added screenshots

context variable available in selection The context is printing to the window in this screenshot ('AB' was the test context value used) to show that this value was relevant until it was declared in the 'Table' field, but that then returns a null value

context variable prints to console okay, but is ignored as the table value

Chris J
  • 1,441
  • 9
  • 19
  • Can you provide the screenshots where are you setting the context value for table name and how are you reading it in the tMySqlOutput component? – Naga Pradeep Sep 22 '16 at 04:33
  • Maybe the question to ask first is: Why create tables based on names? Why not using partitions or good indexes? Joining over all tables will become messy quickly... – tobi6 Sep 22 '16 at 08:28
  • tobi6 - the names is just an easier way of explaining the context of what I'm doing. I need to match the destination schema, which is unfortunately split using codes from the primary keys of a master set of data. – Chris J Sep 22 '16 at 08:41
  • Did you use single quotes for your table name ('AB' in your post) ? This may cause some problems if you did. – Corentin Sep 22 '16 at 13:06
  • No, the tablename was selected as per the first screenshot. Has anyone successfully managed to use a context variable as a table name? – Chris J Sep 22 '16 at 14:12
  • I'd just create a staging area, dump the whole thing into the database, then generate as many insert into select * where name as many distinct names (tables) you have. – Balazs Gunics Sep 23 '16 at 13:16
  • This had been considered earlier, but each query was taking over 40 seconds each just to perform a single 'SELECT FROM WHERE' query. As it's over 20 million records, there's quite a big impact on efficiency. The original data is in CSV, so it's quite quick to load and split without directing through a staging table – Chris J Sep 23 '16 at 13:27

2 Answers2

3

Just to add to this, I have managed to get this to work.

enter image description here

  1. tMysqlInput_1 retrieves the table names from information_schema that are created in a subjob
  2. These are then passes to tHashOutput_1.

  3. tHashInput_1 then feeds these to a tFlowtoIterate component, which iterates through my input job.

  4. I can then access each table name by calling (String)globalMap.get("row6.TABLE_NAME")), as my tHash schema has one column which I have named 'Table_Name'.

I have a few actions which take place whereby I use the Table_Name in a filter to extract only the fields which match the table name, then passing it to my yMysqlOutput component, which looks like this:

enter image description here

I've tested it and it's all good. Hopefully this might save someone a LOT of time figuring out this in future (or 5 minutes to someone who is much better with Talend than I am!).

Thanks!

Chris J
  • 1,441
  • 9
  • 19
1

Chris,

If you can use tELTMysqlMap combined with a tELTMysqlOutput component, you should be able to achieve that.

The tELTMysqlOutput has a "Use different table name" Check Box where you can specify the Table Name as a context variable or an input field from the ELTMysqlMap.

Note:- I haven't tried it out yet.

  • Thanks for the help. I had explored tELTMysqlOutput, but kept with the tMysqlOutput component and managed to avoid using context variables. – Chris J Sep 23 '16 at 13:30