Split Flat File String into multiple columns in Informatica

Question

I have a flat file (csv) with 1 column (ColumnA) having data in the format ==> 100-000500-10-40000-000-200-600-000

I need it split into 8 columns (values adjacent to the hyphens) and concatenate the 2,4 and 5 splitted sub values. How do I do this in an expression transformation? I have tried the following but didnt help, table is loading these fields as blank.

Variable port 1: SUBSTR(ColumnA,3,6) Variable port 2: SUBSTR(ColumnA,12,5) Variable port 3: SUBSTR(ColumnA,18,3)

Then I have made output ports of each 3 of them V_ACCOUNT_SEG2_CODE, V_ACCOUNT_SEG4_CODE, V_ACCOUNT_SEG5_CODE and then concatenated like below:

Final Variable Port: RTRIM(V_ACCOUNT_SEG2_CODE||'-'||V_ACCOUNT_SEG4_CODE||'-'||V_ACCOUNT_SEG5_CODE,'-').

Can someone advice what is wrong with the code and what can be the correct working code? I am getting data from ColumnA (csv flat file) to my target table (Oracle) but these additional columns are coming as null.

My Expression Transformation Here

If Column A is the only one in the CSV, you can set the separator of the CSV to hyphens and Informatica will parse it for you in 8 columns. If it's not the only column, you could do it in 2 mappings : one to extract that single column in another CSV, and the second with hyphens as separator. Another solution would be a Shell command (assuming running on Unix) that changes hyphens to the separator of the file before the mapping starts : `tr '-' ','` — Mickaël Bucas, May 21 '20 at 17:10

Koushik Roy · Answer 1 · 2020-06-17T17:08:21.370

1

You need to use combination of SUBSTR and INSTR. You shouldn't use hard code values since they can change depending on your data. Probably that's what is happening in your mapping.
v_col1= SUBSTR (data_col,1,instr(data_col,'-')-1 ) v_col2 = SUBSTR (data_col,instr(data_col,'-')+1, instr(data_col,'-',1,2)-1 ) v_col3 = SUBSTR (data_col,instr(data_col,'-',1,2)+1,instr(data_col,'-',1,3)-1 ) ...
Alternately you can use REGEX as well like below. But issue is this works ONLY when you have a data with 3 '-' like 'abc-defg-hij'. Any other combination, split wont work.

v_col1:= REG_EXTRACT(data_col,'([^-]+)-+([^-])-+([^-]+)',1) v_col2:= REG_EXTRACT(data_col,'([^-]+)-+([^-])-+([^-]+)',2) v_col3:= REG_EXTRACT(data_col,'([^-]+)-+([^-])-+([^-]+)',3) ... Alternately you can load the file data into a temp oracle table and use regex_substr() to split them.

edited Jun 17 '20 at 17:08

answered May 19 '20 at 21:18

Koushik Roy

6,868
2
12
33

Hi Koushik, Thanks for the input. I used REGEX the way you said here but its only returning me the 1st occurence and not all the 8. Could you guide me what I am doing wrong. My data is suppose 100-0042*-*-16810-*-*-*-* and I am getting COL1 as 100 and not the other 7 numbers in the other 7 fields. For eg., REG_EXTRACT(GL_String,'([^-]*),?([^-]*),?([^-]*).*',2) is not giving me 0042 for this example but null - similarly other 7 fields apart from the first one are coming as nulls. – Orion997 Jun 03 '20 at 16:02
i will do some R&D and get back. – Koushik Roy Jun 07 '20 at 20:40
1

Thanks Koushik, so if REGEX wont work for more than 3 '-', can I use the SUBSTR/INSTR in Infa as you had suggested as well? In that case what will be the format for 8 values? Like you have given for the first 3? v_col1= SUBSTR (data_col,1,instr(data_col,'-')-1 ) v_col2 = SUBSTR (data_col,instr(data_col,'-')+1, instr(data_col,'-',1,2)-1 ) v_col3 = SUBSTR (data_col,instr(data_col,'-',1,2)+1,instr(data_col,'-',1,3)-1 ) – Orion997 Jun 18 '20 at 20:18
INSTR/SUBSTR will always work. Even if you have variable number of delimiters. Unavailable will be null. – Koushik Roy Jun 22 '20 at 12:25

score 0 · Answer 2 · answered May 22 '20 at 08:20

0

You can create the Source Transformation for the Flat Files and use the hyphen as delimiter. This will create the columns for you. All you'd need to do then is just concatenate the appropriate columns in Expression Transformation.

answered May 22 '20 at 08:20

Maciejg

3,088
1
17
30

score 0 · Answer 3 · answered Sep 05 '20 at 14:16

0

More efficient REGEX:

v_col1:= REG_EXTRACT(data_col,'([^-]+)-([^-])-([^-]+)',1)
v_col2:= REG_EXTRACT(data_col,'([^-]+)-([^-])-([^-]+)',2)
v_col3:= REG_EXTRACT(data_col,'([^-]+)-([^-])-([^-]+)',3)
...

answered Sep 05 '20 at 14:16

Isac R

1

1

Welcome to Stack Overflow. Consider adding an explanation why your regex is more efficient. Answers which explain themselves are better (also long term) because they help others to understand your thinking and your answer. – Ivo Mori Sep 06 '20 at 01:42

Split Flat File String into multiple columns in Informatica

3 Answers3

Linked