Skip first line in csv read on Kettle

Question

Hello, I am trying to skip the first line of a CSV file when I import it into Kettle (Pentaho PDI 8.1.0).

The first line has the separator declaration:

sep=;

The second line has the headers. Because of the first line, the Get fields button reads only two fields: the first is sep= and the second has no name.

I tried setting the number of header lines to 2, escaping sep=, and setting Document header lines to 1 in order to skip the first line, but the Get fields button still does not recognize the headers.

Are there any other ideas?

score 1 · Answer 1 · answered Jul 13 '18 at 13:43

Get fields will always look at the first line. You will need to enter the field list by hand.

You were on the right track: set headers to 2 and it will read the data correctly.

If you need to parse the separator declaration, you will need to parse the file once to determine its structure, then use metadata injection to read it a second time for the data.
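
To illustrate the logic outside of PDI, here is a minimal Python sketch (not Kettle configuration) of reading the separator declaration, skipping two header lines, and mapping rows onto field names entered by hand. The function names `parse_sep_declaration` and `read_data` and the sample data are hypothetical, made up for this example.

```python
import csv

def parse_sep_declaration(first_line, default=","):
    """Return the separator named by a leading 'sep=' line, else the default."""
    line = first_line.rstrip("\r\n")
    if line.lower().startswith("sep=") and len(line) == 5:
        return line[4]
    return default

def read_data(text, field_names, header_lines=2):
    """Skip header_lines lines, then map each row onto hand-entered field names."""
    lines = text.splitlines()
    separator = parse_sep_declaration(lines[0])
    reader = csv.reader(lines[header_lines:], delimiter=separator)
    return [dict(zip(field_names, row)) for row in reader]

# Hypothetical sample: line 1 is the declaration, line 2 the headers.
sample = "sep=;\nid;name\n1;alice\n2;bob\n"
rows = read_data(sample, ["id", "name"])
print(rows)
```

In PDI itself the equivalent is what the answer describes: field list typed in manually and the header line count set to 2.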

nsousa


Thank you, this is what I have done: I set the fields manually and set the header to 2 lines, and it works. How can I parse the CSV file to get the metadata? If it is only for the separator there is no need, but getting the headers would be interesting. I am new to Kettle in general. – kyrpav Jul 13 '18 at 13:57
You'd read the file twice. On the first read you focus only on row 2, which has the headers: read it as a single field (use some non-existent character as the separator), then split the line into rows by the known separator to get a list of fields. The general approach is explained in this blog post: http://ubiquis.co.uk/pdi/loading-csv-files-with-pdi-metadata-injection/ – nsousa Jul 17 '18 at 09:23
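
The two-pass idea in the comment above can be sketched in plain Python, as an illustration of the logic only. In PDI the second pass would be driven by metadata injection rather than by code like this. The names `discover_fields` and `read_with_discovered_fields` and the sample data are hypothetical.

```python
import csv

def discover_fields(text, separator):
    """First pass: treat row 2 as one string and split it into field names."""
    header_row = text.splitlines()[1]
    return [name.strip() for name in header_row.split(separator)]

def read_with_discovered_fields(text, separator):
    """Second pass: read the data rows using the discovered field names."""
    fields = discover_fields(text, separator)
    data_lines = text.splitlines()[2:]  # skip 'sep=' line and header line
    reader = csv.DictReader(data_lines, fieldnames=fields, delimiter=separator)
    return [dict(row) for row in reader]

# Hypothetical sample file content.
sample = "sep=;\nid;name;city\n1;alice;Athens\n"
fields = discover_fields(sample, ";")
records = read_with_discovered_fields(sample, ";")
print(fields)
print(records)
```

The first function plays the role of the "read as a single field, then split" step; the second plays the role of the injected read that receives the field list.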
