I need to read a .vcf.gz file in Pentaho. I can read it with the "Text file input" step by setting "compressed" to "GZ" on the "Content" tab.

- First, I need to skip the headers (basically every row that begins with #).

- Second, I need to add a new column that holds the file name on every row.

E.g.

My file is:

#header
#header
#header
# chr pos ref alt
  chr1 3   A   A

What I want is:

chr1 3 A A id_001 (taken from the file name)

How can I achieve this?

CLAbeel
xCloudx8

1 Answer

On the Content tab you should also see the Header checkbox, where you can specify the number of header lines to skip.

[Screenshot: Header option on the Content tab]

As for the filename, the "Additional output fields" tab is what you need.

[Screenshot: the "Additional output fields" tab]

Here's the preview of output:

[Screenshot: preview of the output]
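To make the expected result concrete outside of Spoon, here is a minimal sketch in plain JavaScript of what the question asks for: drop the `#` lines and append the file id to every remaining row. The function name `annotateRows` is hypothetical. Note that it filters on the leading `#`, whereas the Header option in the step skips a fixed number of lines.

```javascript
// Hypothetical helper, not part of Pentaho: mimics the desired output
// for a list of raw text lines and a file id such as "id_001".
function annotateRows(lines, fileId) {
  var out = [];
  for (var i = 0; i < lines.length; i++) {
    var row = lines[i].trim();
    // Skip empty lines and header lines (anything starting with '#').
    if (row === "" || row.charAt(0) === "#") {
      continue;
    }
    // Split on any run of whitespace, then append the file id as a new column.
    var fields = row.split(/\s+/);
    fields.push(fileId);
    out.push(fields.join(" "));
  }
  return out;
}

var sample = [
  "#header",
  "#header",
  "#header",
  "# chr pos ref alt",
  "  chr1 3   A   A"
];
console.log(annotateRows(sample, "id_001"));
```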

If you need to remove the file extension from the filename, there are a few ways to do that.

CLAbeel
  • Uhm, there's something I'm doing wrong. I've set the file path to pick up the files I need, and I've changed the header option as you showed me, according to my needs. I've added the last column, but it shows me only that last column. Should I change something in the "Fields" tab? – xCloudx8 Oct 20 '16 at 12:01
  • Oh yes, you need some fields. Do you have anything there? – CLAbeel Oct 20 '16 at 12:11
  • OK great, I've solved the problem with the fields. How can I handle the filename? My names look like this: 001.genome.vcf.gz. I want to delete everything after the first ".", keeping only 001. Are there other options? – xCloudx8 Oct 20 '16 at 12:14
  • That really depends on what the requirements are. If you will always need to just get rid of ".genome.vcf.gz" then the easiest way to do that is with the "Replace in string" step. – CLAbeel Oct 20 '16 at 12:16
  • If what comes after the first "." is going to change, but the name itself is always three characters long, then you can use the "Strings cut" step. If it's more complicated than that, then you will probably have to use a small JavaScript snippet. – CLAbeel Oct 20 '16 at 12:18
  • Thank you so much, I made it! – xCloudx8 Oct 20 '16 at 12:30
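For the JavaScript route mentioned in the last comments, a minimal sketch could look like the following. It keeps everything before the first "." and also strips a leading directory, in case the filename field carries the full path. The function name `fileId` is hypothetical, and the code is plain JavaScript, not tied to any particular Pentaho step.

```javascript
// Hypothetical helper: returns the part of a file name before the first dot.
// Accepts a bare name ("001.genome.vcf.gz") or a path ending in that name.
function fileId(filename) {
  // Drop any directory part, handling both "/" and "\" separators.
  var base = String(filename).replace(/^.*[\\\/]/, "");
  var dot = base.indexOf(".");
  // No dot at all: return the name unchanged.
  return dot === -1 ? base : base.substring(0, dot);
}

console.log(fileId("001.genome.vcf.gz"));
console.log(fileId("/data/vcf/001.genome.vcf.gz"));
```

Inside a transformation, the same two lines of logic would be applied to whatever field name was chosen on the "Additional output fields" tab. Unlike "Strings cut", this does not assume the id is always three characters long.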