I have one file (.tsv) that contains variant calls for all the samples. I would like to merge the first three columns into one column.

Example input (file name: variants.tsv). The first three columns, which I want to merge, are:

lane sampleID Barcode

B31 00-00-NNA-0000 0000

Desired output:

ID

B31_00-00-NNA-0000_0000

What are the recommended methods?

aborruso
Alhu.A

2 Answers

One way, with a perl one-liner:

perl -F'\t' -lane '
    if ($. == 1) {
        print join("\t", "ID", @F[3..$#F])
    } else {
        print join("\t", join("_", @F[0,1,2]), @F[3..$#F])
    }' variants.tsv

With -a and -F'\t', Perl splits each line on tabs into the array @F. The header line ($. == 1) is printed with "ID" in place of the first three column names; every other line is printed with its first three fields joined by underscores, followed by the remaining fields unchanged.

Shawn

Starting from this

lane    sampleID    Barcode
B31 00-00-NNA-0000  0000

and using Miller, you can run

mlr --tsv put -S '$ID=$lane."_".$sampleID."_".$Barcode' input.tsv >output.tsv

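to add an ID column (shown here as a formatted table for readability; with --tsv the actual output is tab-separated):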

+------+----------------+---------+-------------------------+
| lane | sampleID       | Barcode | ID                      |
+------+----------------+---------+-------------------------+
| B31  | 00-00-NNA-0000 | 0000    | B31_00-00-NNA-0000_0000 |
+------+----------------+---------+-------------------------+

If you want only the ID field, the command is

mlr --tsv put -S '$ID=$lane."_".$sampleID."_".$Barcode' then cut -f ID input.tsv >output.tsv
aborruso