
I have a file that comes from R. It is essentially the output of the write.table command with " " as the delimiter. An example of this file looks like this:

file1.txt
5285 II-3 II-2 2 NA NA NA NA 40 NA NA c.211A>G
8988 III-3 III-4 1 NA NA NA NA NA NA NA c.211A>G
8F412 III-3 III-4 2 NA NA 28 NA NA NA NA c.211A>G
4H644 III-3 III-4 2 NA NA NA NA NA NA NA NA

What I need is a new file in a very specific format: all the columns must be aligned using spaces. I can't use tabs.

The desired output will be

5285   II-3   II-2  2 NA NA NA NA 40 NA NA c.211A>G
8988   III-3  III-4 1 NA NA NA NA NA NA NA c.211A>G
8F412  III-3  III-4 2 NA NA 28 NA NA NA NA c.211A>G
4H644  III-3  III-4 2 NA NA NA NA NA NA NA NA

Thus, in the first row there would be 3 spaces between 5285 and II-3, and in the third row only 2 spaces between 8F412 and III-3. The lengths of the first three fields can vary; the remaining columns always have a fixed length (two characters), except the last one, which can be up to 12 characters.

I can do this in a text editor, but the file is very large, so I would like to do it using bash, awk or R.

codeforester
user2380782

3 Answers


Use column:

$ column -t file
5285   II-3   II-2   2  NA  NA  NA  NA  40  NA  NA  c.211A>G
8988   III-3  III-4  1  NA  NA  NA  NA  NA  NA  NA  c.211A>G
8F412  III-3  III-4  2  NA  NA  28  NA  NA  NA  NA  c.211A>G
4H644  III-3  III-4  2  NA  NA  NA  NA  NA  NA  NA  NA
James Brown
3

Here is another approach:

$ tr ' ' '\t' <file | expand -t2

5285  II-3  II-2  2 NA  NA  NA  NA  40  NA  NA  c.211A>G
8988  III-3 III-4 1 NA  NA  NA  NA  NA  NA  NA  c.211A>G
8F412 III-3 III-4 2 NA  NA  28  NA  NA  NA  NA  c.211A>G
4H644 III-3 III-4 2 NA  NA  NA  NA  NA  NA  NA  NA
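
A variation on the same idea, offered as a sketch rather than a definitive recipe: expand also accepts an explicit list of tab stops, and tabs that fall after the last listed stop are replaced by a single space. The stops 7,14,20 below are an assumption derived from the sample data (first three fields at most 5 characters wide), and file1.txt and aligned.txt are only illustrative names.

```shell
# Sample input copied from the question (file name is illustrative).
cat > file1.txt <<'EOF'
5285 II-3 II-2 2 NA NA NA NA 40 NA NA c.211A>G
8988 III-3 III-4 1 NA NA NA NA NA NA NA c.211A>G
8F412 III-3 III-4 2 NA NA 28 NA NA NA NA c.211A>G
4H644 III-3 III-4 2 NA NA NA NA NA NA NA NA
EOF

# Turn each space into a tab, then expand the tabs back into spaces using
# explicit tab stops at columns 7, 14 and 20 (assumed from the sample widths).
# Tabs after the last listed stop become a single space each.
tr ' ' '\t' < file1.txt | expand -t 7,14,20 > aligned.txt

cat aligned.txt
```

On the sample this leaves 3 spaces after 5285 and 2 after 8F412, as in the desired output. A longer value in any of the first three fields would need wider stops.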
karakfa
2

Use awk so that you have tight control on how you want to format each field:

awk '{ printf("%-5s %-5s %-5s %s %s %s %s %s %s %s %s %s\n", $1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12) }' file

Produces:

5285  II-3  II-2  2 NA NA NA NA 40 NA NA c.211A>G
8988  III-3 III-4 1 NA NA NA NA NA NA NA c.211A>G
8F412 III-3 III-4 2 NA NA 28 NA NA NA NA c.211A>G
4H644 III-3 III-4 2 NA NA NA NA NA NA NA NA
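
If the widths of the first three fields are not known in advance, the hard-coded %-5s can be replaced by widths measured from the data. The following is a minimal sketch under that assumption: it reads the file twice, so it needs a regular file rather than a pipe, and the function name align_first_three and the file name file1.txt are hypothetical.

```shell
# Sample input copied from the question (file name is illustrative).
cat > file1.txt <<'EOF'
5285 II-3 II-2 2 NA NA NA NA 40 NA NA c.211A>G
8988 III-3 III-4 1 NA NA NA NA NA NA NA c.211A>G
8F412 III-3 III-4 2 NA NA 28 NA NA NA NA c.211A>G
4H644 III-3 III-4 2 NA NA NA NA NA NA NA NA
EOF

# Hypothetical helper: pads fields 1-3 to their widest value, one space between fields.
align_first_three() {
  awk '
    # Pass 1 (NR == FNR): record the widest value seen in each of fields 1-3.
    NR == FNR {
      for (i = 1; i <= 3; i++)
        if (length($i) > w[i]) w[i] = length($i)
      next
    }
    # Pass 2: left-justify fields 1-3 to the measured widths,
    # then append the remaining fields separated by single spaces.
    {
      line = ""
      for (i = 1; i <= 3; i++)
        line = line sprintf("%-" w[i] "s ", $i)
      for (i = 4; i <= NF; i++)
        line = line $i (i < NF ? " " : "")
      print line
    }
  ' "$1" "$1"
}

align_first_three file1.txt
```

For the sample data the measured widths are all 5, so the result is the same as with the fixed format string above.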
codeforester