5

I am trying to transpose a really long file and I am concerned that it will not be transposed entirely.

My data looks something like this:

Thisisalongstring12345678   1   AB  abc 937 4.320194
Thisisalongstring12345678   1   AB  efg 549 0.767828
Thisisalongstring12345678   1   AB  hi  346 -4.903441
Thisisalongstring12345678   1   AB  jk  193 7.317946

I want my data to look like this:

Thisisalongstring12345678 Thisisalongstring12345678 Thisisalongstring12345678 Thisisalongstring12345678
1                         1                         1                         1
AB                        AB                        AB                        AB
abc                       efg                       hi                        jk
937                       549                       346                       193
4.320194                  0.767828                  -4.903441                 7.317946

Would the length of the first string be an issue? My real file is much longer than this, approximately 2000 lines. Also, is it possible to rename the first string to Thisis234 and then transpose?

user1269741
  • If you're willing to put up with lines of 20,000 * 25 characters (or so) per column (so 100 KiB or so per line), and the applications you work with are too, then the chances are that `gawk` will be fine with it too. Yes, you can trim the long names; devise the algorithm and apply on output or during input. – Jonathan Leffler Apr 04 '12 at 00:50
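The comment's suggestion of renaming during input could look roughly like the sketch below. It assumes whitespace-separated fields and that every line should get the same short label; the function name `rename_and_transpose` is made up for illustration, and `Thisis234` is the label from the question.

```shell
# Hypothetical helper: rename field 1 to a short label, then transpose.
# Reads the files given as arguments, or standard input if there are none.
rename_and_transpose() {
  awk -v label="Thisis234" '
  {
    $1 = label                           # rename while reading the line
    for (c = 1; c <= NF; c++) {
      cell[c, NR] = $c                   # store as cell[field, line]
    }
    if (NF > max_nf) {
      max_nf = NF                        # remember the widest line
    }
  }
  END {
    # One output row per input field, one output column per input line.
    for (r = 1; r <= max_nf; r++) {
      line = cell[r, 1]
      for (c = 2; c <= NR; c++) {
        line = line " " cell[r, c]
      }
      print line
    }
  }' "$@"
}
```

Usage would be something like `rename_and_transpose data.txt > transposed.txt`, where `data.txt` is a placeholder name. The output is separated by single spaces rather than padded into aligned columns as in the question; piping it through `column -t` may give the aligned look, if that tool is available and copes with lines this wide.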

4 Answers

7

I don't see why it would not be, unless you run out of memory. Try the script below and see if you run into problems.

Input:

$ cat inf.txt 
a b c d
1 2 3 4
. , + -
A B C D

Awk program:

$ cat mkt.sh
awk '
{
  for(c = 1; c <= NF; c++) {
    a[c, NR] = $c
  }
  if(max_nf < NF) {
    max_nf = NF
  }
}
END {
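  # Note: cells were stored as a[field, line], so this loop order only
  # matches for square input (NR == max_nf).
  for(r = 1; r <= NR; r++) {
    for(c = 1; c <= max_nf; c++) {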
      printf("%s ", a[r, c])
    }
    print ""
  }
}
' inf.txt

Run:

$ ./mkt.sh 
a 1 . A 
b 2 , B 
c 3 + C 
d 4 - D 

Credits:

Hope this helps.

icyrock.com
7

This can be done with the rs BSD command:

http://www.unix.com/man-page/freebsd/1/rs/

Check out the -T option.
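A minimal sketch of how that might be used, assuming `rs` is installed (it is not part of every Linux distribution by default). The wrapper name `transpose_rs` is made up; `rs` reads from standard input, so the file is redirected in rather than passed as an argument.

```shell
# Hypothetical wrapper around rs -T (pure transpose of the input).
# rs takes its data on standard input, not as a file argument.
transpose_rs() {
  if command -v rs >/dev/null 2>&1; then
    rs -T
  else
    echo "rs not found; the awk approach may be an alternative" >&2
    return 127
  fi
}
```

For example: `transpose_rs < inf.txt`. I have not checked how `rs` behaves with thousands of output columns, so it is probably worth trying on a small sample first.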

Kaz
  • This is brilliant: it is also available (stock) on OS X. rs has many features. I suggest reading the man page. – Vincent Mar 21 '15 at 17:43
4

I tried icyrock.com's answer, but found that I had to change:

for(r = 1; r <= NR; r++) {
  for(c = 1; c <= max_nf; c++) {

to

for(r = 1; r <= max_nf; r++) {
  for(c = 1; c <= NR; c++) {

to get the NR columns and max_nf rows. So icyrock's code becomes:

$ cat mkt.sh
awk '
{
  for(c = 1; c <= NF; c++) {
    a[c, NR] = $c
  }
  if(max_nf < NF) {
    max_nf = NF
  }
}
END {
  for(r = 1; r <= max_nf; r++) {
    for(c = 1; c <= NR; c++) {
      printf("%s ", a[r, c])
    }
    print ""
  }
}
' inf.txt

If you don't do that and use an asymmetrical input, like:

a b c d
1 2 3 4
. , + -

You get:

a 1 .
b 2 ,
c 3 +

i.e. still 3 rows and 4 columns (the last of which is blank), and the "d 4 -" row is missing.
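If the worry is that a large file might not be transposed entirely, one possible sanity check is to transpose twice and compare with the input. This is only a sketch: it assumes rectangular, whitespace-separated input, and the names `transpose` and `roundtrip_ok` are made up. The loop order is the corrected one from above, with the trailing space dropped.

```shell
# transpose: rows become columns; fields are joined by single spaces.
transpose() {
  awk '
  {
    for (c = 1; c <= NF; c++) {
      a[c, NR] = $c                      # a[field, line]
    }
    if (max_nf < NF) {
      max_nf = NF
    }
  }
  END {
    for (r = 1; r <= max_nf; r++) {      # one output row per input field
      line = a[r, 1]
      for (c = 2; c <= NR; c++) {        # one output column per input line
        line = line " " a[r, c]
      }
      print line
    }
  }' "$@"
}

# roundtrip_ok FILE: succeeds if transposing twice gives back the input.
# The input is re-joined with single spaces first, so only the field
# contents are compared, not the original spacing.
roundtrip_ok() {
  [ "$(awk '{ $1 = $1; print }' "$1")" = "$(transpose "$1" | transpose)" ]
}
```

With the 3-line, 4-column input above, `transpose` should give 4 lines of 3 fields, and `roundtrip_ok inf.txt && echo same` should print `same`. With the original loop order the round trip would not match, since a row is lost.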

ScubaFish
0

Regarding @ScubaFish's and @icyrock's code:

"if (max_nf < NF)" seems unnecessary. I deleted it, and the code works just fine.
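A small check of when that holds, assuming "deleting it" means dropping only the condition and keeping the assignment `max_nf = NF` (the answer does not say; if the assignment were removed as well, `max_nf` would stay unset and the END loop would print nothing). The two function names are made up for illustration.

```shell
# widest_tracked: field count of the widest line (with the if check).
widest_tracked() {
  awk '{ if (max_nf < NF) max_nf = NF } END { print max_nf + 0 }' "$@"
}

# widest_last: field count of the last line read (check removed).
widest_last() {
  awk '{ max_nf = NF } END { print max_nf + 0 }' "$@"
}
```

When every line has the same number of fields, as in the question's data, both give the same number, so the check makes no difference. With ragged input such as `printf 'a b c\n1 2\n'`, the first gives 3 and the second gives 2, so the transposed output would lose a row without the check.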

JeffZheng