Data partitioning by columns

Question

I have a big matrix of 50 rows and 1.5M columns. The first two of these 1.5M columns are my headers.

I am trying to divide my data by columns into small pieces, so that, for example, each small set has 50 lines and 100 columns. Each small set must also include the first two columns mentioned above as its headers.

I tried

awk '{print $1"\t"$2"\t"}' test | cut -f 3-10
awk '{print $1"\t"$2"\t"}' test | cut -f 11-20
...

or

cut -f 1-2 | cut -f 3-10 test
cut -f 1-2 | cut -f 11-20 test
...

but none of the above is working.

Is there an efficient way of doing this?

What software in its right mind would output 1.5M columns? (Do you mean M as in million, or M as in the Roman numeral for 1000? Either way it's crazy, just different orders of magnitude ;-) ) Can't you get the data delivered the other way around, 50 columns by 1.5M rows? Good luck! — shellter, Jul 22 '13 at 03:20

score 0 · Answer 1 · answered Jul 21 '13 at 21:13

One way with awk. I don't know whether awk can handle such a large number of columns, but give it a try. It uses the modulus operator to break each line after a specific number of columns.

awk '{
    ## Start the output line with the two key columns.
    printf "%s%s%s%s", $1, FS, $2, FS
    ## Number of data columns printed so far for this input line.
    count = 0
    ## Traverse every column except the first two keys.
    for ( i = 3; i <= NF; i++ ) {
        ## After every 100 data columns, start a new output line
        ## and print the two key columns again.
        if ( count != 0 && count % 100 == 0 ) {
            printf "%s%s%s%s%s", ORS, $1, FS, $2, FS
        }
        ## Print the current column and count it.
        printf "%s%s", $i, FS
        ++count
    }
    ## End the last piece and leave a blank line before the next input line.
    print ORS
}' infile

I've tested it with two lines and 4 columns instead of 100. Here is the test file:

key1 key2 one two three four five six seven eight nine ten
key1 key2 one2 two2 three2 four2 five2 six2 seven2 eight2 nine2 ten2

And it results in:

key1 key2 one two three four 
key1 key2 five six seven eight 
key1 key2 nine ten 

key1 key2 one2 two2 three2 four2 
key1 key2 five2 six2 seven2 eight2 
key1 key2 nine2 ten2
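
If the goal is one file per piece rather than a single stream, a variation of the same idea might look like the sketch below. It assumes tab-separated input (as the cut attempts in the question suggest). The names infile and chunk_N.txt and the variable size are made up for illustration, and I have only tried it on small inputs, not on 1.5M columns.

```shell
# Small sample input: two key columns plus ten data columns, tab-separated.
printf 'key1\tkey2\ta1\ta2\ta3\ta4\ta5\ta6\ta7\ta8\ta9\ta10\n' >  infile
printf 'key1\tkey2\tb1\tb2\tb3\tb4\tb5\tb6\tb7\tb8\tb9\tb10\n' >> infile

# The awk script appends, so clear chunk files left over from an earlier run.
rm -f chunk_*.txt

# size is the number of data columns per chunk (4 here; 100 for the real data).
awk -F '\t' -v size=4 '{
    for (first = 3; first <= NF; first += size) {
        # Hypothetical output name: chunk_1.txt, chunk_2.txt, ...
        n = int((first - 3) / size) + 1
        out = "chunk_" n ".txt"

        # Last column of this chunk, clipped to the end of the line.
        last = first + size - 1
        if (last > NF) last = NF

        # Start every output line with the two key columns.
        line = $1 FS $2
        for (i = first; i <= last; i++) line = line FS $i

        # Append the line, then close the file so that only one
        # output file is open at any time.
        print line >> out
        close(out)
    }
}' infile
```

With the sample above this writes chunk_1.txt, chunk_2.txt and chunk_3.txt, each holding both rows. With size=100 on the real data it should produce roughly 15,000 files of 50 lines each. For a single piece, cut also accepts a comma-separated list of ranges, for example cut -f 1-2,11-20 test.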
