
What I want to do is simply concatenate 2 files like the following example:

file 1        file 2
C1            O1             
C3            O3
..            O5
              O7
              O9
              O11
              O13
              O15
              O17
              O19
              ..

The desired output file is:

file 3
C1
O1
O9
O17
C3
O3
O11
O19
..
..

So the pattern is: first C1 with O1, then skip 3 rows in file 2 and print O9, then skip another 3 rows in file 2 and print O17. Then print C3 and O3, skip 3 rows in file 2 (O11), skip another 3 rows (O19); then C5, etc.

I tried to do something with cat | paste - - - ... but it didn't work :(

Any suggestions?

Many thanks in advance

EDIT

I forgot to tell you they are big files. :)

Here are my input files

cat file 1
C             18     -2.182951850        -0.000000000        -6.517815410
C             20     -4.127401075         0.000000000        -0.446529291
C             22     -3.314258919        -2.494999886       -15.624910016
C             24     -6.071850300         0.000000000         5.624757806
C             26     -2.023950100         0.000000000         5.624757806
C             28     -4.286402584        -0.000000000       -12.589102506
C             30     -6.230851809        -0.000000000        -6.517815410
C             32     -0.079500634         0.000000000        -0.446529291

cat file 2
O             34     -1.393125174        -0.640765928        -5.738276269
O             36     -3.337574640        -0.640765928         0.333010828
O             38     -2.524270589         1.854234106       -14.845370570
O             40     -5.282024106        -0.640765928         6.404297925
O             42     -2.182951850         1.281531856        -6.517815410
O             44     -4.127401075         1.281531856        -0.446529291
O             46     -3.314258919        -1.213468178       -15.624910016
O             48     -6.071850300         1.281531856         5.624757806
O             50     -2.972778044        -0.640765928        -7.297355528
O             52     -4.917227269        -0.640765928        -1.226068432
O             54     -4.104085113         1.854234106       -16.404449463
O             56     -6.861676614        -0.640765928         4.845217687
O             58     -2.813776294         0.640765779         4.845217687
O             60     -5.076228778         0.640765779       -13.368642136
O             62     -7.020678123         0.640765779        -7.297355528
O             64     -0.869326828         0.640765779        -1.226068432
O             66     -2.023950100        -1.281531708         5.624757806
O             68     -4.286402584        -1.281531708       -12.589102506
O             70     -6.230851809        -1.281531708        -6.517815410
O             72     -0.079500634        -1.281531708        -0.446529291
O             74     -1.234123906         0.640765779         6.404297925
O             76     -3.496576390         0.640765779       -11.809563365
O             78     -5.441025615         0.640765779        -5.738276269
O             80      0.710325077         0.640765779         0.333010828

C18 must be followed by O34, O42 and O50. Then C20 followed by O36, O44 and O52 and so on:

cat file 3
C             18     -2.182951850        -0.000000000        -6.517815410 
O             34     -1.393125174        -0.640765928        -5.738276269
O             42     -2.182951850         1.281531856        -6.517815410
O             50     -2.972778044        -0.640765928        -7.297355528
C             20     -4.127401075         0.000000000        -0.446529291
O             36     -3.337574640        -0.640765928         0.333010828
O             44     -4.127401075         1.281531856        -0.446529291
O             52     -4.917227269        -0.640765928        -1.226068432
..             ..      ............        .............       .........

The output generated by Tom's code is this (six O rows per C row instead of the three I need):

Tom output
C             18     -2.182951850        -0.000000000        -6.517815410
O             34     -1.393125174        -0.640765928        -5.738276269
O             42     -2.182951850         1.281531856        -6.517815410
O             50     -2.972778044        -0.640765928        -7.297355528
O             58     -2.813776294         0.640765779         4.845217687
O             66     -2.023950100        -1.281531708         5.624757806
O             74     -1.234123906         0.640765779         6.404297925
C             20     -4.127401075         0.000000000        -0.446529291
O             36     -3.337574640        -0.640765928         0.333010828
O             44     -4.127401075         1.281531856        -0.446529291
O             52     -4.917227269        -0.640765928        -1.226068432
O             60     -5.076228778         0.640765779       -13.368642136
O             68     -4.286402584        -1.281531708       -12.589102506
O             76     -3.496576390         0.640765779       -11.809563365
C             22     -3.314258919        -2.494999886       -15.624910016
O             38     -2.524270589         1.854234106       -14.845370570
O             46     -3.314258919        -1.213468178       -15.624910016
O             54     -4.104085113         1.854234106       -16.404449463
O             62     -7.020678123         0.640765779        -7.297355528
O             70     -6.230851809        -1.281531708        -6.517815410
O             78     -5.441025615         0.640765779        -5.738276269
and so on

Any suggestions?

Thank you

git
  • How big are we talking here? Megabytes? Gigabytes? More than you could fit into memory, for example? – ghoti Oct 02 '15 at 17:27
  • Also ... C1 matches O1, C3 matches O3... Presumably the line after C3 matches O5, and the line after that matches O7. What happens after that? Or are there only four iterations of this? – ghoti Oct 02 '15 at 17:32
  • @ghoti They are big, but not as big as gigabytes. The sequence is shown above in the EDIT part of the question... – git Oct 04 '15 at 09:28
  • Your edit doesn't answer the question of what happens after the initial pattern is exhausted. The pattern you've described supports ONLY FOUR iterations, because for the set starting with C26, you've already used up O42. Or do we repeat O42 in order to make it part of the new set? Or do we jump ahead so that C26 is followed by O58? – ghoti Oct 05 '15 at 11:24
  • @ghoti OK, sorry about that. The iterations repeat until all "C" rows are coupled with their related "O" rows. I don't know exactly how many "C" rows there will be, because it depends on my initial settings. If you count, the pattern is: the first C row goes with the first O row, followed by the 5th O row and then the 9th O row (3 rows in between); the 2nd C row goes with the 2nd O row, followed by the 6th O row and then the 10th O row, and so on. I hope it is clear now. Many thanks – git Oct 05 '15 at 11:32
  • Sorry, don't worry about the "O" rows: it is enough to couple each C row with its O rows without repeating any O row. I only wrote down a few O rows to minimize the text length. – git Oct 05 '15 at 11:47
  • So after the initial four sets we get C26, O42, O50, O58, C28, O44, O52, O60. etc? – ghoti Oct 05 '15 at 11:49

2 Answers


I would suggest using awk to do this:

# first file
NR == FNR { 
    a[NR] = $0  # save each line into array
    ++len
    next        # skip further blocks
}

{ b[FNR] = $0 } # save each line from 2nd file into array

END {
    # loop through and print
    for (i = 1; i <= len; ++i) {
        print a[i]
        for (j = i; j <= FNR; j += 4) print b[j]
    }
}

The script can be run with `awk -f script.awk file1 file2`.
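
If only three "O" lines per "C" line are wanted (as in the desired file 3 in the question), the inner loop of the script above could be capped. The following is a minimal sketch under that assumption, not a drop-in replacement: the function name `interleave` and the variables `step` and `per` are made up for illustration, and both files are still held in memory.

```shell
# Sketch: like the script above, but stop after "per" O lines per C line.
# interleave, step and per are illustrative names, not from the original answer.
interleave() {
    awk -v step=4 -v per=3 '
        NR == FNR { c[++nc] = $0; next }   # first file: remember every "C" line
        { o[FNR] = $0 }                    # second file: remember every "O" line
        END {
            for (i = 1; i <= nc; ++i) {
                print c[i]
                # take at most "per" lines, "step" rows apart, starting at row i
                for (k = 0; k < per; ++k) {
                    j = i + k * step
                    if (j in o) print o[j]
                }
            }
        }
    ' "$1" "$2"
}
```

Called as `interleave file1 file2 > file3`. Note that from the fifth "C" line on, this reuses "O" rows (rows 5, 9 and 13 for the fifth "C" line), which may or may not be what is intended.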

Tom Fenech
  • Ok that's a good solution but I'm already using a script, so I can implement the code there. Many thanks – git Sep 28 '15 at 15:12
  • If you mean a shell script and you want to run the awk script inline, you can wrap this script in quotes and call it like `awk 'script' file1 file2`. – Tom Fenech Sep 28 '15 at 15:21
  • I did `awk -f $SCRIPTDIR/script.awk infile1 infile2 > outfile` in my shell script – git Sep 28 '15 at 15:34
  • Looks good. Some general advice: quote your variables and don't use uppercase variable names (as they may clash with shell internals): `awk -f "$script_dir/concat.awk" ...`. – Tom Fenech Sep 28 '15 at 15:50
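
The two ways of calling awk from a shell script that come up in these comments can be sketched roughly as follows. The names `script_dir`, `concat.awk`, `run_from_file` and `run_inline` are placeholders, and the inline program is a condensed copy of the script from this answer.

```shell
# Placeholder: directory holding the awk script (lowercase name, set by the caller).
script_dir=${script_dir:-.}

# Option 1: keep the awk program in its own file; quote the expanded path.
run_from_file() {
    awk -f "$script_dir/concat.awk" "$1" "$2"
}

# Option 2: embed the program inline. Single quotes stop the shell from
# expanding $0 before awk sees it.
run_inline() {
    awk '
        NR == FNR { a[++len] = $0; next }   # first file into array a
        { b[FNR] = $0; n = FNR }            # second file into array b
        END {
            for (i = 1; i <= len; ++i) {
                print a[i]
                for (j = i; j <= n; j += 4) print b[j]
            }
        }
    ' "$1" "$2"
}
```

Either function would be used as `run_inline infile1 infile2 > outfile`.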

What you've described (via confirmation in comments) is a pattern which

  • consists of a C line
  • picks three O lines out of a window of nine, four rows apart, starting with the one at the same offset as the C line.

To handle this, I'd use awk with a 9-line "sliding window" as a buffer.

And rather than use Tom's solution of pointing awk at the two files sequentially and reading one into an array, I'd suggest reading from both files simultaneously so that you don't eat so much memory to hold the array.

Here's what I mean, as a one-liner:

awk '{a[NR]=$0;delete a[NR-10];} NR>9{getline Cline < "fileC";print Cline;print a[NR-9]; print a[NR-5]; print a[NR-1];}' fileO

Broken out for easier reading (and comments), this looks like:

awk '
  {
    a[NR]=$0;        # Store our current "O" line in an array
    delete a[NR-10]; # Clean the array as we step through the file
  }

  NR>9 {
    getline Cline < "fileC";  # Get the next "C" line...
    print Cline;              # ... and print it
    print a[NR-9];            # Print the three "O" lines
    print a[NR-5];            # for this "C" line: rows
    print a[NR-1];            # NR-9, NR-5 and NR-1
  }
' fileO

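Make sure you have the right number of "O" lines: if the last set of "O" lines is incomplete, it won't get printed. Also note that the return value of getline is not checked, so once fileC runs out, Cline keeps its last value and the final "C" line is printed again for every remaining window.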
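
If the output should stop as soon as there are no more "C" lines, one option is to test what getline returns. This is only a sketch of that idea: `window_merge` and `cfile` are illustrative names, and the "C" file name is passed in with `-v` instead of being hard-coded (be aware that `-v` interprets backslash escapes in the value).

```shell
# Sketch: same sliding window, but stop once the "C" file is exhausted.
# window_merge and cfile are illustrative names.
window_merge() {
    awk -v cfile="$1" '
        {
            a[NR] = $0          # buffer the current "O" line
            delete a[NR - 10]   # drop the line that fell out of the window
        }
        NR > 9 {
            # getline returns 1 on success, 0 at end of file, -1 on error
            if ((getline cline < cfile) <= 0) exit
            print cline
            print a[NR - 9]
            print a[NR - 5]
            print a[NR - 1]
        }
    ' "$2"
}
```

Usage would be `window_merge fileC fileO > file3`.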

My output from your sample data looks like this:

C             18     -2.182951850        -0.000000000        -6.517815410
O             34     -1.393125174        -0.640765928        -5.738276269
O             42     -2.182951850         1.281531856        -6.517815410
O             50     -2.972778044        -0.640765928        -7.297355528
C             20     -4.127401075         0.000000000        -0.446529291
O             36     -3.337574640        -0.640765928         0.333010828
O             44     -4.127401075         1.281531856        -0.446529291
O             52     -4.917227269        -0.640765928        -1.226068432
C             22     -3.314258919        -2.494999886       -15.624910016
O             38     -2.524270589         1.854234106       -14.845370570
O             46     -3.314258919        -1.213468178       -15.624910016
O             54     -4.104085113         1.854234106       -16.404449463
C             24     -6.071850300         0.000000000         5.624757806
O             40     -5.282024106        -0.640765928         6.404297925
O             48     -6.071850300         1.281531856         5.624757806
O             56     -6.861676614        -0.640765928         4.845217687
C             26     -2.023950100         0.000000000         5.624757806
O             42     -2.182951850         1.281531856        -6.517815410
O             50     -2.972778044        -0.640765928        -7.297355528
O             58     -2.813776294         0.640765779         4.845217687
C             28     -4.286402584        -0.000000000       -12.589102506
O             44     -4.127401075         1.281531856        -0.446529291
O             52     -4.917227269        -0.640765928        -1.226068432
O             60     -5.076228778         0.640765779       -13.368642136
C             30     -6.230851809        -0.000000000        -6.517815410
O             46     -3.314258919        -1.213468178       -15.624910016
O             54     -4.104085113         1.854234106       -16.404449463
O             62     -7.020678123         0.640765779        -7.297355528
C             32     -0.079500634         0.000000000        -0.446529291
O             48     -6.071850300         1.281531856         5.624757806
O             56     -6.861676614        -0.640765928         4.845217687
O             64     -0.869326828         0.640765779        -1.226068432
C             32     -0.079500634         0.000000000        -0.446529291
O             50     -2.972778044        -0.640765928        -7.297355528
O             58     -2.813776294         0.640765779         4.845217687
O             66     -2.023950100        -1.281531708         5.624757806
C             32     -0.079500634         0.000000000        -0.446529291
O             52     -4.917227269        -0.640765928        -1.226068432
O             60     -5.076228778         0.640765779       -13.368642136
O             68     -4.286402584        -1.281531708       -12.589102506
C             32     -0.079500634         0.000000000        -0.446529291
O             54     -4.104085113         1.854234106       -16.404449463
O             62     -7.020678123         0.640765779        -7.297355528
O             70     -6.230851809        -1.281531708        -6.517815410
C             32     -0.079500634         0.000000000        -0.446529291
O             56     -6.861676614        -0.640765928         4.845217687
O             64     -0.869326828         0.640765779        -1.226068432
O             72     -0.079500634        -1.281531708        -0.446529291
C             32     -0.079500634         0.000000000        -0.446529291
O             58     -2.813776294         0.640765779         4.845217687
O             66     -2.023950100        -1.281531708         5.624757806
O             74     -1.234123906         0.640765779         6.404297925
C             32     -0.079500634         0.000000000        -0.446529291
O             60     -5.076228778         0.640765779       -13.368642136
O             68     -4.286402584        -1.281531708       -12.589102506
O             76     -3.496576390         0.640765779       -11.809563365
C             32     -0.079500634         0.000000000        -0.446529291
O             62     -7.020678123         0.640765779        -7.297355528
O             70     -6.230851809        -1.281531708        -6.517815410
O             78     -5.441025615         0.640765779        -5.738276269

Is that what you meant?
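
If instead no "O" row should ever be reused (one possible reading of the comments under the question), the files could be processed in blocks: 12 "O" lines for every 4 "C" lines, with the pattern restarting in each block. That block structure is an assumption, not something confirmed in the question, and `block_merge`, `cfile`, `group` and `per` are illustrative names.

```shell
# Sketch under an assumed block layout: each group of 4 "C" lines owns
# the next 12 "O" lines, and the 1-5-9 pattern restarts in every block.
block_merge() {
    awk -v cfile="$1" -v group=4 -v per=3 '
        # store the current "O" line at its position (1..12) within the block
        { o[(NR - 1) % (group * per) + 1] = $0 }

        # once a block is complete, emit its "C" lines with their "O" lines
        NR % (group * per) == 0 {
            for (p = 1; p <= group; ++p) {
                if ((getline cline < cfile) <= 0) exit   # no more "C" lines
                print cline
                for (k = 0; k < per; ++k) print o[p + k * group]
            }
        }
    ' "$2"
}
```

Called as `block_merge fileC fileO`, this reading would follow C26 in the sample data with O58, O66 and O74. A trailing block with fewer than 12 "O" lines is not printed.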

ghoti