I have about 100 two-column text files that I would like to merge into a single file in a C shell script, using factor "A" as the common column.

For example, I have file A that looks like this
A B1
1 100
2 200
3 300
4 400

and File B looks like this
A B2
1 100
2 200
3 300
4 400
5 300
6 400

I want the final file C to look like this:
A B1 B2
1 100 100
2 200 200
3 300 300
4 400 400
5 300
6 400


The cat function only puts the files on top of one another and pastes them into file C. I would like to put the data next to each other. Is this possible?

RHatesMe

1 Answer

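To meet your exact spec, this will work. Note that paste matches rows by position, not by the value in column A, so it relies on the rows lining up as they do in your sample. If the spec changes, you'll need to play with this some.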

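paste -d' ' factorA factorB \
| awk 'NF==4||NF==3{print $1, $2, $NF} NF==2{print $1, $2}' \
> factorC

# $NF is the last field on the line, i.e. the B2 value (a pasted line is "A B1 A B2", so $3 would repeat column A)
# note, no spaces or tabs after each of the continuation chars `\` at end of lines!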

output

$ cat factorC
A B1 B2
1 100 100
2 200 200
3 300 300
4 400 400
5 300
6 400

Not sure how you get bold headers to "transmit" thru unix pipes. ;->

Recall that awk programs all have a basic underlying structure, i.e.

awk 'pattern{action}' file

So the pattern can be a range of lines, a regular expression, an expression (NF==4), missing, or a few other things.

The action is what happens when the pattern is matched. This is the more traditional-looking code.

If no pattern is specified, then the action applies to all lines read. If no action is specified, but the pattern matches, then the line is printed (without further ado).
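A minimal sketch of those two defaults, using a throwaway file called demo.txt (a made-up name for illustration, not one of the files in the question):

```shell
# demo.txt is a hypothetical sample file, created only for this illustration
printf 'A B1\n1 100\n2 200\n' > demo.txt

# pattern only, no action: every line after the first is printed unchanged
awk 'NR > 1' demo.txt

# action only, no pattern: the action runs on every line, here printing column 2
awk '{ print $2 }' demo.txt
```

The first command skips the header line, the second prints the second column of all three lines, header included.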

NF means NumberOfFields in the current line, so NF==2 will only process lines with 2 fields (the trailing records that exist only in factorB).

The || is a logical OR operator, so that block will only process records where the number of fields is 3 OR 4. Hopefully, the print statements are self-explanatory.
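To see which lines each NF test picks up, a quick sketch is to print the field count in front of every pasted line. It recreates the question's sample data under the names factorA and factorB, so it is best run in a scratch directory:

```shell
# Recreate the two sample files from the question (overwrites any existing factorA/factorB)
printf 'A B1\n1 100\n2 200\n3 300\n4 400\n' > factorA
printf 'A B2\n1 100\n2 200\n3 300\n4 400\n5 300\n6 400\n' > factorB

# Prefix each pasted line with the number of fields awk sees on it
paste -d' ' factorA factorB | awk '{ print NF ": " $0 }'
```

With this data, the header and the four rows present in both files report 4 fields (column A appears twice, so the B2 value is the last field), and the two rows that exist only in factorB report 2.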

The , separating $1,$2,$3 (for example) tells print to put the value of awk's internal variable OFS (OutputFieldSeparator) between the items. OFS can be assigned, e.g. OFS="\t" to get a tab char. In this case we are not specifying a value, so we're getting the default value for OFS, which is the space char (" ", without the quotes).
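A small sketch of the difference, fed from echo rather than from the question's files:

```shell
# Default OFS: each comma in print becomes a single space
echo '1 100 200' | awk '{ print $1, $2, $3 }'

# OFS set to a tab in a BEGIN block: the same commas now become tabs
echo '1 100 200' | awk 'BEGIN { OFS = "\t" } { print $1, $2, $3 }'

# No commas at all: the fields are concatenated with nothing in between
echo '1 100 200' | awk '{ print $1 $2 $3 }'
```

The last form is a common slip: without the commas, OFS is never used and the values run together.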

IHTH

shellter