
I have over 100 files, each with between 5 and 8 tab-separated columns. I need to extract the first three columns from each file, add a fourth column containing some predefined text, and append the results into one file.

Let's say I have 3 files: file001.txt, file002.txt, file003.txt.

file001.txt:

chr1 1 2 15
chr2 3 4 17

file002.txt:

chr1 1 2 15
chr2 3 4 17

file003.txt:

chr1 1 2 15
chr2 3 4 17

combined_file.txt (desired output):

chr1 1 2 f1
chr2 3 4 f1
chr1 1 2 f2
chr2 3 4 f2
chr1 1 2 f3
chr2 3 4 f3

For simplicity, I kept the file contents the same. My script is as follows:

#!/bin/bash
for i in {1..3}; do
j=$(printf '%03d' $i)
awk 'BEGIN { OFS="\t"}; {print $1,$2,$3}' file${j}.txt | awk -v k="$j" 'BEGIN {print $0"\t$k”}' | cat >> combined_file.txt
done

But the script gives the following errors:

awk: non-terminated string $k”}... at source line 1 context is
<<<
awk: giving up source line number 2
awk: non-terminated string $k”}... at source line 1 context is
<<<
awk: giving up source line number 2

Can someone help me figure it out?

Naresh DJ
  • You have a problem statement, and along with it a bash script that solves (what I assume are) parts of your problem. Where are you stuck? I'm missing a question. It is also confusing that your file00.txt files all have exactly the same content. – mattias Jul 08 '16 at 20:58
  • @mattias, I've edited the post. – Naresh DJ Jul 08 '16 at 21:04
  • You have mixed some special characters here. Notice the difference between " and ” in your BEGIN statement `'BEGIN {print $0"\t$k”}'`. That should get you out of that error you're getting. But then you probably have other issues with the awk command. – mattias Jul 08 '16 at 21:11
  • @mattias, Thanks. It works now, but it prints the fourth column as $k instead of its value. – Naresh DJ Jul 08 '16 at 21:15
  • use echo $k instead of BEGIN {print $0"\t$k”} – aguertin Jul 08 '16 at 21:34
  • @aguertin, can you be more specific? Do you want to replace BEGIN {print $0"\t$k”} with echo $k inside awk? – Naresh DJ Jul 08 '16 at 21:47
  • @aguertin, Nope. It's not working. – Naresh DJ Jul 08 '16 at 21:53
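A short sketch of the two problems behind these comments, assuming any common awk (gawk, mawk, or BWK awk). First, awk variables are used by bare name: `$k` means "the field whose number is in k", and `"$k"` inside a string literal is just the text `$k`. Second, a `BEGIN` block runs before any input is read, so `$0` is empty there and the piped lines are never printed. The input strings (`a b c`, `chr1 1 2`) and the label `f1` are made-up illustration values.

```shell
# An awk variable is used by bare name; $k is "the field whose number is in k".
by_name=$(printf 'a\tb\tc\n' | awk -v k=2 'BEGIN { FS = "\t" } { print k, $k }')
echo "$by_name"     # 2 b

# Inside an awk string literal, "$k" is just the characters $ and k.
literal=$(printf 'a\tb\tc\n' | awk -v k=2 '{ print "$k" }')
echo "$literal"     # $k

# BEGIN runs before any input is read: $0 is still empty there, and with
# no other rules awk never reads the piped line at all.
in_begin=$(printf 'a\tb\tc\n' | awk 'BEGIN { print "[" $0 "]" }')
echo "$in_begin"    # []

# Without BEGIN, the action runs once per input line and k is appended as a new column.
fixed=$(printf 'chr1\t1\t2\n' | awk -v k=f1 '{ print $0 "\t" k }')
echo "$fixed"       # chr1, 1, 2, f1 separated by tabs
```

Dropping `BEGIN` from the second awk and writing `k` instead of `$k`, as in the last command, is what makes the question's two-stage pipeline emit the extra column. Note that the question passes `k="$j"`, which would produce `001` rather than `f1`.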

2 Answers


You don't need two different awk scripts. Also, you don't use $ to refer to variables in awk; $ is used to refer to input fields (i.e. $k means the field whose number is stored in the variable k).

for i in {1..3}; do
    j=$(printf '%03d' $i)
    awk -v k="$j" -v OFS='\t' '{print $1, $2, $3, k}' file$j.txt
done > combined_file.txt
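This loop passes the zero-padded `j` as the label, so the fourth column comes out as `001`, `002`, `003` rather than the `f1`, `f2`, `f3` shown in the desired combined_file.txt. Here is a minimal variant, assuming those `f`-prefixed labels are what's wanted, that passes `"f$i"` instead. The first loop only recreates the question's sample files so the sketch runs on its own. `for i in 1 2 3` stands in for `{1..3}` so it also works under plain `sh`.

```shell
# Recreate the question's sample input (tab-separated) so the sketch is runnable.
for n in 001 002 003; do
    printf 'chr1\t1\t2\t15\nchr2\t3\t4\t17\n' > "file$n.txt"
done

for i in 1 2 3; do
    j=$(printf '%03d' "$i")
    # "f$i" gives the f1, f2, f3 labels from the desired output;
    # passing "$j" instead would give 001, 002, 003.
    awk -v k="f$i" -v OFS='\t' '{ print $1, $2, $3, k }' "file$j.txt"
done > combined_file.txt
```

Because the redirection comes after `done`, combined_file.txt is truncated once per run instead of being appended to. Rerunning the script therefore does not duplicate lines, unlike the `>> combined_file.txt` in the question's script.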
Barmar

As pointed out in the comments, your problem is that you're trying to use odd characters as if they were double quotes. Once you fix that, though, you don't need a loop or any of that other complexity; all you need is:

$ awk 'BEGIN{FS=OFS="\t"} {$NF="f"ARGIND} 1' file*
chr1    1       2       f1
chr2    3       4       f1
chr1    1       2       f2
chr2    3       4       f2
chr1    1       2       f3
chr2    3       4       f3

The above uses GNU awk for ARGIND, the index in ARGV of the file currently being read.
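`$NF="f"ARGIND` overwrites the last field. That gives the right result for the four-column sample, but for the 5-8 column files described in the question it would keep columns 4 through NF-1. The sketch below prints only the first three columns and does not need GNU awk. It counts files with `FNR == 1` instead of ARGIND, since FNR restarts at 1 for each input file. The six-column sample data is hypothetical. The sketch also assumes no input file is empty: an empty file never reaches `FNR == 1` and would not advance the counter, whereas ARGIND would still count it.

```shell
# Hypothetical six-column sample input, tab-separated.
for n in 001 002 003; do
    printf 'chr1\t1\t2\t15\tx\ty\nchr2\t3\t4\t17\tx\ty\n' > "file$n.txt"
done

# FNR restarts at 1 for each input file, so n counts files (1, 2, 3, ...).
# Printing $1, $2, $3 explicitly drops every column after the third.
awk 'BEGIN { FS = OFS = "\t" } FNR == 1 { n++ } { print $1, $2, $3, "f" n }' file*.txt > combined_file.txt
```

The shell sorts the names produced by the `file*.txt` glob, so zero-padded names such as file001.txt through file100.txt are read, and labelled, in numeric order.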

Ed Morton
  • Thanks. Can you please explain the details? – Naresh DJ Jul 09 '16 at 18:31
  • Sure - what part do you not understand? If you just want to start learning awk, I highly recommend the book Effective Awk Programming, 4th Edition, by Arnold Robbins. – Ed Morton Jul 09 '16 at 20:43