
I am trying to concatenate a few thousand files that are in different subfolders into a single file and also have the name of each concatenated file inserted as the first column so that I know which file each data row came from. Essentially starting with something like this:

EDIT: I neglected to mention that each file has the same header, so I updated the request accordingly.

Folder1
file1.txt
A   B   C
123 010 ...
456 020 ...
789 030 ...

Folder2
file2.txt 
A   B   C
abc 100 ...
efg 200 ...
hij 300 ...

and outputting this:

CombinedFile.txt
A      B    C
file1  123  010 ...
file1  456  020 ...
file1  789  030 ...
file2  abc  100 ...
file2  efg  200 ...
file2  hij  300 ...

After reading this post, I tried the following code, but I end up with a syntax error (apologies, I'm super new to awk!):

shopt -s globstar
for filename in path/**/*.txt; do
    awk '{print FILENAME "\t" $0}' *.txt > CombinedFile.txt
done

Thanks for your help!

vanish007
  • I don't know about syntax errors, but you don't need two globs. Either `awk '...' path/**/*.txt > CombinedFile.txt` or `for filename in path/**/*.txt; do awk '...' "$filename"; done > CombinedFile.txt` would suffice. – chepner Dec 12 '22 at 20:05
  • Agree with the above comment. Note that you're never using the `$filename` variable from your `for` loop. Because you're going through various subdirs, maybe you can write a `find` expression that will get just the files you want? `find /path/dir1 /path/dir2 .... -name 'file*.txt' | xargs awk ... >> finalProduct.txt` may be an idea (see the sketch after these comments). Good luck. – shellter Dec 12 '22 at 20:11
  • is it possible for two files (in different folders) to have the same name (eg, `Folder1/fileXX.txt` and `Folder2/fileXX.txt`)? and if the answer is 'yes', is there a requirement to distinguish the entries in the final output (eg, add the folder name at the beginning of each line of data)? – markp-fuso Dec 12 '22 at 20:13
  • @markp-fuso, no, each file has a different file name. – vanish007 Dec 12 '22 at 20:19
    _"**a few thousand files**"_ Do you get an `Argument list too long` error with `ls path/**/*.txt`? – Fravadona Dec 12 '22 at 20:20

1 Answer


This single awk should be able to do it without any looping:

shopt -s globstar
awk 'FNR == 1 {
   f = FILENAME
   gsub(/^.*\/|\.[^.]+$/, "", f)
   if (NR > 1) # show header for first file only
      next
}
{
   print f, $0
}' path/**/*.txt > CombinedFile.txt

cat CombinedFile.txt
file1 123 010
file1 456 020
file1 789 030
file2 abc 100
file2 efg 200
file2 hij 300
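
For reference, the `gsub` call strips everything up to the last `/` and the trailing extension from `FILENAME`, which is how `Folder1/file1.txt` becomes `file1`. A quick way to try the pattern on its own (the path here is just an example):

awk 'BEGIN {f = "Folder1/file1.txt"; gsub(/^.*\/|\.[^.]+$/, "", f); print f}'
# prints: file1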
anubhava
  • Hi anubhava, I noticed something strange occurring after concatenation - since each file has the same header (something I neglected to mention in my original comment), the header seems to repeat in some of the empty columns. So going with the above example, if the headers above "file1, 123, and 010" were "A, B, C" then I would sometimes get a repeated A or B or C within columns with any missing values. Is there any way to fix it so that it stays empty or doesn't overwrite a header? Thank you! – vanish007 Jan 06 '23 at 13:54
  • I got a syntax error: `syntax error near unexpected token `/^.*\/' ` I'm running the following: `awk 'FNR == 1 {f = FILENAME; gsub(/^.*\/|\.[^.]+$/, "", f); next} shopt -s globstar awk 'FNR == 1 {f = FILENAME; gsub(/^.*\/|\.[^.]+$/, "", f)} {print f, $0}' path/**/*.txt > CombinedFile.txt` The same syntax error occurs whether I run it before or after `shopt -s globstar` – vanish007 Jan 06 '23 at 14:55
  • I have updated the answer to make it clear. Can you try it now? – anubhava Jan 06 '23 at 15:02
  • Thank you, that seemed to work correctly, but it did get rid of the header completely, so the first sample line, i.e. `file1 123 010`, is used as the header instead of "A, B, C"; at least I can reintroduce the header through R (a variant that keeps the header is sketched after these comments). I updated my original question to add the pertinent information I previously neglected to mention. – vanish007 Jan 06 '23 at 16:11
  • I have updated my answer again; please try it out. – anubhava Jan 06 '23 at 16:28
  • Thanks Anubhava, that worked beautifully! – vanish007 Jan 09 '23 at 19:21
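
For completeness, a possible tweak of the accepted approach (not part of the answer above, just a sketch under the same assumptions) if you want the plain `A B C` header emitted once at the top, matching the output shown in the question:

shopt -s globstar
awk 'FNR == 1 {
   f = FILENAME
   gsub(/^.*\/|\.[^.]+$/, "", f)   # strip directory and extension
   if (NR == 1)                    # header line of the very first file
      print                        # emit it once, unchanged
   next                            # skip the header line in every file
}
{
   print f, $0
}' path/**/*.txt > CombinedFile.txt

This prints the first file's header as-is and prefixes every data row with the base file name, as before.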