
I have the following situation: a directory with a very large number of subdirectories, each of which contains a file of interest that I want to concatenate, e.g.:

my_dir/
    subdir1/
            subsubdir/
                file_of_interest1.txt
                ...
    subdir2/
            subsubdir/
                file_of_interest1.txt
                ...
    ...

I tried using cat my_dir/*/*/*.txt > all.txt, but the subdirectory tree is so large that I get the following error:

bash: /bin/cat: Argument list too long
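The error comes from the kernel's limit on the total size of the argument list (plus environment) handed to a new program via execve(). Below is a rough sketch for comparing that limit with what the glob expands to. It assumes bash and builds a small, hypothetical demo tree in a temp directory instead of touching the real my_dir. The byte count is only approximate, since the real limit also includes the environment and some per-string overhead.

```shell
#!/usr/bin/env bash
# Sketch: compare the execve() size limit with what the glob expands to.
# The demo tree below is hypothetical, built in a throwaway temp directory.
set -euo pipefail

demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT

# Build a tiny tree shaped like my_dir/subdirN/subsubdir/*.txt
for i in 1 2 3; do
  mkdir -p "$demo/my_dir/subdir$i/subsubdir"
  printf 'line %s\n' "$i" > "$demo/my_dir/subdir$i/subsubdir/file_of_interest1.txt"
done

# Limit (in bytes) on argv plus environment for a newly executed program.
limit=$(getconf ARG_MAX)

# Array assignment and printf are bash builtins: the glob is expanded inside
# the shell and never passed to execve(), so this step cannot fail with E2BIG.
files=("$demo"/my_dir/*/*/*.txt)
count=${#files[@]}
bytes=$(printf '%s\0' "${files[@]}" | wc -c | tr -d ' ')

echo "ARG_MAX=$limit files=$count argv_bytes=$bytes"
```

On the real tree, argv_bytes growing past ARG_MAX is what makes the external cat fail.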

Is there a clever way to circumvent the problem, e.g., by concatenating the files in smaller chunks? For instance, concatenating one third of the subdirectories, then the next third, then the last third, and then joining the results altogether?

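Let find go through the files and add as many as possible to each cat invocation's command line. (Point find at my_dir rather than at ., because the shell creates all.txt in the current directory before find starts, and find would otherwise try to concatenate the output file into itself.)

find my_dir -type f -name '*.txt' -exec cat '{}' + >all.txt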
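A minimal sketch of the batched find command run against a small throwaway tree, assuming bash and a hypothetical demo layout that mirrors my_dir. It also shows why the search root is my_dir rather than the directory holding all.txt. Note that find does not sort its results, so the order of the concatenated files is not guaranteed.

```shell
#!/usr/bin/env bash
# Sketch: run the batched find command against a small throwaway tree.
# The demo directory and file contents are hypothetical.
set -euo pipefail

demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo"

for i in 1 2 3; do
  mkdir -p "my_dir/subdir$i/subsubdir"
  printf 'data %s\n' "$i" > "my_dir/subdir$i/subsubdir/file_of_interest1.txt"
done

# Search my_dir rather than ".": the shell creates all.txt in the current
# directory before find runs, so searching "." could feed all.txt to cat.
# The trailing '+' packs as many paths as fit into each cat command line.
find my_dir -type f -name '*.txt' -exec cat '{}' + > all.txt

wc -l < all.txt
```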

If your find doesn't support -exec ... {} + (which it should if compliant with current versions of the POSIX spec), there's also an approach using GNU extensions to make xargs safe:

find my_dir -type f -name '*.txt' -print0 | xargs -0 cat >all.txt

Using xargs without -0 is unsafe: by default it splits its input on whitespace (including newlines) and interprets quotes and backslashes, so filenames containing those characters are mangled (some, but not all, of these issues can be avoided with other options). Think about a malicious user creating a file $'foo \n/etc/passwd': you don't want to run the risk of injecting /etc/passwd into your output.
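To make the difference concrete, here is a small sketch that creates a hypothetical file whose name contains a newline and runs both pipelines over it. It assumes bash (for the $'\n' quoting) and a find/xargs pair that supports -print0 and -0, such as GNU or BSD.

```shell
#!/usr/bin/env bash
# Sketch: show why NUL-delimited xargs matters, using a hypothetical
# filename that contains a newline.
set -euo pipefail

demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo"

mkdir -p my_dir/sub/subsub
printf 'safe\n' > my_dir/sub/subsub/a.txt
# Filename with an embedded newline: "b", newline, "c.txt"
printf 'tricky\n' > "my_dir/sub/subsub/b"$'\n'"c.txt"

# NUL-separated: each path reaches cat intact.
find my_dir -type f -name '*.txt' -print0 | xargs -0 cat > all.txt
sort all.txt

# Newline-separated (unsafe): xargs sees "my_dir/sub/subsub/b" and "c.txt"
# as two separate, nonexistent files, so cat fails on them.
if find my_dir -type f -name '*.txt' | xargs cat > unsafe.txt 2>/dev/null; then
  unsafe_status=0
else
  unsafe_status=$?
fi
echo "unsafe xargs exit status: $unsafe_status"
```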

Finally, there's the less-efficient, older way to use find -exec (which invokes a separate copy of cat for each file found):

find my_dir -type f -name '*.txt' -exec cat '{}' ';' >all.txt
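The cost difference between ';' and '+' can be observed by counting process launches. This sketch uses `sh -c 'echo run'` as a stand-in for cat so that each invocation prints one line. It assumes bash, and the five-file demo tree is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: count how many times find starts a command with ';' versus '+'.
# `sh -c 'echo run'` stands in for cat; the demo tree is hypothetical.
set -euo pipefail

demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo"

for i in 1 2 3 4 5; do
  mkdir -p "my_dir/subdir$i/subsubdir"
  : > "my_dir/subdir$i/subsubdir/file_of_interest1.txt"
done

# ';' -> one process per file, each printing one line.
per_file=$(find my_dir -type f -name '*.txt' -exec sh -c 'echo run' sh '{}' ';' | wc -l | tr -d ' ')
# '+' -> paths are batched, so five short paths fit into a single process.
batched=$(find my_dir -type f -name '*.txt' -exec sh -c 'echo run' sh '{}' + | wc -l | tr -d ' ')

echo "';' invocations: $per_file, '+' invocations: $batched"
```

With a very large tree, '+' will still start more than one process, but each one handles as many paths as fit under the argument limit.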

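...or, at a similar penalty (invoking cat once per file), you can simply use a loop in your shell script. This sidesteps the error because the glob is expanded by the shell itself and the full list is never handed to execve(); each cat invocation receives only one filename: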

for f in my_dir/*/*/*.txt; do
  cat "$f"
done >all.txt

Note that this does the redirection on the entire loop, rather than (less efficiently) on a per-file basis.
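If you like the loop's glob but want cat started only a few times, one variation (not part of the original answer) is to emit the glob with bash's builtin printf and let xargs -0 batch it. As in the loop, the expanded list never goes through execve(). This is a sketch assuming bash and an xargs with -0. The demo tree is hypothetical. nullglob plus the length check avoid running cat when nothing matches.

```shell
#!/usr/bin/env bash
# Sketch: combine the loop's glob with batched cat invocations.
# printf is a bash builtin, so the expanded glob never goes through execve();
# xargs -0 then splits the NUL-separated list into command lines that fit.
set -euo pipefail
shopt -s nullglob   # expand to nothing (not the literal pattern) if no matches

demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo"

for i in 1 2 3; do
  mkdir -p "my_dir/subdir$i/subsubdir"
  printf 'part %s\n' "$i" > "my_dir/subdir$i/subsubdir/file_of_interest1.txt"
done

files=(my_dir/*/*/*.txt)   # sorted glob expansion, done inside the shell
: > all.txt                # start with an empty output file
if (( ${#files[@]} > 0 )); then
  printf '%s\0' "${files[@]}" | xargs -0 cat >> all.txt
fi
cat all.txt
```

Unlike find, the glob result is sorted, so the files are concatenated in a predictable order.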


Aside: If using POSIX sh or bash, quoting {} isn't necessary. However, you do need to quote {} if attempting to support zsh, and so I do so here.

Charles Duffy
  • Since GNU find already supports `-exec .. +`, using a GNU extension to work around the lack of it will rarely be necessary. I'd suggest `-exec .. \;` as the alternative instead. – that other guy Feb 17 '14 at 18:08
  • `-exec {} +` isn't a GNU extension, though, it's recent POSIX, so there are certainly non-GNU systems (such as modern BSD) which support it. `-exec {} ';'` is a fallback, to be sure, but it's one to be used only in cases of last resort. – Charles Duffy Feb 17 '14 at 18:10
  • I'm saying that `-print0` is a non-standard GNU extension, which primarily works on GNU find, which already supports `-exec .. +` in the first place. – that other guy Feb 17 '14 at 18:12
  • @thatotherguy True. To be honest, I'm documenting the `xargs -0` approach to cut off other potential answers providing unsafe uses of xargs, not because it's likely to be actually useful. – Charles Duffy Feb 17 '14 at 18:16
  • Great, thanks, I went with the less efficient last option, all went fine! Now I have a plethora of alternatives to add to my reference snippets :) –  Feb 17 '14 at 19:30