
I have many text files in a directory, named like 1.txt 2.txt 3.txt 4.txt ... 2000.txt, and I want to paste them together into one large file.

To do this, I ran something like

paste *.txt > largefile.txt

but the above command reads the .txt files in a seemingly random order, so I need to read the files sequentially and paste them as 1.txt 2.txt 3.txt ... 2000.txt. Please suggest a better solution for pasting many files. Thanks, and I look forward to hearing from you.

manas
  • `but the above command reads the .txt file randomly,` are you sure? Files should be sorted, which means `1.txt 10.txt 11.txt .... 100.txt 101.txt ... 1000.txt 1001.txt ... 2.txt ...` etc. But it should not be random. – KamilCuk Jul 01 '21 at 11:21
  • Yes sir, I am sure –  Jul 01 '21 at 11:27
  • Then can you post the output of `echo paste *.txt`? Are you sure you are using bash? What is the output of `declare -p BASH_VERSION`? – KamilCuk Jul 01 '21 at 11:29

3 Answers


Sort the file names numerically yourself then.

printf "%s\n" *.txt | sort -n | xargs -d '\n' paste

When dealing with many files, you may hit the open-files limit ulimit -n. On my system ulimit -n is 1024, but this is a soft limit and can be raised with e.g. ulimit -n 99999.
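
For example, to inspect and raise the limits in the current shell:

ulimit -n          # current soft limit on open file descriptors
ulimit -Hn         # hard limit; the soft limit can be raised up to this value
ulimit -n 99999    # raise the soft limit for this shell and its children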

Without raising the soft limit, use a temporary file that accumulates the result, pasting one "round" of at most ulimit -n files at a time, like:

touch accumulator.txt
... | xargs -d '\n' -n $(($(ulimit -n) - 5)) sh -c '
       # paste the accumulator together with this batch of files; the batch
       # size leaves headroom for stdin/stdout/stderr, the accumulator and
       # the output redirection
       paste accumulator.txt "$@" > accumulator.txt.sav
       mv accumulator.txt.sav accumulator.txt
' _
cat accumulator.txt
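
Putting the two pieces together, a minimal end-to-end sketch (assuming GNU xargs for the -d option):

touch accumulator.txt
printf "%s\n" *.txt | sort -n | xargs -d '\n' -n $(($(ulimit -n) - 5)) sh -c '
       paste accumulator.txt "$@" > accumulator.txt.sav
       mv accumulator.txt.sav accumulator.txt
' _
cat accumulator.txt

Note that the first round pastes against an empty accumulator, so every line of the final file starts with a leading tab; strip it afterwards with e.g. cut -f2- accumulator.txt if that matters for your data.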
KamilCuk
  • It fails when more than 1000 files are present; it shows an error like paste: 1022.txt: Too many open files –  Jul 02 '21 at 13:22
  • Please suggest a better solution –  Jul 02 '21 at 13:28
  • `paste: 1022.txt: Too many open files` Ugh, the `ulimit -n` must be low. Well, the only way is to merge 1024 files at a time and then merge those together... :/ But most probably you can raise `ulimit -n` yourself, it's a soft limit. – KamilCuk Jul 02 '21 at 17:41

Instead of using the wildcard * to enumerate all the files in your directory, if your file names are numbered sequentially you can list them all explicitly, in order, and concatenate them into a large file. The output order of a * expansion can differ between environments, so it does not work the way you expect.

Below is a simple example:

$ for i in $(seq 20); do echo "$i" > "$i.txt"; done
# create 20 test files, 1.txt, 2.txt, ..., 20.txt with number 1 to 20 in each file respectively
$ cat {1..20}.txt
# show content of all file in order 1.txt, 2.txt, ..., 20.txt
$ cat {1..20}.txt > 1_20.txt
# concatenate them to a large file named 1_20.txt
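
If you want the files merged side by side (one column per file) rather than concatenated, the same brace expansion works with paste:

$ paste {1..20}.txt > 1_20.txt
# merge the 20 files column-wise into 1_20.txt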
alijandro

In bash, or any other shell, glob expansions are done in lexicographical order, following your locale's collation rules. With numbered files this sadly means something like 11.txt < 1.txt < 2.txt: in many locales the dot character (".") is ranked after the digits (or skipped), so 11.txt vs 1.txt is decided by 1 < t rather than by the numeric values. (In the C locale the order is 1.txt 10.txt 11.txt ... 2.txt instead.) Either way, the order is lexicographic, not numeric.
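
You can see the order your own shell produces with a quick test in an empty directory:

$ touch 1.txt 2.txt 11.txt
$ echo *.txt
1.txt 11.txt 2.txt
# (C locale; many UTF-8 locales give 11.txt 1.txt 2.txt instead)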

So here are a couple of ways to operate on your files in order:

rename all your files:

for i in *.txt; do mv "$i" "$(printf "%05d.txt" "${i%.*}")"; done
paste *.txt

use brace-expansion:

Brace expansion is a mechanism that allows for the generation of arbitrary strings. For integers you can use {n..m} to generate all numbers from n to m or {n..m..s} to generate all numbers from n to m in steps of s:

paste {1..2000}.txt

The downside here is that some file might be missing (e.g. 1234.txt), which would make paste fail on a nonexistent name. So you can do

shopt -s extglob nullglob; paste ?({1..2000}.txt)

The extglob pattern ?(pattern) matches zero or one occurrence of pattern, and nullglob drops any word whose pattern matches no existing file. So this will exclude the missing files but keep the order.
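
A quick demonstration of the effect:

$ shopt -s extglob nullglob
$ touch 1.txt 2.txt 4.txt    # 3.txt deliberately missing
$ echo ?({1..4}.txt)
1.txt 2.txt 4.txt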

kvantour