
I have a bash script that merges a huge list of text files and filters the result. However, I get an 'Argument list too long' error because of the huge list.

echo -e "`cat $dir/*.txt`" | sed '/^$/d' | grep -v "\-\-\-" | sed '/</d' | tr -d \' | tr -d '\\\/<>(){}!?~;.:+`*-_ͱ' | tr -s ' ' | sed 's/^[ \t]*//' | sort -us -o $output

I have seen some similar answers here, and I know I could rectify it by using find and cat'ing the files first. However, I would like to know the best way to do this as a one-liner that still uses echo -e and cat, without breaking the code and without hitting the 'Argument list too long' error. Thanks.

Potential Coder
  • Is there a reason you're using `echo -e`? Do you _really_ want to change escape sequences inside of your text files into the characters they represent? – Charles Duffy Oct 08 '14 at 03:35
  • In general, the Right Way to avoid an argument-list-too-long error is to use a command which runs multiple commands when the arguments won't all fit on one. `find "$dir" -name '*.txt' -exec cat '{}' +` is a typical example. – Charles Duffy Oct 08 '14 at 03:38
  • Yup, I need to change the escape sequences. ;( – Potential Coder Oct 08 '14 at 03:57

2 Answers


First, with respect to the most immediate problem: using find ... -exec cat -- {} + or find ... -print0 | xargs -0 cat -- keeps any single cat invocation from receiving more arguments than fit on one command line; the remaining filenames are handed to additional cat invocations instead.
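
A minimal sketch of both forms, assuming bash and a find that supports -print0 (GNU and BSD find do); the function names cat_txt_find and cat_txt_xargs are hypothetical. In both, the batching tool rather than the shell builds cat's argument lists:

```bash
#!/usr/bin/env bash

# Concatenate every *.txt file under a directory without the shell ever
# building one giant argument list: with "{} +", find collects filenames
# into batches and starts cat once per batch.
cat_txt_find() {
  local dir=$1
  find "$dir" -name '*.txt' -exec cat -- {} +
}

# Same idea with xargs: NUL-separated names survive spaces and newlines in
# filenames, and xargs runs cat as many times as needed to consume them.
cat_txt_xargs() {
  local dir=$1
  find "$dir" -name '*.txt' -print0 | xargs -0 cat --
}
```

Either function's output can be piped into the remaining filters in place of cat $dir/*.txt. Unlike the glob, find also descends into subdirectories; adding -maxdepth 1 (supported by GNU and BSD find) keeps it to the top level.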


The more portable (POSIX-specified) alternative to echo -e is printf '%b\n'; this is available even in configurations of bash where echo -e prints -e on output (as when the xpg_echo and posix flags are set).
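
For illustration, a tiny hypothetical wrapper, expand_escapes, showing that printf '%b' expands the same kind of escape sequences that echo -e does:

```bash
#!/usr/bin/env bash

# printf's %b conversion expands backslash escapes (\t, \n, \\, ...) in its
# argument, as echo -e does, but printf is specified by POSIX and is not
# affected by bash's xpg_echo or posix options.
expand_escapes() {
  printf '%b\n' "$1"
}

expand_escapes 'col1\tcol2'   # prints col1, a tab, then col2
```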

However, if you use read without the -r argument, the backslashes in your input string are removed, so neither echo -e nor printf %b will be able to process them later.
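
A short demonstration of the difference, using two hypothetical helpers, read_plain and read_raw, fed the same input line containing a literal backslash followed by t:

```bash
#!/usr/bin/env bash

# Without -r, read treats backslash as an escape character and drops it,
# so "\t" arrives as a plain "t" and printf '%b' has nothing to expand.
read_plain() {
  local line
  IFS= read line
  printf '%b\n' "$line"
}

# With -r, the backslash is kept and printf '%b' turns "\t" into a tab.
read_raw() {
  local line
  IFS= read -r line
  printf '%b\n' "$line"
}

printf '%s\n' 'a\tb' | read_plain   # prints "atb"
printf '%s\n' 'a\tb' | read_raw     # prints "a", a tab, then "b"
```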

Fixing this can look like:

while IFS= read -r line; do
  printf '%b\n' "$line"
done \
  < <(find "$dir" -name '*.txt' -exec cat -- '{}' +) \
  | sed [...]
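
A self-contained sketch of the same technique wrapped in a function (expand_txt_files, a hypothetical name), plus an all-at-once variant (expand_txt_files_at_once, also hypothetical) that is not part of the answer above and relies on printf being a bash builtin. The sed stages are left out; either function's output would be piped into them:

```bash
#!/usr/bin/env bash

# Loop version: expand backslash escapes in every *.txt file under $1,
# one line at a time. find batches the filenames for cat, and read -r
# keeps backslashes intact so printf '%b' can expand them. The extra
# [[ -n $line ]] test also handles a final line with no trailing newline.
expand_txt_files() {
  local dir=$1 line
  while IFS= read -r line || [[ -n $line ]]; do
    printf '%b\n' "$line"
  done < <(find "$dir" -name '*.txt' -exec cat -- '{}' +)
}

# All-at-once version, closer to the original echo -e one-liner: printf is
# a bash builtin, so its (possibly huge) argument is never passed to exec
# and the argument-list limit does not apply. The whole text is held in
# memory, and $(...) drops trailing newlines before printf adds one back.
expand_txt_files_at_once() {
  local dir=$1
  printf '%b\n' "$(find "$dir" -name '*.txt' -exec cat -- '{}' +)"
}
```

The loop streams its input, so memory stays flat for very large inputs, but a bash while-read loop is slow on millions of lines; the all-at-once variant is faster as long as the combined text fits comfortably in memory.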
Charles Duffy
grep -hv '^$' "$dir"/*.txt | grep -v -e '---' | sed '/</d' | tr -d \' \
  | tr -d '\\/<>(){}!?~;.:+`*_ͱ-' | tr -s ' ' | sed 's/^[ \t]*//' \
  | sort -us -o "$output"
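
One caution about the tr -d set from the question: in tr, a '-' between two characters denotes a range, so '*-_' matches every character from '*' to '_', which includes all digits and uppercase letters. A small demonstration with two hypothetical helpers, forcing the C locale so the range follows byte order:

```bash
#!/usr/bin/env bash

# In a tr set, X-Y denotes a range. In the C locale, '*-_' therefore covers
# every byte from '*' (0x2A) to '_' (0x5F): digits, uppercase letters and
# several punctuation marks, not just the three characters written.
strip_with_range() {
  LC_ALL=C tr -d '*-_'
}

# A '-' placed last in the set is taken literally.
strip_literal_dash() {
  LC_ALL=C tr -d '*_-'
}

printf '%s\n' 'A-1_b*c' | strip_with_range     # prints "bc"
printf '%s\n' 'A-1_b*c' | strip_literal_dash   # prints "A1bc"
```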

If you think about it some more, you can probably get rid of a lot more stuff and turn it into a single sed and sort, roughly:

sed -e '/^$/d' -e '/---/d' -e '/</d' -e 's|[\\/<>(){}!?~;.:+`*_'\''ͱ-]||g' \
  -e 's/  */ /g' -e 's/^[ \t]*//' "$dir"/*.txt | sort -us -o "$output"
John Zwinck
  • Thanks, that helps, but how would I include echo -e to interpret backslash escapes, or is there a better solution? Sorry, I'm still learning. – Potential Coder Oct 08 '14 at 03:40