4

I use the command wc -l to count the number of lines in my text files (I also want to sort the output through a pipe), like this:

wc -l $directory-path/*.txt | sort -rn

The output includes a "total" line, which is the sum of the lines of all files:

10 total
5 ./directory/1.txt
3 ./directory/2.txt
2 ./directory/3.txt

Is there any way to suppress this summary line? Or even better, to change the way the summary line is worded? For example, instead of "10", the word "lines" and instead of "total" the word "file".

Keith Thompson
  • 254,901
  • 44
  • 429
  • 631
  • The `man` page for `wc` doesn't mention any such functionality. You can whip up a script (or probably use pipes and `awk`) to change the appearance of the output. –  Dec 29 '16 at 18:36
  • 1
    Pipe it to `tail +2` to skip the first line. – Barmar Dec 29 '16 at 18:43
  • @Barmar: That's unreliable. It only prints the `total` line if there's more than one file. And at least on my system, the `total` line is printed last -- as POSIX specifically requires. ipo: Do you really get the output you show, with the `10 total` line at the top? – Keith Thompson Dec 29 '16 at 20:16
  • 2
    Based on your comments, I think you're seeing `10 total` at the top because you're sorting the output. You need to mention that in the question. Show us the exact command you're running, and its exact output. And `$directory-path` is not a valid variable name. – Keith Thompson Dec 29 '16 at 22:18

12 Answers

6

Yet another sed solution!

1. Short and quick

Since the total comes on the last line, $d is the sed command for deleting the last line.

wc -l $directory-path/*.txt | sed '$d'

2. With header line addition:

wc -l $directory-path/*.txt | sed '$d;1ilines total'

Unfortunately, there is no alignment.

3. With alignment: formatting the left column at 11 characters wide.

wc -l $directory-path/*.txt |
    sed -e '
        s/^ *\([0-9]\+\)/          \1/;
        s/^ *\([0-9 ]\{11\}\) /\1 /;
        /^ *[0-9]\+ total$/d;
        1i\      lines file'

This will do the job:

      lines file
          5 ./directory/1.txt
          3 ./directory/2.txt
          2 ./directory/3.txt

4. But if your wc version really does put the total on the first line:

This one is for fun, because I don't believe there is a wc version that puts the total on the first line, but...

This version drops the total line wherever it appears and adds a header line at the top of the output.

wc -l $directory-path/*.txt |
    sed -e '
        s/^ *\([0-9]\+\)/          \1/;
        s/^ *\([0-9 ]\{11\}\) /\1 /;
        1{
            /^ *[0-9]\+ total$/ba;
            bb;
           :a;
            s/^.*$/      lines file/
        };
        bc;
       :b;
        1i\      lines file' -e '
       :c;
        /^ *[0-9]\+ total$/d
    '

This is more complicated because we can't simply drop the first line even when it is the total line; instead, it gets replaced by the header.

F. Hauri - Give Up GitHub
  • 64,122
  • 17
  • 116
  • 137
1

This is actually fairly tricky.

I'm basing this on the GNU coreutils version of the wc command. Note that the total line is normally printed last, not first (see my comment on the question).

wc -l prints one line for each input file, consisting of the number of lines in the file followed by the name of the file. (The file name is omitted if there are no file name arguments; in that case it counts lines in stdin.)

If and only if there's more than one file name argument, it prints a final line containing the total number of lines and the word total. The documentation indicates no way to inhibit that summary line.

Other than the fact that it's preceded by other output, that line is indistinguishable from output for a file whose name happens to be total.
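
For example (a quick sketch; these file names are made up):

printf 'one line\n' > total     # a real file that just happens to be named "total"
printf 'a\nb\nc\n' > a.txt
wc -l total a.txt
#   1 total     <- the file named "total"
#   3 a.txt
#   4 total     <- the summary line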

So to reliably filter out the total line, you'd have to read all the output of wc -l and remove the final line only if the output contains more than one line. (Even that can fail if you have files with newlines in their names, but you can probably ignore that possibility.)
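
A sketch of that idea in awk (buffer all of the output, then drop the final line only when more than one line was read):

wc -l *.txt | awk '{ buf[NR] = $0 } END { n = (NR > 1 ? NR - 1 : NR); for (i = 1; i <= n; i++) print buf[i] }'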

A more reliable method is to invoke wc -l on each file individually, avoiding the total line:

for file in $directory-path/*.txt ; do wc -l "$file" ; done

And if you want to sort the output (something you mentioned in a comment but not in your question):

for file in $directory-path/*.txt ; do wc -l "$file" ; done | sort -rn

If you happen to know that there are no files named total, a quick-and-dirty method is:

wc -l $directory-path/*.txt | grep -v ' total$'

If you want to run wc -l on all the files and then filter out the total line, here's a bash script that should do the job. Adjust the *.txt as needed.

#!/bin/bash

wc -l *.txt > .wc.out
lines=$(wc -l < .wc.out)
if [[ lines -eq 1 ]] ; then
    cat .wc.out
else
    (( lines-- ))
    head -n $lines .wc.out
fi
rm .wc.out

Another option is this Perl one-liner:

wc -l *.txt | perl -e '@lines = <>; pop @lines if scalar @lines > 1; print @lines'

@lines = <> slurps all the input into an array of strings. pop @lines discards the last line if there is more than one line, i.e., if the last line is the total line.

Keith Thompson
  • 254,901
  • 44
  • 429
  • 631
1

The program wc always displays the total when there are two or more files (fragment of wc.c):

if (argc > 2)
    report ("total", total_ccount, total_wcount, total_lcount);
return 0;

So the easiest approach is to run wc on only one file at a time, and have find present the files to wc one after the other:

find $dir -name '*.txt' -exec wc -l {} \;

Or, as suggested by liborm:

dir="."
find $dir -name '*.txt' -exec wc -l {} \; | sort -rn | sed 's/\.txt$//'
V. Michel
  • 1,599
  • 12
  • 14
  • That's nearly the solution! But I need to pipe this one as well to `| sort -rn | sed 's/\.txt$//' ` Where should I place this pipe? I tried `find $dicitonary-path/*.txt-exec wc -l {} \ | sort -rn | sed 's/\.txt$//';` ...but this is wrong. – idontknowwhoiamgodhelpme Dec 29 '16 at 22:01
  • I think you're missing a `-name` argument in your `find` command. – Keith Thompson Dec 29 '16 at 22:15
  • @ipo like that, but without the typos.. `find $PATH -name '*.txt' -exec wc -l {} \; | sort -rn | sed 's/\.txt$//'` – liborm Dec 29 '16 at 22:32
  • @liborm: Thanks, I have put your cmd inside my response. If it's a problem, I can remove it. – V. Michel Dec 29 '16 at 23:43
  • @Keith Thompson: You're right, thanks for your help. – V. Michel Dec 29 '16 at 23:50
  • @ipo: Why would you want to strip the `.txt` portion of the file names (`sed 's/\.txt$//'`)? You really need to update your question and state the problem more precisely. Read this: [mcve] – Keith Thompson Dec 29 '16 at 23:53
  • It's 2 or more files, not more than 2 files. `argc` is the number of arguments including `argv[0]`, which is the program name (`"wc"`). – Keith Thompson Dec 30 '16 at 00:23
1

This is a job tailor-made for head:

wc -l $directory-path/*.txt | head --lines=-1

This way, wc still runs as a single process rather than once per file.
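
Combined with the sort from the question, the trim has to happen before sorting, since after sort -rn the total line is usually first rather than last (note: negative line counts for head are a GNU extension):

wc -l $directory-path/*.txt | head --lines=-1 | sort -rn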

b_squared
  • 71
  • 3
  • There are a lot of complicated solutions from people having fun with the problem, but `head -n -1` *before* sorting seems best. Surprising that `wc` does not have a quiet or script use mode. – Kevin Jul 15 '22 at 14:28
0

Can you use another wc?

The POSIX spec for wc (man -s1p wc) says:
If more than one input file operand is specified, an additional line shall be written, of the same format as the other lines, except that the word total (in the POSIX locale) shall be written instead of a pathname and the total of each column shall be written as appropriate. Such an additional line, if any, is written at the end of the output.

You said the total line was the first line; the manual states it's the last, and other wc implementations don't show it at all. Removing the first or last line is dangerous, so I would grep -v the line with the total (in the POSIX locale...), or just grep for the slash that's part of all the other lines:

wc -l $directory-path/*.txt | grep "/"
Walter A
  • 19,067
  • 2
  • 23
  • 43
0

Not the most optimized way, since you can use combinations of cat, echo, coreutils, awk, sed, tac, etc., but this will get you what you want:

wc -l ./*.txt | awk 'BEGIN{print "Line\tFile"}1' | sed '$d'

wc -l ./*.txt produces the line counts. awk 'BEGIN{print "Line\tFile"}1' adds the header titles; the trailing 1 is an always-true pattern, so awk prints every input line unchanged. sed '$d' prints all lines except the last one (the total).

Example Result

Line    File
      6 ./test1.txt
      1 ./test2.txt
jojo
  • 1,135
  • 1
  • 15
  • 35
  • All I get is something like this: 'Line File' above '10 total'. So like your example, but with the total information again. – idontknowwhoiamgodhelpme Dec 29 '16 at 21:52
  • @ipo: what kind of system are you running? I'm using zsh on an OSX system. My total line count appears at the end. Try using this: `wc -l ./*.txt | awk 'BEGIN{print "Line\tFile"}1' | sed '2d'`. The only difference is that the `sed` should delete the 2nd line, not the last line now. – jojo Dec 29 '16 at 21:56
0

The simplicity of using just grep -c

I rarely use wc -l in my scripts because of these issues. I use grep -c instead. Though it is not as efficient as wc -l, with grep -c we don't need to worry about issues like the summary line, the leading whitespace padding, or forking extra processes.

For example:

/var/log# grep -c '^' *
alternatives.log:0
alternatives.log.1:3
apache2:0
apport.log:160
apport.log.1:196
apt:0
auth.log:8741
auth.log.1:21534
boot.log:94
btmp:0
btmp.1:0
<snip>

Very straightforward for a single file:

line_count=$(grep -c '^' my_file.txt)
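
One caveat (a small sketch; the file name is made up): grep -c '^' counts an unterminated final line, while wc -l only counts newline characters, so the two can differ by one:

printf 'a\nb' > no_newline.txt   # two lines of text, but no trailing newline
wc -l no_newline.txt             # reports 1
grep -c '^' no_newline.txt       # reports 2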

Performance comparison: grep -c vs wc -l

/tmp# ls -l *txt
-rw-r--r-- 1 root root 721009809 Dec 29 22:09 x.txt
-rw-r----- 1 root root 809338646 Dec 29 22:10 xyz.txt

/tmp# time grep -c '^' *txt

x.txt:7558434
xyz.txt:8484396

real    0m12.742s
user    0m1.960s
sys 0m3.480s

/tmp/# time wc -l *txt
   7558434 x.txt
   8484396 xyz.txt
  16042830 total

real    0m9.790s
user    0m0.776s
sys 0m2.576s
codeforester
  • 39,467
  • 16
  • 112
  • 140
  • 2
    But `grep -c .` counts non-empty lines. You'll probably want `grep -c ''` as an approximation of `wc -l` (the two differ by one if the last “line” doesn't end with a newline). – gniourf_gniourf Dec 29 '16 at 22:16
  • 2
    Wonderful observation, @gniourf_gniourf. I changed the command to `grep -c '^'`. – codeforester Dec 29 '16 at 22:18
  • 1
    `grep -c '^'` also differs by one from `wc -l` if the last line doesn't end with a newline. In fact `grep` (at least the GNU version) always silently appends a newline if the last line doesn't have one. – Keith Thompson Dec 30 '16 at 00:28
0

You can solve it (and many other problems that appear to need a for loop) quite succinctly using GNU Parallel like this:

parallel wc -l ::: tmp/*txt

Sample Output

   3 tmp/lines.txt
   5 tmp/unfiltered.txt
  42 tmp/file.txt
   6 tmp/used.txt
Mark Setchell
  • 191,897
  • 31
  • 273
  • 432
  • `parallel -j1` if your files are really big, otherwise you'll clog your disk with parallel requests for data.. – liborm Dec 29 '16 at 22:34
  • Possibly, though many folk run very fast SSDs nowadays and there was no indication that OP is using excessively large files and it could actually be an advantage to use **GNU Parallel** there anyway. – Mark Setchell Dec 29 '16 at 22:39
0

Similar to Mark Setchell's answer, you can also use xargs with an explicit replacement string:

ls | xargs -I% wc -l %

With -I, xargs doesn't send all the inputs to wc at once, but invokes it with one operand (one input line) at a time.

Ivan Zarea
  • 2,174
  • 15
  • 13
0

Shortest answer:

ls | xargs -l wc
Allen Supynuk
  • 144
  • 1
  • 4
0

What about using sed with a pattern-based delete, as below? This only removes the total line if it is present (but it will also remove lines for any files with "total" in their names).

wc -l $directory-path/*.txt | sort -rn | sed '/total/d'
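
A tighter pattern (the one used in F. Hauri's answer) only deletes lines consisting of a bare count followed by the word total, so lines for files that merely have "total" in their names are left alone:

wc -l $directory-path/*.txt | sort -rn | sed '/^ *[0-9]\+ total$/d'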

jimjam100
  • 51
  • 1
0

While most of the answers center around removing the unneeded line, or using a version of wc that allows suppressing it, there's something to be said in favor of never producing it in the first place.

So you want to count lines in the $directory-path/*.txt files; however, feeding several files to wc produces the total line, which you don't want.

I would change your pipeline to find the files and feed them to wc one by one, in this manner:

find $directory-path -name "*.txt" | xargs -L 1 wc -l | sort -rn

In this case, find is tasked with locating files, while xargs -L 1 is tasked with feeding them to wc one by one.
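
If the file names may contain spaces, a null-delimited variant of the same idea (a sketch; -print0 and -0 are common find/xargs extensions rather than plain POSIX) avoids relying on line-based splitting; -n 1 likewise feeds wc one file at a time:

find $directory-path -name "*.txt" -print0 | xargs -0 -n 1 wc -l | sort -rn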

MaratC
  • 6,418
  • 2
  • 20
  • 27