How to count the number of numbers/letters in file?

Question

I am trying to count the number of digits and letters in a file in Bash. I know that I can use wc -c file to count the number of characters, but how can I restrict the count to letters only and, separately, to digits only?

Jens · Answer 1 · 2016-05-16T19:53:42.320

2

Here's a way completely avoiding pipes, just using tr and the shell's way to give the length of a variable with ${#variable}:

$ cat file
123 sdf
231 (3)
huh? 564
242 wr =!
$ NUMBERS=$(tr -dc '[:digit:]' < file)
$ LETTERS=$(tr -dc '[:alpha:]' < file)
$ ALNUM=$(tr -dc '[:alnum:]' < file)
$ echo ${#NUMBERS} ${#LETTERS} ${#ALNUM}
13 8 21
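The same idea can be wrapped in a small function so the class name becomes a parameter. This is only a sketch: count_class and the temporary file are hypothetical names introduced here, not part of the answer. It assumes the file holds plain ASCII text.

```shell
#!/bin/bash
# count_class is a hypothetical helper, not part of the original answer.
# Usage: count_class CLASS FILE, where CLASS is a POSIX class name
# such as digit, alpha or alnum.
count_class() {
    local chars
    # Delete everything that is NOT in the requested class.
    chars=$(tr -dc "[:$1:]" < "$2")
    # The length of what remains is the number of matching characters.
    printf '%s\n' "${#chars}"
}

# Recreate the sample file from the answer in a temporary location.
sample=$(mktemp)
printf '%s\n' '123 sdf' '231 (3)' 'huh? 564' '242 wr =!' > "$sample"

count_class digit "$sample"
count_class alpha "$sample"
count_class alnum "$sample"

rm -f "$sample"
```

For the sample file this prints 13, 8 and 21 on separate lines, matching the output shown in the answer.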

edited May 16 '16 at 19:53

answered May 16 '16 at 19:25

Jens


dnit13 · Answer 2 · 2023-08-21T15:52:33.963

1

To count the number of letters and digits, you can combine grep with wc. With -o, grep prints every match on its own line, so wc -l gives the number of matches:

 grep -Eo '[a-zA-Z]' myfile | wc -l
 grep -Eo '[0-9]' myfile | wc -l

With a little tweaking, you can count runs of letters (alphabetic words), runs of digits (numbers), or alphanumeric words instead:

grep -Eo '[a-zA-Z]+' myfile | wc -l
grep -Eo '[0-9]+' myfile | wc -l
grep -Eo '[[:alnum:]]+' myfile | wc -l
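To make the difference between counting single characters and counting runs concrete, here is a small sketch. The sample text is invented for illustration, and it assumes a grep that supports -o (GNU and BSD grep do; it is not required by POSIX).

```shell
#!/bin/bash
# Sample text made up for this illustration.
text='abc 12 de 345'

# One match per character: every letter is printed on its own line.
letters=$(printf '%s\n' "$text" | grep -o '[[:alpha:]]' | wc -l)

# One match per run: -E enables +, so each word is printed once.
letter_runs=$(printf '%s\n' "$text" | grep -Eo '[[:alpha:]]+' | wc -l)

# The same two counts for digits.
digits=$(printf '%s\n' "$text" | grep -o '[[:digit:]]' | wc -l)
digit_runs=$(printf '%s\n' "$text" | grep -Eo '[[:digit:]]+' | wc -l)

# $(( )) strips the padding some wc implementations add to their output.
echo "letters: $((letters)), letter runs: $((letter_runs))"
echo "digits: $((digits)), digit runs: $((digit_runs))"
```

With this input there are 5 letters in 2 runs and 5 digits in 2 runs.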

edited Aug 21 '23 at 15:52

answered May 15 '16 at 21:33

dnit13


Terminal is showing incorrect output from first and second example, hmm? – K.Dote May 15 '16 at 21:46
This counts all characters for *any line with at least* one alphabetic or numeric character. – Jens May 15 '16 at 21:50
Make it `grep -o` for counting `mixed line 111`. – Walter A May 15 '16 at 21:53
The greps mentioned first fail when you have a file named `a` or `7` in the current directory. Always quote shell meta-characters! – Jens Feb 03 '21 at 20:08
@Jens, I don't get you. I've created file `a` in a folder still both lines seem to work (I'm quoting file name). – Alex Martian Jul 25 '23 at 02:13
@AlexMartian The answer was edited and a plus was appended to the grep regex. Now if there is a file named `a+` an unquoted `[a-z]+` would expand to `a+`. You can test this with `touch a+; echo [a-z]+; echo '[a-z]+'`. Do you understand now why you must quote regexen? – Jens Jul 25 '23 at 18:07
@Jens, now I see, thank you. Interestingly `echo 'aaa abcd 123' | grep -o '[a-z]+'` produces empty with my copy of "grep (GNU grep) 3.7" though man page has info about `+` (w/out `+` works). – Alex Martian Jul 26 '23 at 04:42
@AlexMartian That's because + is not part of "Basic Regular Expressions (BRE)", only part of "Extended Regular Expressions (ERE)". To enable ERE, use the -E option: `echo 'aaa abcd 123' | grep -Eo '[a-z]+'` and it will output two lines with aaa and abcd. – Jens Jul 26 '23 at 13:31
@Jens, worth editing the answer or my grep is rather peculiar? – Alex Martian Jul 26 '23 at 13:39
@AlexMartian Hm. It looks like dnit13's answer has deteriorated quite a bit when the + was added, because `wc -c` counts characters, not words as the text suggests, when it should count lines as output by `grep -Eo`. I feel uneasy to drastically change this. Dnit13 are you listening? Want to fix this? – Jens Jul 26 '23 at 13:50
thanks @Jens I was just a newbie when I wrote this years ago :) – dnit13 Aug 21 '23 at 15:46

score 0 · Answer 3 · edited May 16 '16 at 19:54

0

You can use tr to preserve only alphanumeric characters by combining the -c (complement) and -d (delete) flags. From there on, it's just a question of some piping:

$ tr -cd '[:alnum:]' < myfile.txr | wc -c

edited May 16 '16 at 19:54

Jens


answered May 15 '16 at 21:15

Mureinik


`cat myfile.txr | tr -cd [123456789] | wc -c` that example is correct? – K.Dote May 15 '16 at 21:42
Useless use of cat. And fails if there is a file named `m`. – Jens May 15 '16 at 21:44
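The quoting concern raised in the comment above can be reproduced in a scratch directory. This is a sketch under the assumption of default shell options (no nullglob or failglob); the directory and variable names are made up for the demonstration.

```shell
#!/bin/bash
# Work in a scratch directory so that no real files are touched.
dir=$(mktemp -d)
touch "$dir/m"    # a file whose one-letter name matches the glob [:alnum:]

# Unquoted, the shell treats [:alnum:] as a bracket glob over the
# characters ":", "a", "l", "n", "u" and "m", and expands it to "m".
unquoted=$(cd "$dir" && echo [:alnum:])

# Quoted, the text reaches the command unchanged.
quoted=$(cd "$dir" && echo '[:alnum:]')

echo "unquoted: $unquoted"
echo "quoted:   $quoted"

rm -rf "$dir"
```

Here the unquoted form yields m while the quoted form yields [:alnum:], which is why tr would receive the wrong set when the class is left unquoted.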

score 0 · Answer 4 · answered May 15 '16 at 22:24

You can use sed to delete all characters that are not of the kind you are looking for and then count the characters that remain.

# 1h;1!H collects all lines in the hold buffer so that the newlines
# between them can be deleted too. sed still prints one trailing newline,
# which tr removes so that wc -c does not count it.
sed -n '1h;1!H;${;g;s/[^a-zA-Z]//g;p;}' myfile | tr -d '\n' | wc -c

It's easy enough to count just the digits as well:
sed -n '1h;1!H;${;g;s/[^0-9]//g;p;}' myfile | tr -d '\n' | wc -c

Or both together:
sed -n '1h;1!H;${;g;s/[^0-9a-zA-Z]//g;p;}' myfile | tr -d '\n' | wc -c
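As a possible simplification (a sketch, not the answer's own command), the hold buffer can be avoided by letting sed work line by line and stripping the newlines afterwards with tr, since wc -c would otherwise count them. The sample file and its contents are invented for illustration.

```shell
#!/bin/bash
# Sample input invented for this sketch: 5 letters and 3 digits.
sample=$(mktemp)
printf '%s\n' 'ab 1' 'cde 23' > "$sample"

# Delete every non-letter on each line, then drop the newline that sed
# writes after each line so that wc -c sees only the letters.
letters=$(sed 's/[^a-zA-Z]//g' "$sample" | tr -d '\n' | wc -c)

# Same approach for digits.
digits=$(sed 's/[^0-9]//g' "$sample" | tr -d '\n' | wc -c)

# $(( )) strips the padding some wc implementations add to their output.
echo "letters: $((letters)), digits: $((digits))"

rm -f "$sample"
```

For this sample the counts are 5 letters and 3 digits.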

score 0 · Answer 5 · answered May 15 '16 at 22:43

There are a number of ways to analyze the line, word, and character counts of a text file in bash. Using the POSIX character classes (e.g. [:upper:], [:lower:], [:digit:]), you can count how often each type of character occurs. Below is a simple script that reads from stdin, prints the normal wc output as its first line, and then prints the number of uppercase, lowercase, digit, punctuation and whitespace characters.

#!/bin/bash

declare -i lines=0
declare -i words=0
declare -i chars=0
declare -i upper=0
declare -i lower=0
declare -i digit=0
declare -i punct=0

oifs="$IFS"

# Read line with new IFS, preserve whitespace
while IFS=$'\n' read -r line; do

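    # parse line into words with original IFS; globbing is switched off
    # so that characters such as * in the input are not expanded
    IFS=$oifs
    set -f
    set -- $line
    set +f
    IFS=$'\n'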

    # Add up lines, words, chars, upper, lower, digit
    lines=$((lines + 1))
    words=$((words + $#))
    chars=$((chars + ${#line} + 1))
    for ((i = 0; i < ${#line}; i++)); do
        [[ ${line:$((i)):1} =~ [[:upper:]] ]] && ((upper++))
        [[ ${line:$((i)):1} =~ [[:lower:]] ]] && ((lower++))
        [[ ${line:$((i)):1} =~ [[:digit:]] ]] && ((digit++))
        [[ ${line:$((i)):1} =~ [[:punct:]] ]] && ((punct++))
    done
done

echo " $lines $words $chars"
echo " upper: $upper,  lower: $lower,  digit: $digit,  punct: $punct,  \
whitespace: $((chars-upper-lower-digit-punct))"

Test Input

$ cat dat/captnjackn.txt
This is a tale
Of Captain Jack Sparrow
A Pirate So Brave
On the Seven Seas.
(along with 2357 other pirates)

Example Use/Output

$ bash wcount3.sh <dat/captnjackn.txt
 5 21 108
 upper: 12,  lower: 68,  digit: 4,  punct: 3,  whitespace: 21
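The per-class totals above can be cross-checked without the character loop by letting tr and wc do the counting. This is a sketch that swaps in a different technique from the script; the temporary file and the report variable are assumptions made for the example.

```shell
#!/bin/bash
# Recreate the test input shown above in a temporary file.
input=$(mktemp)
cat > "$input" <<'EOF'
This is a tale
Of Captain Jack Sparrow
A Pirate So Brave
On the Seven Seas.
(along with 2357 other pirates)
EOF

# For each POSIX class, keep only its characters and count them.
report=$(
    for class in upper lower digit punct space; do
        n=$(tr -cd "[:$class:]" < "$input" | wc -c)
        echo "$class: $((n))"
    done
)
echo "$report"

rm -f "$input"
```

The space class includes the five newlines, so its total of 21 agrees with the whitespace figure reported by the script, and the other four classes give 12, 68, 4 and 3.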

You can customize the script to give you as little or as much detail as you like. Let me know if you have any questions.
