3

I am trying to count the number of digits and letters in a file in Bash. I know that I can use wc -c file to count the number of characters, but how can I restrict the count to letters only and, separately, to digits only?

Mureinik
K.Dote

5 Answers

2

Here's a way that avoids pipes completely, using only tr and the shell's ${#variable} expansion, which gives the length of a variable:

$ cat file
123 sdf
231 (3)
huh? 564
242 wr =!
$ NUMBERS=$(tr -dc '[:digit:]' < file)
$ LETTERS=$(tr -dc '[:alpha:]' < file)
$ ALNUM=$(tr -dc '[:alnum:]' < file)
$ echo ${#NUMBERS} ${#LETTERS} ${#ALNUM}
13 8 21
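
If even tr is to be avoided, the same counts can be sketched in pure bash with pattern substitution. This is a minimal sketch and not part of the original answer: the variable names are illustrative, and the sample text is inlined (it mirrors the file shown above) so that the snippet is self-contained. It holds the whole input in memory, so it is only suitable for small files.

```shell
#!/bin/bash
# Sample text inlined for illustration; for a real file one could use
# content=$(< file), keeping in mind that $(...) strips trailing newlines.
content=$'123 sdf\n231 (3)\nhuh? 564\n242 wr =!\n'

# Remove every character outside the class, then measure what is left.
digits=${content//[^[:digit:]]/}
letters=${content//[^[:alpha:]]/}
alnum=${content//[^[:alnum:]]/}

echo "${#digits} ${#letters} ${#alnum}"   # prints: 13 8 21
```

The negated bracket expression also matches newlines, so line breaks are removed along with spaces and punctuation.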
Jens
1

To count the number of letters and digits, you can combine grep -o, which prints each match on its own line, with wc:

 grep -Eo '[a-z]' myfile | wc -w
 grep -Eo '[0-9]' myfile | wc -w

With a little tweaking, you can instead count alphabetic words, numbers, or alphanumeric words, like this:

grep -Eo '[a-z]+' myfile | wc -w
grep -Eo '[0-9]+' myfile | wc -w
grep -Eo '[[:alnum:]]+' myfile | wc -w
dnit13
  • Terminal is showing incorrect output from first and second example, hmm? – K.Dote May 15 '16 at 21:46
  • This counts all characters for *any line with at least* one alphabetic or numeric character. – Jens May 15 '16 at 21:50
  • Make it `grep -o` for counting `mixed line 111`. – Walter A May 15 '16 at 21:53
  • The greps mentioned first fail when you have a file named `a` or `7` in the current directory. Always quote shell meta-characters! – Jens Feb 03 '21 at 20:08
  • @Jens, I don't get you. I've created file `a` in a folder still both lines seem to work (I'm quoting file name). – Alex Martian Jul 25 '23 at 02:13
  • 1
    @AlexMartian The answer was edited and a plus was appended to the grep regex. Now if there is a file named `a+` an unquoted `[a-z]+` would expand to `a+`. You can test this with `touch a+; echo [a-z]+; echo '[a-z]+'`. Do you understand now why you must quote regexen? – Jens Jul 25 '23 at 18:07
  • @Jens, now I see, thank you. Interestingly `echo 'aaa abcd 123' | grep -o '[a-z]+'` produces empty with my copy of "grep (GNU grep) 3.7" though man page has info about `+` (w/out `+` works). – Alex Martian Jul 26 '23 at 04:42
  • @AlexMartian That's because + is not part of "Basic Regular Expressions (BRE)", only part of "Extended Regular Expressions (ERE)". To enable ERE, use the -E option: `echo 'aaa abcd 123' | grep -Eo '[a-z]+'` and it will output two lines with aaa and abcd. – Jens Jul 26 '23 at 13:31
  • @Jens, worth editing the answer or my grep is rather peculiar? – Alex Martian Jul 26 '23 at 13:39
  • 1
    @AlexMartian Hm. It looks like the dnit13's answer has deteriorated quite a bit when the + was added, because `wc -c` counts characters, not words as the text suggests, when it should count lines as output by `grep -Eo`. I feel uneasy to drastically change this. Dnit13 are you listening? Want to fix this? – Jens Jul 26 '23 at 13:50
  • thanks @Jens I was just a newbie when I wrote this years ago :) – dnit13 Aug 21 '23 at 15:46
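
Taking the comments above into account, here is a minimal sketch of the grep approach that quotes the pattern, uses POSIX character classes so that uppercase letters are included, and counts the lines printed by grep -o. The function name count_matches and the temporary sample file are assumptions made for illustration only.

```shell
#!/bin/bash
# count_matches PATTERN FILE: number of non-overlapping matches of the
# extended regular expression PATTERN in FILE.
count_matches() {
    # grep -o prints each match on its own line, so wc -l counts matches.
    grep -Eo -e "$1" -- "$2" | wc -l
}

# Hypothetical sample file, created only for this demonstration.
sample=$(mktemp)
printf '%s\n' '123 sdf' '231 (3)' 'huh? 564' '242 wr =!' > "$sample"

count_matches '[[:alpha:]]' "$sample"    # single letters: 8
count_matches '[[:digit:]]' "$sample"    # single digits: 13
count_matches '[[:alpha:]]+' "$sample"   # runs of letters: 3
count_matches '[[:digit:]]+' "$sample"   # runs of digits: 5

rm -f "$sample"
```

Some wc implementations pad the number with leading spaces, which matters only if the output is compared as a string.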
0

You can use tr to preserve only alphanumeric characters by combining the -c (complement) and -d (delete) flags. From there on, it's just a question of some piping:

$ cat myfile.txt | tr -cd '[:alnum:]' | wc -c
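
Since the question asks for letters and digits separately, the same pipeline can be sketched with the character class passed in as a parameter. count_class is a hypothetical helper name, and the sample file is created just for the demonstration.

```shell
#!/bin/bash
# count_class CLASS FILE: count the characters of FILE that belong to CLASS,
# e.g. '[:alpha:]' or '[:digit:]'. The class is quoted so the shell cannot
# expand it as a glob pattern.
count_class() {
    tr -cd "$1" < "$2" | wc -c
}

# Hypothetical sample file, created only for this demonstration.
sample=$(mktemp)
printf '%s\n' 'abc 123' 'x-9' > "$sample"

count_class '[:alpha:]' "$sample"   # letters: 4
count_class '[:digit:]' "$sample"   # digits: 4
count_class '[:alnum:]' "$sample"   # both: 8

rm -f "$sample"
```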
Jens
Mureinik
0

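You can use sed to delete all characters that are not of the kind you are looking for and then count the characters of the result with wc -c. Because sed prints the result followed by a newline, wc -c typically reports one more than the number of matching characters.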

# 1h;1!H collects all lines in the hold buffer so that the newline
# characters between them can be deleted as well
sed -n '1h;1!H;${;g;s/[^a-zA-Z]//g;p;}' myfile | wc -c

It's easy enough to just do numbers as well.
sed -n '1h;1!H;${;g;s/[^0-9]//g;p;}' myfile | wc -c

Or why not both.
sed -n '1h;1!H;${;g;s/[^0-9a-zA-Z]//g;p;}' myfile | wc -c
Saqib Rokadia
0

There are a number of ways to analyze the line, word, and character counts of a text file in bash. Using the POSIX character classes that bash supports (e.g. [:upper:]), you can drill down to how often each type of character occurs in a text file. Below is a simple script that reads from stdin, prints the normal wc output as its first line, and then prints the number of uppercase, lowercase, digit, punctuation, and whitespace characters.

#!/bin/bash

declare -i lines=0
declare -i words=0
declare -i chars=0
declare -i upper=0
declare -i lower=0
declare -i digit=0
declare -i punct=0

oifs="$IFS"

# Read line with new IFS, preserve whitespace
while IFS=$'\n' read -r line; do

    # parse line into words with original IFS
    IFS=$oifs
    set -- $line
    IFS=$'\n'

    # Add up lines, words, chars, upper, lower, digit
    lines=$((lines + 1))
    words=$((words + $#))
    chars=$((chars + ${#line} + 1))
    for ((i = 0; i < ${#line}; i++)); do
        c=${line:i:1}    # current character
        [[ $c =~ [[:upper:]] ]] && ((upper++))
        [[ $c =~ [[:lower:]] ]] && ((lower++))
        [[ $c =~ [[:digit:]] ]] && ((digit++))
        [[ $c =~ [[:punct:]] ]] && ((punct++))
    done
done

echo " $lines $words $chars"
echo " upper: $upper,  lower: $lower,  digit: $digit,  punct: $punct,  \
whitespace: $((chars-upper-lower-digit-punct))"

Test Input

$ cat dat/captnjackn.txt
This is a tale
Of Captain Jack Sparrow
A Pirate So Brave
On the Seven Seas.
(along with 2357 other pirates)

Example Use/Output

$ bash wcount3.sh <dat/captnjackn.txt
 5 21 108
 upper: 12,  lower: 68,  digit: 4,  punct: 3,  whitespace: 21

You can customize the script to give you as little or as much detail as you like. Let me know if you have any questions.

David C. Rankin