`wc -c` appears to only do a dumb byte count, without interpreting the actual characters according to the file's encoding.

How can I get the actual character count?

Jim Lewis
user2958725
  • I edited your question to clarify that you're looking for character counts, not the byte counts `wc -c` gives you. Feel free to roll back the edit if that's not what you meant... – Jim Lewis Nov 08 '13 at 06:42

2 Answers


Use the `-m` (or `--chars`) option.

For example (the file `text` contains two Korean characters and a newline):

falsetru@jmlee12:~$ cat text
안녕
falsetru@jmlee12:~$ wc -c text
7 text
falsetru@jmlee12:~$ wc -m text
3 text

According to wc(1):

   -c, --bytes
          print the byte counts

   -m, --chars
          print the character counts
falsetru

Don't confuse chars, characters, and bytes. A byte is 8 bits long, and `-c` counts the bytes in your file, whatever they contain. In many programming languages a char is also 8 bits long, which is why the byte-counting option is named `-c`. If you want to count the characters of a given alphabet in a file, the encoding has to be specified in some way, and some encodings use more than one byte per character. Read the manual for `wc`: it will tell you that `-m` uses your current locale (roughly, your language/charset preferences) to decode the file and count its characters.
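
To make the locale dependence concrete, here is a minimal sketch that counts the same bytes under two locales. The variable names are made up for illustration, and the last count assumes a locale named `C.UTF-8` is installed on your system (check with `locale -a`); if it is not, `wc` will most likely fall back to counting bytes.

```shell
# Write the UTF-8 bytes of "안녕" plus a newline, using octal escapes so the
# sample does not depend on how this script itself is encoded.
sample=$(mktemp)
printf '\354\225\210\353\205\225\n' > "$sample"

# -c counts bytes, so the locale is irrelevant: 3 + 3 + 1 = 7.
bytes=$(wc -c < "$sample")

# In the C locale every byte is treated as one character, so -m matches -c.
chars_c=$(LC_ALL=C wc -m < "$sample")

# Assumption: C.UTF-8 is available. If so, -m decodes the multibyte sequences
# and should report 3 (two Hangul syllables and the newline).
chars_utf8=$(LC_ALL=C.UTF-8 wc -m < "$sample")

echo "bytes=$bytes chars(C)=$chars_c chars(C.UTF-8)=$chars_utf8"
rm -f "$sample"
```

Setting `LC_ALL` on the command line only affects that single `wc` invocation, so this is a handy way to check what `-m` would report under a different locale without changing your session.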

Jean-Baptiste Yunès
  • 34,548
  • 4
  • 48
  • 69