
Running cat doc.txt shows the following:

你好 Hello!
这是中文。This is a Chinese doc.

I can use the command

wc -w doc.txt

but it will show:

8 doc.txt

This command treats 你好 and 这是中文 each as a single word, while in fact 你好 is two Chinese words and 这是中文 is four.

What I want is to count the Chinese words correctly (there are 12 words in the example). Could anyone help?

Arron Cao
  • Try adding `LANG=?chinese? wc -c file` (I'm not sure of the proper value to use after LANG, but you should be able to find it without much searching). Also, if you're using a heritage Unix (AIX, HP, less so Solaris), don't count on this working regardless of what you do. It may work on the latest Linux with an up-to-date `wc`. Good luck. – shellter Jul 22 '15 at 14:10

1 Answer


You can use the -m (or --chars) option, which counts characters instead of words:

$ echo -n "你好" | wc -m  

Output:

2
Ren