
Korean is written in word blocks (e.g., 가, 나, 다, 라, etc.). I need a way to count these word blocks. For instance, the word 바다 (sea) should return 2, but:

wc -w will return 1

wc -c will return 7

So these options won't work for me. I would appreciate your help.

Jose Ricardo Bustos M.
Eungi Kim

1 Answer


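wc -c counts bytes, and 바다 encoded as UTF-8 is 6 bytes long (a trailing newline would account for a seventh). If you want to count characters, use wc -m, which decodes the input according to the current locale and therefore needs a UTF-8 locale to be active: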

$ printf "바다" | wc -c
       6
$ printf "바다" | wc -m
       2
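
If no UTF-8 locale is available, wc -m may fall back to counting bytes. As a rough sketch of a workaround, and assuming the input is valid UTF-8, you can delete the continuation bytes and count what is left, since every character then contributes exactly one remaining byte. The function name count_codepoints is only an illustrative choice, not a standard command:

```shell
# count_codepoints is a hypothetical helper name used for illustration.
# Assumption: the argument is valid UTF-8 text.
count_codepoints() {
    # UTF-8 continuation bytes are 0x80-0xBF (octal 200-277). Running tr in
    # the C locale makes it work byte by byte, so deleting that range leaves
    # exactly one byte per code point for wc -c to count.
    n=$(printf '%s' "$1" | LC_ALL=C tr -d '\200-\277' | wc -c)
    # Arithmetic expansion strips the padding some wc implementations print.
    echo $((n))
}

count_codepoints "바다"       # prints 2
count_codepoints "sea 바다"   # prints 6 (3 letters, 1 space, 2 blocks)
```

Both this sketch and wc -m count code points, so they match the number of blocks only when the text uses precomposed syllables. Text stored in decomposed form (NFD), where each block is a sequence of jamo, would give a higher count.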
Blender