0

I have an array of Unicode code points that I want to convert back into characters and store in a variable as a string. In the example below it's just the "Hello World!" code point array, but it could contain any Unicode code point (up to 16 bits).

array=( 72 101 108 108 111 32 87 111 114 108 100 33 )

I checked:

and other online resources but I still can't figure out how to do this. I tried things like:

temp=
for c in ${array[@]}; do
        temp+="\U$c"
done
printf %b "$temp"

I also saw that bash has a newer feature that lets you write either echo -e '\Uxxxxx' or $'\Uxxx', but that doesn't work in my case: even if I iterate over the array and store each code point in a variable i, the single quotes prevent bash from expanding it, as in echo $'\U$i'. I even tried echo "$'\U$i'".

I'm utterly clueless about how to do this in pure bash in a simple way.

Lorenzo

3 Answers

4

The thing that's messing you up is that your array is full of the decimal numbers of the code points, but the \U notation takes hexadecimal numbers. So for example, the first element in the array is "72" -- in decimal, that's the code for "H", but read as hex it's equivalent to decimal 114, which is the code for "r".
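To make the mismatch concrete, here is a minimal sketch (the variable names hex, right and wrong are only illustrative) that feeds the same number to \U once converted to hex and once left as decimal digits:

```bash
#!/usr/bin/env bash
# 72 is the decimal code point of "H"; its hexadecimal form is 48.
hex=$(printf '%x' 72)            # hex is now "48"
right=$(printf '%b' "\\U$hex")   # \U48 is U+0048, i.e. "H"
wrong=$(printf '%b' '\U72')      # \U72 is read as hex 72 (decimal 114), i.e. "r"
printf '%s %s %s\n' "$hex" "$right" "$wrong"   # prints: 48 H r
```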

So to use \U notation, you first need to convert the numbers to hex, which you can do with printf %x:

for c in "${array[@]}"; do
    temp+="\\U$(printf %x "$c")"    # Convert dec->hex, add \U
done
printf %b "$temp"    # Convert \U<codepoint> to actual characters

As dave_thompson_085 pointed out in a comment, you can simplify this even further by converting the entire array with a single printf:

printf %b "$(printf '\\U%x' "${array[@]}")"
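Since the question asks for the result to end up in a variable, one possible extension (a sketch, not part of the original answer; the names escaped and str are illustrative) is to let bash's printf -v write into the variable instead of printing:

```bash
#!/usr/bin/env bash
array=( 72 101 108 108 111 32 87 111 114 108 100 33 )
# The inner printf reuses its format for every element: \U48\U65\U6c...
escaped=$(printf '\\U%x' "${array[@]}")
# printf -v stores the expanded text in "str" instead of writing it to stdout
printf -v str '%b' "$escaped"
printf '%s\n' "$str"   # prints: Hello World!
```

Compared with str=$(printf %b ...), printf -v avoids a subshell and does not strip trailing newlines from the result.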
Gordon Davisson
  • And instead of looping you can give the first `printf` multiple arguments: `printf %b $( printf '\\U%x' "${array[@]}" )` _or_ `echo -e $( same )` – dave_thompson_085 Jul 05 '22 at 04:37
  • @dave_thompson_085 Nice! I'll add the concentric `printf` option, but I don't trust `echo -e` -- I had a bunch of scripts break once because an OS update included bash compiled with different options, and `echo`'s behavior with options changed. – Gordon Davisson Jul 05 '22 at 05:07
1

Shell scripts aren't do-it-all. For complex actions, they often rely on other utility programs that are common in Linux installations. In this case, iconv can help.

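array=( 72 101 108 108 111 32 87 111 114 108 100 33 )
temp=
# Treat each element as one byte and build a \xHH escape for it
for c in "${array[@]}"; do temp+=$(printf '\\x%x' "$c"); done
# Expand the escapes to raw bytes, then decode them as UTF-8
temp=$(echo -ne "$temp" | iconv -f utf8)
printf %s "$temp"    # %s, so backslashes in the decoded text are left alone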
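This approach reads each array element as a byte, not as a code point. As an illustration of that reading (a sketch; the bytes array and the use of od to inspect the output are assumptions made for this example), a non-ASCII character has to be supplied as its individual UTF-8 bytes:

```bash
#!/usr/bin/env bash
# Assumption: these are the two UTF-8 bytes of "é" (U+00E9), not code points.
bytes=( 195 169 )
escaped=$(printf '\\x%x' "${bytes[@]}")   # builds \xc3\xa9
# od shows the raw bytes produced, independent of the terminal's encoding
hexdump=$(printf '%b' "$escaped" | od -An -tx1 | tr -d ' \n')
printf '%s\n' "$hexdump"   # prints: c3a9
```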
Ouroborus
  • This prints `爐ㄐ㠐㠑ㄲ蜑ㄑ㐐㠐〳`, but it should print `Hello World!`. Even though I need it to support Unicode, the Unicode values of ASCII characters should be the same. – Lorenzo Jul 05 '22 at 03:54
  • I don't mind the use of any utility program though, as long as it comes built in with most distros. – Lorenzo Jul 05 '22 at 04:02
  • @Lorenzo Note that you'll likely need to test with code points that have larger values (like Chinese characters or such). My answer assumes the array is a list of integer bytes. Gordon Davisson's answer assumes the array is a list of integer code points. – Ouroborus Jul 05 '22 at 04:16
  • My bad, I just got home. Let me test and I'll update the selected answer accordingly. – Lorenzo Jul 05 '22 at 05:32
0

Why do you call the array array and then continue with string?

tmp=""
arr=( 72 101 108 108 111 32 87 111 114 108 100 33 )
for c in "${arr[@]}"; do tmp+="\U$c"; done
printf %b "$tmp"
Martin Zeitler
  • About the `array` and `string` names: that's a mistake, my bad. I wanted to simplify the example; in my code it's stupidly called `string` even though it's actually an array. However, this doesn't work: it prints `rāĈĈđ2đĔĈĀ3` instead of `Hello World!` – Lorenzo Jul 05 '22 at 03:51