Bash, split words into letters and save to array

Question

I'm struggling with a project. I am supposed to write a bash script that works like the tr command. To begin with, I would like to save each of the command's arguments into a separate array. If an argument is a word, I would like each character in its own array element, e.g.:

tr_mine AB DC

I would like to have two arrays: a[0]=A, a[1]=B and b[0]=D, b[1]=C.

I found a way, but it's not working:

IFS="" read -r -a array <<< "$a"

Once you read a word from a list of words via `while read -r word; do ...;done < input` you can iterate over each character by `for ((i=0;i<${#word};i++)); do c=${word:i:1};done` — Rany Albeg Wein, Apr 03 '16 at 10:19
Maybe you should accept one answer: [How does accepting an answer work?](http://meta.stackexchange.com/q/5234/300807) — , Apr 04 '16 at 20:26
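
The loop from the first comment, written out as a minimal sketch. The names word and letters are only illustrative, and the sample word is hard-coded:

```bash
#!/bin/bash
# Split one word into an indexed array, one character per element.
word="AB"
letters=()
for ((i = 0; i < ${#word}; i++)); do
    letters+=("${word:i:1}")   # substring of length 1 starting at offset i
done
printf '<%s> ' "${letters[@]}"; echo   # prints: <A> <B>
```

This uses only parameter expansion, so no external command is started per character.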

score 2 · Answer 1 · 2016-04-04T00:43:57.843

No sed, no awk, all bash internals.

Assuming that words are always separated with blanks (space and/or tabs),
also assuming that words are given as arguments, and writing for bash only:

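#!/bin/bash

blank=$'[ \t]'      # regex: one space or one tab
varname='A'         # prefix of the generated array names (A1, A2, ...)

n=1                 # number of the word currently being read
# Read the arguments one character at a time (-N 1), keeping whitespace intact.
while IFS='' read -r -d '' -N 1 c ; do
    if [[ $c =~ $blank ]]; then n=$((n+1)); continue; fi
    # Escaping \$c defers its expansion to eval, so a quote, backtick or
    # backslash in the input is stored as data instead of being parsed as code.
    eval "${varname}${n}+=(\"\$c\")"
done <<<"$@"

last=$(eval echo \${#${varname}${n}[@]})        ### Number of elements in the last array.
unset "${varname}${n}[$last-1]"                 ### Remove last (trailing) newline.

for ((j=1;j<=n;j++)); do
    k="${varname}${j}[@]"
    printf '<%s> ' "${!k}"; echo
done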

That will set each array A1, A2, A3, etc. ... to the letters of each word.

After the first loop, $n holds the number of words processed. Printing is a little tricky because the array names are built at run time, which is why the code to access each letter is given above.

Applied to your sample text:

$ script.sh AB DC 
<A> <B>
<D> <C>

The script sets two array variables, A1 and A2.
Each letter is one array element: A1[0]=A, A1[1]=B and A2[0]=D, A2[1]=C.

To read an element, put its name in a variable ($k) and expand that variable indirectly.
For example, to echo the fourth letter (index 3, counting from 0) of the second word (A2, counting from 1) you need to do the following (the numbering may be changed if needed):

k="A2[3]"; echo "${!k}"            ### Indirect addressing.
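
As a small sketch of that indirect addressing, with two hand-written arrays standing in for the ones the script builds (the variables fourth and word2 are made up for this example):

```bash
#!/bin/bash
# Sample arrays in place of the A1, A2 created by the script.
A1=(A B)
A2=(e f g h i)

k="A2[3]"            # name of one element: second word, index 3
fourth=${!k}         # indirect expansion of that element

k="A2[@]"            # name of the whole array
word2=("${!k}")      # copies every element of A2

echo "$fourth"       # prints: h
echo "${#word2[@]}"  # prints: 5
```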

The script works like this:

$ script.sh ABCD efghi
<A> <B> <C> <D> 
<e> <f> <g> <h> <i>

Caveat: Characters will be split even if quoted. However, quoting the arguments is the correct way to use this script, as it avoids the effect of shell metacharacters (|, &, ;, (, ), <, >, space, tab). Of course, blanks (even repeated ones) still split words, as defined by the variable $blank, and each additional blank starts a new, empty word:

$ script.sh $'qwer;rttt    fgf\ngfg'
<q> <w> <e> <r> <;> <r> <t> <t> <t> 
<> 
<> 
<> 
<f> <g> <f> <
> <g> <f> <g>

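Because the script accepts and correctly processes embedded newlines, the newline that the here-string (<<<) appends to its input also ends up in the last array. That is why unset "${varname}${n}[$last-1]" is needed: it removes that trailing newline. If that is not desired, comment out that line.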

Security note: The eval is not much of a problem here, as it processes only one character at a time, and it would be difficult to build an attack from a single character. Still, the usual warning applies: always sanitize your input before using this script. Also, most unquoted bash metacharacters will make the shell reject the command line before the script even runs:

$ script.sh qwer(rttt    fgfgfg
bash: syntax error near unexpected token `('

Why have you not used `blank=$' \t'`? Do you really want the literal `'['` and `']'` included? — David C. Rankin, Apr 03 '16 at 23:33
@DavidC.Rankin Yes, that is a range of values, I do need the `[]` to capture **one** character if the value after the `=~` is the plain variable `$blank`. A possible alternative is to delay the use of `[]` to the actual test: `[[ $c =~ [$blank] ]]`. But I find that I really don't like it, it looks incorrect (even if functional) to me. — , Apr 03 '16 at 23:43
@DavidC.Rankin I just have this question that I can not erase from my mind: What have you found wrong with this answer that you didn't upvote it? Or is it that even not being wrong it is not "nice enough"? Sorry if I bother you. — , Apr 04 '16 at 21:02
No both, and a fair question. Upvotes are generally given for answers that are technically sound, non-duplicative, and show good effort, or explain the correct standard. The only reason I didn't immediately upvote was this topic was covered within the last few days (that's not your fault) However, this answer meets the rest of the criteria. Well done. Upvoted. — David C. Rankin, Apr 05 '16 at 00:35
@DavidC.Rankin Many thanks, for both actions, voting and taking the time to write an answer, but especially because of the explanation. — , Apr 05 '16 at 03:48

score 0 · Answer 2 · answered Apr 03 '16 at 09:11

I would strongly suggest doing this in another language if possible; it will be a lot easier.

Now, the closest I could come up with is:

#!/bin/bash

sentence="AC DC"
# one word per line
words=$(echo "$sentence" | tr " " "\n")

# final array: key "w<word index>-l<letter index>" => letter
declare -A result

# word count
wc=0

# $words is deliberately unquoted so the shell splits it into words
for i in $words; do
    # letter count in the word
    lc=0
    # grep -o . prints each character on its own line
    for l in $(echo "$i" | grep -o .); do
        result["w$wc-l$lc"]=$l
        lc=$((lc+1))
    done
    wc=$((wc+1))
done

rLen=${#result[@]}
echo "Result Length $rLen"

for i in "${!result[@]}"
do
  echo "$i => ${result[$i]}"
done

The above prints:

Result Length 4
w1-l1 => C
w1-l0 => D
w0-l0 => A
w0-l1 => C

Explanation:

Bash has no convenient way to create variables from computed names (it takes eval, declare or indirect expansion), so I am using an associative array (result) instead.
Arrays in bash are one-dimensional. To fake a 2D array I build the keys from two indexes: w for words and l for letters. This will make further processing a pain...
Associative arrays are not ordered, so the results appear in an arbitrary order when printing.
${!result[@]} is used instead of ${result[@]}: the first iterates over keys, the second over values.
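
To get the letters back in order, one option is to rebuild the keys from counters instead of iterating over the keys. This is only a sketch: the lengths array and the out variable are additions made for this example, and the substring expansion stands in for the grep call.

```bash
#!/bin/bash
declare -A result      # key "w<word>-l<letter>" => letter
declare -a lengths     # lengths[w] = number of letters in word w (added here)

wc=0
for word in AC DC; do
    for ((lc = 0; lc < ${#word}; lc++)); do
        result["w$wc-l$lc"]=${word:lc:1}
    done
    lengths[wc]=${#word}
    wc=$((wc + 1))
done

# Walk the keys in word/letter order rather than in hash order.
out=""
for ((w = 0; w < wc; w++)); do
    for ((l = 0; l < lengths[w]; l++)); do
        key="w$w-l$l"
        out+=${result[$key]}
    done
    out+=" "
done
echo "$out"    # prints: AC DC (followed by one trailing space)
```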

I know this is not exactly what you asked for, but I hope it will point you in the right direction.

SLePort · Answer 3 · 2016-04-03T14:00:48.777

Try this:

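# Assumes the script is called as: script.sh AB CD
sentence="$@"
read -r -a words <<< "$sentence"
i=0
for word in "${words[@]}"; do
    inc=$(( i++ ))
    # sed puts a space after every character so read -a can split on it
    read -r -a "l${inc}" <<< "$(sed 's/./& /g' <<< "$word")"
done

echo "${words[1]}" # prints "CD"
echo "${l1[1]}" # prints "D"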

The first read splits the input into words; the one inside the loop splits each word into letters.

The sed command adds a space after each letter to make the string splittable by read -a. You can also use this sed command to remove unwanted characters from words (e.g. commas) before splitting.
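
The sed step on its own, as a small sketch with a hard-coded sample word (the variable names spaced and letters are illustrative only):

```bash
#!/bin/bash
word="AB"
# Every character gets a space appended: "AB" becomes "A B ".
spaced=$(sed 's/./& /g' <<< "$word")
# With the default IFS, read -a splits on those spaces.
read -r -a letters <<< "$spaced"
echo "${#letters[@]}"   # prints: 2
```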

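If special characters are allowed in words, you can use a simple grep instead of the sed command (as suggested in http://www.unixcl.com/2009/07/split-string-to-characters-in-bash.html). grep -o . prints one character per line, and read only reads a single line, so collect the lines with mapfile (bash 4 or newer) instead of read -a:

mapfile -t "l${inc}" < <(grep -o . <<< "$word")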

The word array is words (expand it with ${words[@]}).

The letter arrays are named l#, where # is a counter that starts at 0 and increases with each word read.
