Multiple values for a key in ksh

Question

I'm trying to read a file that contains pairs in the following format:

V1#K1.@
V2#K1.@
V3#K2.@,V4#K1.@,V5#K2
V1#K3.@

My aim is to store it as key/value pairs, with # as the delimiter, after removing the trailing '.@'. In the example file, the value is placed before the # and the key after it.

I couldn't get the answer from "associate multiple values for one key in array in bash" to work, so I tried the following in ksh:

#!/usr/bin/ksh

typeset -A arr

while IFS= read -r line;do
    STRIPPED=`echo $line|sed 's/.@//g'`
    OIFS="$IFS"
    IFS=','
    read -A TOKENS <<< "${STRIPPED}"
    IFS="$OIFS"

    for key in ${TOKENS[@]};do
        echo "Token is $key"    
        arr[${key##*#}]=${key%%#*}
        echo "Key: ${key##*#}, Value: ${arr[${key##*#}]}"
    done
done <MYFILE

# Printing key and its values
for i in ${!arr[@]};do
    echo "key: ${i}, value: ${arr[$i]}"
done

But this overwrites the previous value for a key instead of keeping multiple values per key. Is there a way to do this in ksh (not bash)?

I'd suggest you update the question with a) some duplicate data, b) the output generated by your script and c) the desired output — markp-fuso, Jul 12 '19 at 12:01
I didn't have ksh for testing. I modified the code in your link and found `declare -A array; while IFS='#' read -r value key; do array[$key]="${array[$key]}${array[$key]:+,}$value"; done < <(sed -r 's/.@,/\n/g;s/.@//' MYFILE)`. Can you use this? — Walter A, Jul 12 '19 at 13:40

score 1 · Answer 1 · answered Jul 12 '19 at 10:51

I would do this, which stores multiple values as a comma-separated string:

#!/usr/bin/env ksh

# The `exec` line tells ksh to read from MYFILE _if_ stdin has _not_ been redirected
# This allows you to do:
#    ./script.ksh
#    ./script.ksh < some_other_file
#    some_process | ./script.ksh

[[ -t 0 ]] && exec 0<MYFILE

typeset -A arr

while IFS= read -r line; do
    # greatly simplified tokenization
    IFS=',' read -rA tokens <<< "${line//.@/}"

    for t in "${tokens[@]}"; do
        key=${t%#*}
        val=${t#*#}
        [[ -n ${arr[$key]} ]] && arr[$key]+=,
        arr[$key]+=$val
    done
done

# Printing key and its values
for i in "${!arr[@]}"; do
    echo "key: ${i}, value: ${arr[$i]}"
done

which outputs

key: V1, value: K1,K3
key: V2, value: K1
key: V3, value: K2
key: V4, value: K1
key: V5, value: K2

But this way duplicate values are stored too. Is there any way to avoid duplicates? — anurag86, Jul 12 '19 at 11:45
Certainly. How would you check if a string contains a substring? — glenn jackman, Jul 12 '19 at 12:00
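
One way to act on that hint, as a minimal sketch: wrap both the stored list and the candidate value in commas before testing for a substring, so that K1 is not mistaken for part of K10. The helper name append_unique is hypothetical (it does not appear in the answer above), and the function sticks to POSIX shell syntax, so it should behave the same in ksh and bash.

```shell
# append_unique LIST VAL  (hypothetical helper, not part of the original answer)
# Prints LIST with VAL appended, unless VAL is already one of the
# comma-separated items in LIST.
append_unique() {
    case ",$1," in
        *",$2,"*) printf '%s\n' "$1" ;;            # already present: list unchanged
        *)        printf '%s\n' "${1:+$1,}$2" ;;   # add a comma only if LIST is non-empty
    esac
}
```

In the loop above, the two `arr[$key]+=` lines could then be replaced by `arr[$key]=$(append_unique "${arr[$key]}" "$val")`. The command substitution may cost a subshell per token, which is probably acceptable for small input files.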

markp-fuso · Accepted Answer · 2019-07-14T03:19:35.230

Assumptions:

the input data is formatted exactly as displayed in the question (ie, no need to worry about other/extraneous text)
line 3 of the example input is missing a '.@' on the end of the 3rd attribute/value pair
to demonstrate duplicate processing I'll just copy the last input line a couple times
the question has no example of the desired output so I'll use glenn's example output
there is no explicit mention of any sorting preference (for the output) so I'll skip attempting to do any type of sorting at this point

Input file:

$ cat kdat
V1#K1.@
V2#K1.@
V3#K2.@,V4#K1.@,V5#K2.@
V1#K3.@
V1#K3.@
V1#K3.@

One solution based on sed and awk (both available in bash and ksh) where we use the attribute/value pair as the indices of a 2-dimensional array. By assigning an arbitrary value ('1' in this case) as the array value we can eliminate duplicate values.

the first time we see a (new) attribute/value pair we create the array element
the next time we see the (same) attribute/value pair we simply overwrite the array element
when we're done processing the input we find that each attribute/value pair is associated with a single array element (ie, there are no duplicates)

Now the actual code:

$ sed 's/,/\n/g;s/\.@//g' kdat | awk -F"#" '
{ myarray[$1][$2]=1 }
END { for (i in myarray)
      { delim=""
        printf "key: %s, value: ",i
        for (j in myarray[i])
            { printf "%s%s",delim,j
              delim=","
            }
        printf "\n"
      }
    }
'

key: V1, value: K1,K3
key: V2, value: K1
key: V3, value: K2
key: V4, value: K1
key: V5, value: K2

Where:

sed ... : replace each comma with a newline so that each attribute/value pair is on its own line (this awk solution assumes one attribute/value pair per line), then remove '.@'
awk -F"#" ... : use '#' as the input delimiter for separating our attribute ($1) and value ($2) pairs
myarray[$1][$2]=1 : create/overwrite array($1,$2) with '1'; this is where duplicates are discarded
for / printf : loop through array indices, using printf to pretty print our output

A couple fiddles: ksh and bash

syntax error after running it @ `{ myarray[$1][$2]=1 }`, `END blocks must have an action part` and syntax error at `for (j in myarray[i])` — anurag86, Jul 14 '19 at 02:30
hmmm, if I pull the opening brace up onto the line with END it should work (I've edited the answer, and verified it works with the ksh and bash fiddles) — markp-fuso, Jul 14 '19 at 03:27
Thanks. Not sure why i keep getting the same error i mentioned previously with KSH version sh (AT&T Research) 93t+ 2010-06-21 however with KSH version sh (AT&T Research) 93u+ 2012-08-01 it works fine. — anurag86, Jul 14 '19 at 04:26
if it's any consolation the original code (`END` on a line by itself) worked ok in one of my `cygwin` environments ... go figure ... — markp-fuso, Jul 14 '19 at 13:06
