I'm trying to read a file that contains value#key pairs, as follows:

V1#K1.@
V2#K1.@
V3#K2.@,V4#K1.@,V5#K2
V1#K3.@

My aim is to store the pairs as key => value entries, using '#' as the delimiter, after stripping the '.@' suffix. In the example file, the value comes before '#' and the key comes after it.

I couldn't get the answer from "associate multiple values for one key in array in bash" to work, so I tried the following in ksh:

#!/usr/bin/ksh

typeset -A arr

while IFS= read -r line;do
    STRIPPED=`echo $line|sed 's/.@//g'`
    OIFS="$IFS"
    IFS=','
    read -A TOKENS <<< "${STRIPPED}"
    IFS="$OIFS"

    for key in ${TOKENS[@]};do
        echo "Token is $key"    
        arr[${key##*#}]=${key%%#*}
        echo "Key: ${key##*#}, Value: ${arr[${key##*#}]}"
    done
done <MYFILE

# Printing key and its values
for i in ${!arr[@]};do
    echo "key: ${i}, value: ${arr[$i]}"
done

But this overwrites the previous value for a key instead of keeping multiple values per key. Is there a way to do this in ksh (not bash)?

anurag86
  • I'd suggest you update the question with a) some duplicate data, b) the output generated by your script and c) the desired output – markp-fuso Jul 12 '19 at 12:01
  • I didn't have ksh for testing. I modified the code in your link and found `declare -A array; while IFS='#' read -r value key; do array[$key]="${array[$key]}${array[$key]:+,}$value"; done < <(sed -r 's/.@,/\n/g;s/.@//' MYFILE)` . Can you use this? – Walter A Jul 12 '19 at 13:40

2 Answers

I would do this, which stores multiple values as a comma-separated string:

#!/usr/bin/env ksh

# The `exec` line tells ksh to read from MYFILE _if_ stdin has _not_ been redirected
# This allows you to do:
#    ./script.ksh
#    ./script.ksh < some_other_file
#    some_process | ./script.ksh

[[ -t 0 ]] && exec 0<MYFILE

typeset -A arr

while IFS= read -r line; do
    # greatly simplified tokenization
    IFS=',' read -rA tokens <<< "${line//.@/}"

    for t in "${tokens[@]}"; do
        key=${t%#*}
        val=${t#*#}
        [[ -n ${arr[$key]} ]] && arr[$key]+=,
        arr[$key]+=$val
    done
done

# Printing key and its values
for i in "${!arr[@]}"; do
    echo "key: ${i}, value: ${arr[$i]}"
done

which outputs

key: V1, value: K1,K3
key: V2, value: K1
key: V3, value: K2
key: V4, value: K1
key: V5, value: K2
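
The append in the loop above stores a value again whenever the same pair appears more than once in the input. Here is a minimal sketch of one way to skip repeats, assuming values are non-empty and contain no commas. `append_unique` is a hypothetical helper name used only for illustration, not a ksh builtin.

```shell
# append_unique LIST VAL   (hypothetical helper, for illustration only)
# Prints LIST with VAL appended, unless VAL is already one of its
# comma-separated items. Assumes VAL is non-empty and contains no commas.
append_unique() {
    case ",$1," in
        *",$2,"*) printf '%s\n' "$1" ;;            # already present: leave LIST unchanged
        *)        printf '%s\n' "${1:+$1,}$2" ;;   # add a comma only if LIST is non-empty
    esac
}

# Small demonstration with a plain variable standing in for arr[$key]
list=""
for v in K3 K3 K1 K3; do
    list=$(append_unique "$list" "$v")
done
echo "$list"
```

Inside the loop, the two `arr[$key]+=` lines could then be replaced by `arr[$key]=$(append_unique "${arr[$key]}" "$val")`. The command substitution costs a subshell per pair, which should be fine for small files but may be slow for large ones.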
glenn jackman

Assumptions:

  • the input data is formatted exactly as displayed in the question (ie, no need to worry about other/extraneous text)
  • line 3 of the example input is missing a '.@' on the end of the 3rd attribute/value pair
  • to demonstrate duplicate processing I'll just copy the last input line a couple times
  • the question has no example of the desired output so I'll use glenn's example output
  • there is no explicit mention of any sorting preference (for the output) so I'll skip attempting to do any type of sorting at this point

Input file:

$ cat kdat
V1#K1.@
V2#K1.@
V3#K2.@,V4#K1.@,V5#K2.@
V1#K3.@
V1#K3.@
V1#K3.@

One solution is based on sed and awk (both can be called from bash or ksh), using the attribute/value pair as the indices of a 2-dimensional array. By assigning an arbitrary value ('1' in this case) as the array value we can eliminate duplicate values.

  • the first time we see a (new) attribute/value pair we create the array element
  • the next time we see the (same) attribute/value pair we simply overwrite the array element
  • when we're done processing the input we find that each attribute/value pair is associated with a single array element (ie, there are no duplicates)

Now the actual code:

$ sed 's/,/\n/g;s/.@//g' kdat | awk -F"#" '
{ myarray[$1][$2]=1 }
END { for (i in myarray)
      { delim=""
        printf "key: %s, value: ",i
        for (j in myarray[i])
            { printf "%s%s",delim,j
              delim=","
            }
        printf "\n"
      }
    }
'

key: V1, value: K1,K3
key: V2, value: K1
key: V3, value: K2
key: V4, value: K1
key: V5, value: K2

Where:

  • sed ... : replace each comma with a newline (so each attribute/value pair is on a separate line; this awk solution assumes one attribute/value pair per line); remove '.@'
  • awk -F"#" ... : use '#' as the input delimiter for separating our attribute ($1) and value ($2) pairs
  • myarray[$1][$2]=1 : create/overwrite array($1,$2) with '1'; this is where duplicates are discarded
  • for / printf : loop through array indices, using printf to pretty print our output
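
The `myarray[$1][$2]` syntax (arrays of arrays) is a GNU awk extension, so other awk implementations reject it. Treating `\n` in a sed replacement as a newline is also not guaranteed outside GNU sed. A sketch of the same idea using only POSIX features, where `group_pairs` is a hypothetical wrapper name and the `seen`, `vals` and `order` arrays are illustrative choices:

```shell
# group_pairs: read "attr#value.@" pairs on stdin, print one line per attribute
# with its distinct values. Hypothetical wrapper; uses only POSIX tr/sed/awk.
group_pairs() {
    tr ',' '\n' | sed 's/\.@//g' | awk -F'#' '
    # Process a pair only the first time it is seen; the (a, b) subscript is
    # the classic awk way to emulate a 2-dimensional array.
    NF == 2 && !(($1, $2) in seen) {
        seen[$1, $2] = 1
        if ($1 in vals) {
            vals[$1] = vals[$1] "," $2     # append to the existing list
        } else {
            order[++n] = $1                # remember first-seen order of attributes
            vals[$1] = $2
        }
    }
    END {
        for (i = 1; i <= n; i++)
            printf "key: %s, value: %s\n", order[i], vals[order[i]]
    }'
}

# Demonstration on a few lines in the format used by the question
printf '%s\n' 'V1#K1.@' 'V3#K2.@,V4#K1.@,V5#K2' 'V1#K3.@' 'V1#K3.@' | group_pairs
```

With the file from this answer, it would be called as `group_pairs < kdat`. Unlike `for (i in myarray)`, whose iteration order is unspecified in awk, this version prints attributes in the order they first appear.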

A couple fiddles: ksh and bash

markp-fuso
  • syntax error after running it @ `{ myarray[$1][$2]=1 }`, `END blocks must have an action part` and syntax error at `for (j in myarray[i])` – anurag86 Jul 14 '19 at 02:30
  • hmmm, if I pull the opening brace up onto the line with END it should work (I've edited the answer, and verified it works with the ksh and bash fiddles) – markp-fuso Jul 14 '19 at 03:27
  • Thanks. Not sure why i keep getting the same error i mentioned previously with KSH version sh (AT&T Research) 93t+ 2010-06-21 however with KSH version sh (AT&T Research) 93u+ 2012-08-01 it works fine. – anurag86 Jul 14 '19 at 04:26
  • if it's any consolation the original code (`END` on a line by itself) worked ok in one of my `cygwin` environments ... go figure ... – markp-fuso Jul 14 '19 at 13:06