How to split a camelCase string into an array in awk?

Question

How can I split a camelCase string into an array in awk using the split function?

Input:

STRING="camelCasedExample"

Desired Result:

WORDS[1]="camel"
WORDS[2]="Cased"
WORDS[3]="Example"

Bad Attempt:

split(STRING, WORDS, /([a-z])([A-Z])/);

Bad Result:

WORDS[1]="came"
WORDS[2]="ase"
WORDS[3]="xample"

score 5 · Accepted Answer · answered Aug 03 '22 at 18:04

You can't do it with split() alone, which is why GNU awk has patsplit():

$ awk 'BEGIN {
    n = patsplit("camelCasedExample",words,/(^|[[:upper:]])[[:lower:]]+/)
    for ( i=1; i<=n; i++ ) print words[i]    # numeric loop: "for (i in words)" does not guarantee order
}'
camel
Cased
Example
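
To see why split() alone falls short: split() throws away whatever the separator regex matches, and awk regexes have no lookahead, so any separator that spots the lower-to-upper boundary also eats the two letters around it. A minimal sketch that reproduces the question's bad result (the wrapper name naive_split is only for illustration, not something from the question or this answer):

```shell
naive_split() {
    awk -v s="$1" 'BEGIN {
        # Every match of the separator is discarded,
        # so "lC" and "dE" disappear from the result.
        n = split(s, words, /[a-z][A-Z]/)
        for (i = 1; i <= n; i++) print words[i]
    }'
}

naive_split "camelCasedExample"
```

patsplit() avoids this by describing the pieces to keep rather than the text between them.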

Ed Morton

This is what I was looking for! I was not aware of the patsplit command and it was not supported by gawk 3.1.8 on my older dev server, but I was able to make use of it with my newer server which has gawk 4.2.1. – Raven Aug 03 '22 at 18:30

RavinderSingh13 · Answer 2 · 2022-08-03T18:14:48.130

With your shown samples, please try the following. It was written and tested in GNU awk and should work in any awk. It assumes Input_file contains the line shown in the question (STRING="camelCasedExample"). The code builds an array named words whose values can be accessed by index, starting from 1. I print the values here, but you can use the array later however you wish.

awk -F'=|"' -v s1="\"" '
{
  gsub(/[A-Z]/,"\n&",$3)
  val=(val?val ORS:"")$3
}
END{
  num=split(val,words,ORS)
  for(i=1;i<=num;i++){
    if(words[i]!=""){
      print "WORDS[" ++count "]=" s1 words[i] s1
    }
  }
}
' Input_file

Explanation: a detailed walkthrough of the awk code above.

awk -F'=|"' -v s1="\"" '                     ##Start the awk program, set the field separator to = OR " and set s1 to ".
{
  gsub(/[A-Z]/,"\n&",$3)                     ##Use gsub to globally replace each capital letter in the 3rd field with a newline followed by the letter itself.
  val=(val?val ORS:"") $3                    ##Append $3 to val, separating successive values with ORS.
}
END{                                         ##Start the END block of the program.
  num=split(val,words,ORS)                   ##Split val into the array words, using ORS as the delimiter.
  for(i=1;i<=num;i++){                       ##Loop from 1 up to num.
    if(words[i]!=""){                        ##If words[i] is not empty, do the following.
       print "WORDS[" ++count "]=" s1 words[i] s1    ##Print WORDS[ then the incremented count, then ]= followed by s1, words[i] and s1.
    }
  }
}
'  Input_file                                ##Name of the input file.

score 2 · Answer 3 · answered Aug 03 '22 at 18:28

Here is an awk solution that would work with any version of awk:

s='camelCasedExample'
awk '{
   while (match($0, /(^|[[:upper:]])[[:lower:]]+/)) {
      wrd = substr($0,RSTART,RLENGTH)
      print wrd
      # you can also store it in array
      arr[++n] = wrd
      $0 = substr($0,RSTART+RLENGTH)
   }
}' <<< "$s"

camel
Cased
Example
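
If the loop has to be reused, or overwriting $0 is not acceptable, the same match()/substr() idea can be wrapped in a function that works on a copy of the string. This is only a sketch; the names camel_words and camel_split are made up for illustration:

```shell
camel_words() {
    awk -v s="$1" '
    # Hypothetical helper: fills arr with the words of str, returns the count.
    function camel_split(str, arr,    n) {
        split("", arr)                          # start from an empty array
        while (match(str, /(^|[[:upper:]])[[:lower:]]+/)) {
            arr[++n] = substr(str, RSTART, RLENGTH)
            str = substr(str, RSTART + RLENGTH) # continue after this word
        }
        return n + 0
    }
    BEGIN {
        count = camel_split(s, words)
        for (i = 1; i <= count; i++) print i, words[i]
    }'
}

camel_words "camelCasedExample"
```

Because str is a scalar parameter, awk passes it by value, so the caller's string and $0 are left untouched. Note that -v processes backslash escapes in the value, which does not matter for plain identifiers like the one here.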

score 0 · Answer 4 · answered Aug 05 '22 at 10:48

echo 'camelCasedExample' | 

mawk '{ for (_=(____=split($((_=_<_) * gsub("[>-[]",
            (___)"&")), __, ___) )^_; _<=____; _++) {

        print "","__["(_)"]",__[_] } }' OFS=' :: ' FS='^$' ___='\20\22'

 :: __[1] :: camel
 :: __[2] :: Cased
 :: __[3] :: Example
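
For anyone who finds that hard to read, here is roughly what it does, as far as I can tell: `_=_<_` evaluates to 0, so `$(0 * gsub(...))` is just $0 with the gsub() run as a side effect; the gsub() prefixes a two-byte marker to every character in the range > through [ (which includes A-Z), and split() then cuts $0 on that marker. A readable sketch of the same idea follows. The names mark_and_split, sep and parts are mine, and the range is narrowed to [A-Z], so it is not an exact equivalent:

```shell
mark_and_split() {
    printf '%s\n' "$1" |
    awk -v sep='\020\022' -v OFS=' :: ' '{
        gsub(/[A-Z]/, sep "&")             # put the marker before each capital
        n = split($0, parts, sep)          # the marker is dropped, capitals stay
        for (i = 1; i <= n; i++) print "", "parts[" i "]", parts[i]
    }'
}

mark_and_split 'camelCasedExample'
```

Like the original, this would produce an empty first element for input that starts with a capital letter.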
