
Suppose I have the following (newline-separated) array in Bash (4.3):

abc  def  ghi  jkl        # ${myArray[0]}
abc  def  ghi  jkl  mno   # ${myArray[1]}
abc  def  ghi             # ${myArray[2]}
...

I would like to split this up columnwise, into the following array variables:

# First column
abc  # ${myArray_col1[0]}
abc  # ${myArray_col1[1]}
abc  # ${myArray_col1[2]}
...

# Second column
def  # ${myArray_col2[0]}
def  # ${myArray_col2[1]}
def  # ${myArray_col2[2]}
...


# Third column, and all following
ghi  jkl        # ${myArray_col3_and_onward[0]}
ghi  jkl  mno   # ${myArray_col3_and_onward[1]}
ghi             # ${myArray_col3_and_onward[2]}
...

I know many ways in which this can be done (awk, read -a, etc.), and most involve a loop of some sort.

However, the simplest, shortest and fastest way I've been able to come up with is this:

myArray_col1=($(echo "${myArray[*]}" | cut -f 1 -d ' '))
myArray_col2=($(echo "${myArray[*]}" | cut -f 2 -d ' '))
myArray_col3_and_onward=($(echo "${myArray[*]}" | cut -f 4- -d ' '))

(with $IFS set to newline of course).

Although this suits my needs perfectly fine, there is this part of me that's just plain annoyed by the seemingly unnecessary triple call to echo and cut :)

Is there any way that this can be avoided? I'm looking for something like:

(MAGIC SYNTAX HERE)=($(echo ${myArray[*]} | cut -f 1,2,4- -d ' '))

NOTE: the way in which I've obtained the original array is already the most convenient/fastest in many senses (and moreover, it's out of my direct control :)), so I'd rather not touch that.

Rody Oldenhuis

1 Answer


bash doesn't really have this type of transposition built-in, but the following loop should be sufficient (with no external processes needed).

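for row in "${myArray[@]}"; do
    # Set IFS only for this read, so the row is split on spaces even when
    # the global IFS is a newline; -r keeps backslashes literal.
    IFS=' ' read -r col1 col2 col3plus <<< "$row"
    myArray_col1+=( "$col1" )
    myArray_col2+=( "$col2" )
    # The last variable receives the remainder of the line.
    myArray_col3_and_onward+=( "$col3plus" )
done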
chepner
  • +1, although it's not quite the elegance I was hoping for... It can actually be quite a bit slower than repeated `cut` calls for large input arrays; I suspect that's because this loop requires dynamic array resizing, whereas `cut` knows beforehand how large the output array will be. Plus, note that my `$IFS` is set to newline, so the `read` doesn't split correctly unless `IFS` is set to a space. – Rody Oldenhuis May 27 '14 at 14:36
  • You can override `IFS` for just that one command: `IFS=' ' read col1 col2 col3plus <<< "$row"`. `bash` really isn't suited for this type of use; it's a glue language for tying together other programs. Once you reach a certain level of data processing, you're better off switching to a more general-purpose programming language. – chepner May 27 '14 at 15:15
  • One is not always free to choose the language. More importantly, believe it or not, I'm not processing data, I'm gluing :) – Rody Oldenhuis May 27 '14 at 15:27
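
For comparison with the loop in the answer, here is a minimal sketch of a loop-free variant that relies on bash applying a parameter expansion to every element of an array. It is not taken from the thread. It assumes the columns are separated by exactly one space and that every row has at least three columns. The sample rows and the helper array `rest` are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical sample data; assumes single-space separators and >= 3 columns per row.
myArray=(
    "abc def ghi jkl"
    "abc def ghi jkl mno"
    "abc def ghi"
)
IFS=$'\n'   # as in the question; the quoted [@] expansions below are not affected by it

# ${var%% *} strips the longest suffix starting at a space: keeps the first column.
myArray_col1=( "${myArray[@]%% *}" )

# ${var#* } strips the shortest prefix ending in a space: drops the first column.
# 'rest' is a hypothetical helper array holding columns 2 and onward.
rest=( "${myArray[@]#* }" )

myArray_col2=( "${rest[@]%% *}" )
myArray_col3_and_onward=( "${rest[@]#* }" )

# Show the result, one row per line.
for i in "${!myArray[@]}"; do
    printf '%s | %s | %s\n' \
        "${myArray_col1[i]}" "${myArray_col2[i]}" "${myArray_col3_and_onward[i]}"
done
```

This starts no external processes and needs no explicit loop for the split itself. If a row has fewer than three columns, the `#* ` pattern finds no space and leaves the element unchanged, so such rows would need separate handling.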