I'm trying to count the number of elements/words present in each field of a big table. Fields are delimited by whitespace, and the elements ("words") within a field by commas. The table also contains empty fields (wherever two or more whitespace characters appear in a row), each of which should count as 0 elements.

For example, from a table such as this:

val1 this,is,text this,more,text  stop
val2  this,is a field
val3    end,text

This would be the desired output:

val1 3 3 0 1
val2 0 2 1 1
val3 0 0 0 2

(I'd like to keep the first column as is)

Please note that there are two blank spaces before the stop value in the first line, indicating that the fourth field has 0 elements. Similar things happen in other lines.

I've been using awk's split function, which returns the number of elements it finds in each field:

awk '{ for(i = 2; i <= NF; i++) {
$i=split($i,a,",") ; { if (!$i) { $i="0" }};
}; print $0}' input

I split each field $i into an array a and assign the number of elements that split returns back to $i. If the field has 0 elements (!$i), I set $i to 0.

But this is my current, unwanted output:

val1 3 3 1
val2 2 1 1
val3 2

As you can see, 0 values are omitted. I think that there's some issue with the assignment of the 0 value to empty fields.

Can anyone help me? Thanks a lot in advance!

xgrau

2 Answers

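By default, awk treats any run of whitespace as one field separator, so the empty fields never reach your loop. You have to specify that you want a single space as the field separator. It has to be written as the bracket expression [ ], because a plain -F" " is awk's default and still collapses runs of blanks: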

awk -F"[ ]" '{str=$1 
              for(i=2; i<=NF; i++){str=str" "split($i, arr, ",")}
              print str}' test1

In this case, the output is:

val1 3 3 0 1
val2 0 2 1 1
val3 0 0 0 2
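
To see why the separator matters, here is a small sketch that compares the field count (NF) for the first input line under both settings. The helper names nf_default and nf_single are made up for this illustration, and the sketch assumes the fields are separated by plain spaces rather than tabs.

```shell
# First line of the example table: two consecutive spaces before "stop".
line='val1 this,is,text this,more,text  stop'

# Default FS: a run of blanks is one separator, so the empty field disappears.
nf_default() { printf '%s\n' "$1" | awk '{ print NF }'; }

# FS given as the regex [ ]: every single space separates two fields.
nf_single() { printf '%s\n' "$1" | awk -F'[ ]' '{ print NF }'; }

nf_default "$line"   # prints 4
nf_single "$line"    # prints 5
```

With the default separator the loop only ever sees four fields, which is why the 0 never showed up in the original output.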
F. Knorr

Essentially the same solution, using printf (note that printf "%s " leaves a trailing space at the end of each output line):

$ awk -F'[ ]' '{printf "%s ", $1; 
                for(i=2;i<=NF;i++) printf "%s ", split($i,a,","); 
                print ""}' file

val1 3 3 0 1
val2 0 2 1 1
val3 0 0 0 2
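
As a further variation, the in-place assignment from the question should also work once the separator is set to [ ]. This is a minimal sketch, not a tested drop-in: the function name count_elements is hypothetical, and it assumes space-separated input (for tab-separated tables, -F'\t' would be the analogous setting). Because awk rebuilds $0 with OFS when a field is assigned, it avoids the trailing space.

```shell
# Replace every field after the first with its number of comma-separated elements.
count_elements() {
  awk -F'[ ]' '{
    for (i = 2; i <= NF; i++)
      $i = split($i, a, ",")   # split() returns 0 for an empty field
    print                      # $0 was rebuilt with OFS, a single space
  }' "$@"
}

printf '%s\n' \
  'val1 this,is,text this,more,text  stop' \
  'val2  this,is a field' \
  'val3    end,text' | count_elements
```

The function reads from standard input when called without arguments, or from the files passed to it, for example count_elements input.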
karakfa