awk - Call external command and populate output before the first column

Question

I have a file that contains information about daily storage utilization. There are two columns: the date (DD.MM) and the usage in KB for that day.

I'm using awk to show how much the storage usage grew, in GB, between each line and the previous one.

Example file:

20.09 10485760
21.09 20971520
22.09 26214400
23.09 27262976

My awk command:

awk 'NR > 1 {a=($2-prev)/1024^2" GB"} {prev=$2} {print $1,$2,a}' file

This outputs:

20.09 10485760
21.09 20971520 10 GB
22.09 26214400 5 GB
23.09 27262976 1 GB
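The division by 1024^2 works because the second column is in KB: dividing by 1024 gives MB, and dividing again gives GB. Here is a small worked check of the first step (20.09 to 21.09) in bash arithmetic. Note that bash's `$(( ))` truncates to an integer, while awk uses floating point, so a difference that isn't a whole number of GB would show decimals in awk but not here:

```shell
#!/usr/bin/env bash
# Worked check of the KB -> GB conversion used in the awk command.
# Values are the 20.09 and 21.09 lines of the sample file (in KB).
prev_kb=10485760
curr_kb=20971520
diff_kb=$(( curr_kb - prev_kb ))      # 10485760 KB
diff_gb=$(( diff_kb / 1024 / 1024 ))  # KB -> MB -> GB (integer division)
echo "$diff_gb GB"                    # prints: 10 GB
```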

I would also like to add the weekday name before the first column. The date format in the file is always DD.MM, so, to make GNU date accept it as valid input and return the weekday name, I composed this pipeline:

echo '20.09.2022' | awk -v FS=. -v OFS=- '{print $3,$2,$1}' | date -f - +%a

It works, but I want to call it from the first awk for every processed line, passing the first-column date with ".2022" appended as its input, and put the output of this external pipeline (the weekday name) before the date in the first column.
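For reference, the pipeline above can be wrapped in a small shell function that takes the DD.MM value and appends the year. This is only a sketch, not part of the question: the name `weekday` is hypothetical, the year defaulting to 2022 is an assumption, and `LC_ALL=C` is added because the names printed by `%a` depend on the locale. GNU date is required for `-f -`:

```shell
#!/usr/bin/env bash
# Hypothetical helper wrapping the question's pipeline for one DD.MM value.
# Assumes GNU date; the year is not in the data, so it defaults to 2022.
weekday() {
    local ddmm=$1 year=${2:-2022}
    # DD.MM.YYYY -> YYYY-MM-DD, then GNU date prints the abbreviated name.
    printf '%s.%s\n' "$ddmm" "$year" |
        awk -v FS=. -v OFS=- '{print $3,$2,$1}' |
        LC_ALL=C date -f - +%a
}

weekday 20.09       # Tue
weekday 25.09       # Sun
```

Calling something like this once per input line from awk is exactly the per-line process spawning that the answers below either accept (getline, system()) or avoid by batching all dates into a single `date -f -` call.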

Example output:

Tue 20.09 10485760
Wed 21.09 20971520 10 GB
Thu 22.09 26214400 5 GB
Fri 23.09 27262976 1 GB

I looked at the system() function in awk, but I couldn't make it work with my pipeline and my first awk command.
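A likely reason system() doesn't fit here: it runs the command through /bin/sh, the command's output goes straight to awk's standard output instead of into a variable, and system()'s return value is the command's exit status. A minimal demonstration, assuming GNU date (`show_system` is a hypothetical name):

```shell
#!/usr/bin/env bash
# system() prints the command's output directly and returns its exit status,
# so the weekday text can't be stored in an awk variable this way.
show_system() {
    awk 'BEGIN {
        status = system("LC_ALL=C date -d 2022-09-20 +%a")
        print "system() returned:", status
    }'
}

show_system
# Tue
# system() returned: 0
```

To capture the text itself, awk needs `"command" | getline var` instead, or the dates have to be processed by date separately and merged afterwards.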

This might help: [How can we get weekday based on given date in unix](https://stackoverflow.com/q/46024251/3776858) — Cyrus, Sep 25 '22 at 06:14

RavinderSingh13 · Accepted Answer · 2022-09-25T14:44:34.367

1st solution: Using getline within awk, please try the following solution.

awk '
NR>1{
  a=($2-prev)/1024^2" GB"
}
{
  split($1,arr,".")
  value="2022-"arr[2]"-"arr[1]
  dateVal="date -d \"" value "\" +%a"
  newVal = ( (dateVal | getline line) > 0 ? line : "N/A" )
  close(dateVal)
  print newVal,$0,a
  prev=$2
}
'   Input_file
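The close(dateVal) call matters when the same date appears more than once: awk keeps one open pipe per distinct command string, so without close() a repeated command would read from the already-exhausted pipe, getline would return 0, and that line would get "N/A". Below is a stripped-down sketch of the same idiom, assuming GNU date and the year 2022 (`prefix_weekday` is a hypothetical name):

```shell
#!/usr/bin/env bash
# Stripped-down version of the "cmd | getline var" + close(cmd) idiom.
# Assumes GNU date; the year (2022) is not in the data and is hard-coded.
prefix_weekday() {
    awk -v year=2022 '{
        split($1, dm, /[.]/)                 # "20.09" -> dm[1]="20", dm[2]="09"
        cmd = "LC_ALL=C date -d " year "-" dm[2] "-" dm[1] " +%a"
        # getline returns 1 on success, 0 at end of output, -1 on error.
        if ((cmd | getline wd) <= 0)
            wd = "N/A"
        close(cmd)                           # lets the same command run again
        print wd, $0
    }' "$@"
}

printf '20.09 10485760\n20.09 10485760\n' | prefix_weekday
# Tue 20.09 10485760
# Tue 20.09 10485760
```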

2nd solution: With your shown samples, please try the following awk code. awk's system() runs the given commands in a separate shell, so the call chain is awk-->system-->shell-->commands, and the commands' output goes straight to standard output rather than back into awk, so it can't be merged with awk's own output. Instead, one awk gets the weekday values for all days (based on the 1st field of your Input_file), and its output is passed as input to another awk, which does the actual space calculations and merges both. We could also do it with a while loop, but IMHO doing it with awk could be faster.

awk '
FNR==NR{
  arr[FNR]=$0
  next
}
NR>1{
  a=($2-prev)/1024^2" GB"
}
{
  print arr[FNR],$1,$2,a
  prev=$2
}
' <(awk '{split($1,arr,".");system("d=\"2022-" arr[2]"-"arr[1]"\";date -d \"$d\" +%a")}' Input_file) Input_file

Output with shown samples will be as follows:

Tue 20.09 10485760
Wed 21.09 20971520 10 GB
Thu 22.09 26214400 5 GB
Fri 23.09 27262976 1 GB
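Both the 2nd solution and Ed Morton's two-pass script below rely on the `NR==FNR` idiom: NR counts lines across all inputs while FNR restarts for each file, so the test is true only while the first input is being read. A minimal sketch with a hypothetical function name, pairing the lines of two inputs by line number:

```shell
#!/usr/bin/env bash
# NR==FNR holds only for the first input: store its lines by line number,
# then prefix each line of the second input with the stored line.
# Caveat: if the first input is empty, NR==FNR is also true for the second.
pair_lines() {
    awk 'NR == FNR { first[FNR] = $0; next }
         { print first[FNR], $0 }' "$1" "$2"
}

pair_lines <(printf 'Tue\nWed\n') <(printf '20.09 10485760\n21.09 20971520\n')
# Tue 20.09 10485760
# Wed 21.09 20971520
```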

Daweo · Answer 2 · 2022-09-26T10:20:48.347

date can process multiple newline-separated dates, therefore I propose the following solution. Let file.txt content be

20.09 10485760
21.09 20971520 10 GB
22.09 26214400 5 GB
23.09 27262976 1 GB

then

awk 'BEGIN{FS="[[:space:].]";OFS="-"}{print "2022",$2,$1}' file.txt | date -f - +%a | paste -d ' ' - file.txt

gives output

Tue 20.09 10485760
Wed 21.09 20971520 10 GB
Thu 22.09 26214400 5 GB
Fri 23.09 27262976 1 GB

Explanation: I use GNU AWK to extract and prepare the dates for consumption by date, so 20.09 becomes 2022-09-20 and so on. Then date is used to compute the abbreviated day-of-week name, and paste is used to put the columns side by side, separated by a space character: the 1st column is -, meaning standard input, and the 2nd column is the unchanged file.txt.

(tested in GNU Awk 5.0.1 and paste (GNU coreutils) 8.30)
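Daweo's file.txt already contains the GB column. Below is a sketch of the same date-plus-paste idea applied to the question's raw two-column file, feeding paste the weekday names and the output of the question's own awk command through bash process substitution. Assumptions: bash, GNU date, and the year hard-coded as 2022; `annotate` is a hypothetical name:

```shell
#!/usr/bin/env bash
# Weekday names from one `date -f -` call, pasted in front of the output of
# the question's awk command. Assumes bash, GNU date and the year 2022.
annotate() {
    local f=$1
    paste -d ' ' \
        <(awk '{ split($1, dm, /[.]/); print "2022-" dm[2] "-" dm[1] }' "$f" |
              LC_ALL=C date -f - +%a) \
        <(awk 'NR > 1 {a=($2-prev)/1024^2" GB"} {prev=$2} {print $1,$2,a}' "$f")
}

tmp=$(mktemp)
printf '%s\n' '20.09 10485760' '21.09 20971520' '22.09 26214400' '23.09 27262976' > "$tmp"
annotate "$tmp"
rm -f "$tmp"
```

Only one date process is started for the whole file, and the original awk command is reused unchanged.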

Ed Morton · Answer 3 · 2022-09-25T14:55:59.367

Since you have GNU date, you should also have GNU awk, which has built-in time functions that'll be orders of magnitude faster than awk spawning a subshell to call date for each input line:

$ cat tst.sh
#!/usr/bin/env bash

awk '
    BEGIN {
        year = strftime("%Y")   # current year: the input has no year field
    }
    NR > 1 {
        diff = ( ($2 - prev) / (1024 ^ 2) ) " GB"
    }
    {
        split($1,dayMth,/[.]/)
        secs = mktime(year " " dayMth[2] " " dayMth[1] " 12 0 0")
        day = strftime("%a",secs)
        print day, $0, diff
        prev = $2
    }
' "${@:--}"

$ ./tst.sh file
Tue 20.09 10485760
Wed 21.09 20971520 10 GB
Thu 22.09 26214400 5 GB
Fri 23.09 27262976 1 GB

If for some reason you don't have GNU awk and can't get it, then this 2-pass approach would work fairly efficiently using GNU date and any awk:

$ cat tst.sh
#!/usr/bin/env bash

awk -v year="$(date +'%Y')" -v OFS='-' '{
    split($1,dayMth,/[.]/)
    print year, dayMth[2], dayMth[1]
}' "$@" |
date -f- +'%a' |
awk '
    NR == FNR {
        days[NR] = $1
        next
    }
    FNR > 1 {
        diff = ( ($2 - prev) / (1024 ^ 2) ) " GB"
    }
    {
        print days[FNR], $0, diff
        prev = $2
    }
' - "$@"

$ ./tst.sh file
Tue 20.09 10485760
Wed 21.09 20971520 10 GB
Thu 22.09 26214400 5 GB
Fri 23.09 27262976 1 GB

The downside to that 2nd script is that it can't read input from a stream, only from a file, since it has to read the input twice. If that's an issue and your input isn't too massive to fit a copy on disk, then you could always use a temp file, e.g.:

$ cat tst.sh
#!/usr/bin/env bash

tmp=$(mktemp)                   &&
trap 'rm -f "$tmp"; exit' 0     &&
cat "${@:--}" > "$tmp"          || exit 1

awk -v year="$(date +'%Y')" -v OFS='-' '{
    split($1,dayMth,/[.]/)
    print year, dayMth[2], dayMth[1]
}' "$tmp" |
date -f- +'%a' |
awk '
    NR == FNR {
        days[NR] = $1
        next
    }
    FNR > 1 {
        diff = ( ($2 - prev) / (1024 ^ 2) ) " GB"
    }
    {
        print days[FNR], $0, diff
        prev = $2
    }
' - "$tmp"

$ ./tst.sh file
Tue 20.09 10485760
Wed 21.09 20971520 10 GB
Thu 22.09 26214400 5 GB
Fri 23.09 27262976 1 GB
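For an awk without GNU time functions, another option (not taken from any of the answers here) is to compute the weekday arithmetically inside awk, so no date process is spawned at all. This sketch swaps in Sakamoto's day-of-week method for the Gregorian calendar and should work in any POSIX awk; the function name `weekday_diff` and the `YEAR` variable (default 2022) are assumptions, since the data has no year field:

```shell
#!/usr/bin/env bash
# Weekday computed inside awk with Sakamoto's day-of-week method (Gregorian
# calendar), so no date process is spawned. Intended for any POSIX awk.
# The year is not in the data: it comes from $YEAR (default 2022).
weekday_diff() {
    awk -v year="${YEAR:-2022}" '
    function dow(y, m, d) {              # returns 0 = Sunday ... 6 = Saturday
        if (m < 3) y--
        return (y + int(y/4) - int(y/100) + int(y/400) + t[m] + d) % 7
    }
    BEGIN {
        split("0 3 2 5 0 3 5 1 4 6 2 4", t, " ")   # month offsets, t[1..12]
        split("Sun Mon Tue Wed Thu Fri Sat", names, " ")
    }
    NR > 1 { diff = ($2 - prev) / 1024^2 " GB" }
    {
        split($1, dm, /[.]/)             # "20.09" -> dm[1]="20", dm[2]="09"
        print names[dow(year, dm[2] + 0, dm[1] + 0) + 1], $0, diff
        prev = $2
    }' "$@"
}

printf '%s\n' '20.09 10485760' '21.09 20971520' '22.09 26214400' '23.09 27262976' |
    weekday_diff
```

The trade-off is a few lines of calendar arithmetic in exchange for no external processes and no dependency on GNU date or GNU awk.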

score 0 · Answer 4 · answered Sep 27 '22 at 15:39

Who says you can't use system() to get the weekday?

This function also comes with automatic GNU date vs. BSD date detection (by way of GNU date's ability to return up to nanosecond precision, something BSD date lacks) and adjusts its calling syntax accordingly.

jot -w '2022-09-%d' 30 | gtail -n 12 |
mawk 'function ____(_) {
return \
    substr("SunMonTueWedThuFriSat",(_=\
    system("exit \140 date -"     (\
    system("exit  \140date +\"%s%6N"\
    "\" |grep -cF N\140") ? "j -f "  \
         "\"%Y-%m-%d\"":"d") " \""(_) \
    "\" +%w \140")) +_+_+(_^=_<_),_+_+_)

} ($++NF=____($!_))^_'

2022-09-19 Mon
2022-09-20 Tue
2022-09-21 Wed
2022-09-22 Thu
2022-09-23 Fri
2022-09-24 Sat
2022-09-25 Sun
2022-09-26 Mon
2022-09-27 Tue
2022-09-28 Wed
2022-09-29 Thu
2022-09-30 Fri

system() can typically return an unsigned integer from 0 to 255 if you explicitly set the command's exit code to the value you want, so as long as the range of values needed fits within 256 (or can be binned into it), one can leverage system() and get the results more quickly than with a full getline routine.

But since this workaround requires numeric return values, it can't directly use the built-in formatting code date +'%a'. Instead it asks date for +%w (0 = Sunday) and takes the matching 3-letter slice of "SunMonTueWedThuFriSat".
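A de-obfuscated sketch of the same exit-status trick, assuming GNU date only (no BSD detection) and one valid YYYY-MM-DD date per line; `add_weekday` is a hypothetical name. The shell exits with the value of `date +%w`, and awk maps the returned number to a name:

```shell
#!/usr/bin/env bash
# Readable version of the exit-status trick: the shell exits with the
# numeric weekday from GNU date's +%w (0 = Sunday ... 6 = Saturday), and
# awk gets that number back as the return value of system().
# Assumes GNU date and one valid YYYY-MM-DD date per line; no error handling.
add_weekday() {
    awk '
    BEGIN { split("Sun Mon Tue Wed Thu Fri Sat", names, " ") }
    {
        w = system("exit $(date -d \"" $1 "\" +%w)")
        print $1, names[w + 1]
    }' "$@"
}

printf '%s\n' 2022-09-19 2022-09-25 | add_weekday
# 2022-09-19 Mon
# 2022-09-25 Sun
```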
