Unwanted blank row in CSV file

Question

I have a command here that i was able to put together with the help of Stackoverflow community. Now I have a small concern with the script, this is a very small issue but its bugging me.

Below is the script I use:

#!/bin/bash

  read -p "Enter /DIR/PATH/FILENAME where you wish to copy the data: " FILENAME
  echo "Enter the JOB_NAME or %SEARCHSTRING%"


 while read -r i;
   do

  awk '
    BEGIN {
    print "\"insert_job\",\"job_type\",\"box_name\",\"command\",\"machine\",\"owner\",\"date_conditions\",\"condition\",\"run_calendar\",\"exclude_calendar\",\"days_of_week\",\"run_window\",\"start_times\",\"start_mins\",\"resources\",\"profile\",\"term_run_time\",\"watch_file\",\"watch_interval\"" }

/job_type/ {
    if (NR>1){printf "%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s\n", jn, jt, box, cmd, mcn, own, dc, c, rc, ec, dow, ruw, st, sm, res, prof, trt, wf, wi} jn="\""$2"\""; jt="\""$4"\""; box="\" \""; cmd="\" \""; mcn="\" \""; own="\" \""; dc="\" \""; c="\" \""; rc="\" \""; ec="\" \""; dow="\" \""; ruw="\" \""; st="\" \""; sm="\" \""; res="\" \""; prof="\" \""; trt="\" \""; wf="\" \""; wi="\" \""}
    /box_name/ {box="\""$2"\""}
    /command/ {$0=substr($0,index($0,$2)); cmd="\""$0"\""}
    /machine/ {mcn="\""$2"\""}
    /owner/   {own="\""$2"\""}
    /date_conditions/ {dc="\""$2"\""}
    /condition/ {$0=substr($0,index($0,$2)); c="\""$0"\""}
    /run_calendar/ {rc="\""$2"\""}
    /exclude_calendar/ {ec="\""$2"\""}
    /days_of_week/ {dow="\""$2"\""}
    /run_window/ {ruw="\""$2"\""}
    /start_times/ {gsub("\"",""); st="\""$2"\""}
    /^start_mins/ {sm="\""$2"\""}
    /profile/ {prof="\""$2"\""}
    /term_run_time/ {trt="\""$2"\""}
    /watch_file/ {wf="\""$2"\""}
    /watch_interval/ {wi="\""$2"\""}
    /resources/ {res="\""$2"\""}
    END{printf "%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s\n", jn, jt, box, cmd, mcn, own, dc, c, rc, ec, dow, ruw, st, sm, res, prof, trt, wf, wi}
' < <(autorep -j $i -q) > $FILENAME.csv

break
done

The $i takes a wildcard entry and gives output asper that.

for eg: the below 4 jobs have Test in their name, so i will give Test% as the wildcard value and the script will give the output for all the 4 jobs.

These are the test jobs i am using:

/* ----------------- Test_A ----------------- */

insert_job: Test_A  job_type: CMD
command: sleep 3000
machine: machine1
owner: user1
permission:
date_conditions: 0
term_run_time: 1
alarm_if_fail: 1
alarm_if_terminated: 1


/* ----------------- Test_B ----------------- */

insert_job: Test_B    job_type: CMD
command: echo
machine: machine2
owner: user2
permission:
date_conditions: 0
description: "Test"
std_out_file: "/tmp/$AUTO_JOB_NAME.$AUTORUN.out"
std_err_file: "/tmp/$AUTO_JOB_NAME.$AUTORUN.err"
max_run_alarm: 1
alarm_if_fail: 0
alarm_if_terminated: 0
send_notification: 1



/* ----------------- Test_c ----------------- */

insert_job: Test_c   job_type: CMD
command: sleep 10
machine: machine3
owner: user3
permission:
date_conditions: 0
alarm_if_fail: 0
alarm_if_terminated: 0


/* ----------------- Test_d ----------------- */

insert_job: Test_d   job_type: CMD
command: ls
machine: machine4
owner: user4
permission:
date_conditions: 0
alarm_if_fail: 1
alarm_if_terminated: 1

But here is the issue, the csv file output has a blank row between the column names and the data like below:

"insert_job","job_type","box_name","command","machine","owner","date_conditions","condition","run_calendar","exclude_calendar","days_of_week","run_window","start_times","start_mins","resources","profile","term_run_time","watch_file","watch_interval"
,,,,,,,,,,,,,,,,,,
"Test_A","CMD"," ","sleep 3000","machine1","user1","0","0"," "," "," "," "," "," "," "," ","1"," "," "
"Test_B","CMD"," ","echo","machine2","user2","0","0"," "," "," "," "," "," "," "," "," "," "," "
"Test_c","CMD"," ","sleep 10","machine3","user3","0","0"," "," "," "," "," "," "," "," "," "," "," "
"Test_d","CMD"," ","ls","machine4","user4","0","0"," "," "," "," "," "," "," "," "," "," "," "

Required output is:

"insert_job","job_type","box_name","command","machine","owner","date_conditions","condition","run_calendar","exclude_calendar","days_of_week","run_window","start_times","start_mins","resources","profile","term_run_time","watch_file","watch_interval"
"Test_A","CMD"," ","sleep 3000","machine1","user1","0","0"," "," "," "," "," "," "," "," ","1"," "," "
"Test_B","CMD"," ","echo","machine2","user2","0","0"," "," "," "," "," "," "," "," "," "," "," "
"Test_c","CMD"," ","sleep 10","machine3","user3","0","0"," "," "," "," "," "," "," "," "," "," "," "
"Test_d","CMD"," ","ls","machine4","user4","0","0"," "," "," "," "," "," "," "," "," "," "," "

I have tried using (NR>=1) but it doesn't work. I know this is very trivial but I cant get my head around it, can someone help me?

have you looked if this answer helps your issue: https://stackoverflow.com/questions/21489237/remove-blank-rows-from-csv-files-in-php — Zyfella, Jan 25 '23 at 13:35
The `if (NR>1)` may be where your problem is. The first time that statement is true all of your variables for the following `printf` will be empty so the empty csv record is the output. Changing the check to `if (NR>3)`, should avoid outputting this empty csv record. — j_b, Jan 25 '23 at 14:58
@j_b `if (NR>1)` was your code that i took from one of my previous post https://stackoverflow.com/questions/74808194/how-to-use-awk-command-output-and-turn-into-csv-file — NecroCoder, Jan 25 '23 at 15:49
Don't add a followup question in THIS question, put this one back as it was when you got answers and then post a new question. [Chameleon questions](https://meta.stackexchange.com/questions/43478/exit-strategies-for-chameleon-questions) are strongly discouraged. — Ed Morton, Jan 25 '23 at 19:47

Ed Morton · Accepted Answer · 2023-01-25T15:28:37.270

Here's how to robustly (your existing script will fail for various possible input values) and maintainably (if you want to add/remove fields to print or reorder them just update the first split() arg) do what you're trying to do, using cat file inplace of autorep -j $i -q which I don't have on my system:

$ cat tst.sh
#!/usr/bin/env bash

cat file |
awk '
    BEGIN {
        OFS = ","
        numTags = split("insert_job job_type box_name command machine owner date_conditions condition run_calendar exclude_calendar days_of_week run_window start_times start_mins resources profile term_run_time watch_file watch_interval",tags)
        for ( tagNr=1; tagNr<=numTags; tagNr++ ) {
            tag = tags[tagNr]
            printf "\"%s\"%s", tag, (tagNr<numTags ? OFS : ORS)
        }
    }

    !NF || /^\/\*/ { next }
    { gsub(/^[[:space:]]+|[[:space:]]+$/,"") }

    match($0,/[[:space:]]job_type:/) {
        if ( jobNr++ ) {
            prt()
            delete tag2val
        }

        # save "insert_job" value
        tag = substr($1,1,length($1)-1)
        val = substr($0,length($1)+1,RSTART-(length($1)+2))
        gsub(/^[[:space:]]+|[[:space:]]+$/,"",val)
        tag2val[tag] = val

        # update $0 to start with "job_type" to look like all other input
        $0 = substr($0,RSTART+1)
    }

    {
        tag = val = $0
        sub(/:.*/,"",tag)
        sub(/[^:]+:[[:space:]]*/,"",val)
        tag2val[tag] = val
    }

    END { prt() }

    function prt(    tagNr,tag,val) {
        for ( tagNr=1; tagNr<=numTags; tagNr++ ) {
            tag = tags[tagNr]
            val = tag2val[tag]
            printf "\"%s\"%s", val, (tagNr<numTags ? OFS : ORS)
        }
    }
'

$ ./tst.sh
"insert_job","job_type","box_name","command","machine","owner","date_conditions","condition","run_calendar","exclude_calendar","days_of_week","run_window","start_times","start_mins","resources","profile","term_run_time","watch_file","watch_interval"
"Test_A","CMD","","sleep 3000","machine1","user1","0","","","","","","","","","","1","",""
"Test_B","CMD","","echo","machine2","user2","0","","","","","","","","","","","",""
"Test_c","CMD","","sleep 10","machine3","user3","0","","","","","","","","","","","",""
"Test_d","CMD","","ls","machine4","user4","0","","","","","","","","","","","",""

The above is a simplified version of my answer to your previous question where I provided a script that'd output a CSV of all fields instead of the above which only outputs a specific list of fields.

It's not clear why you have a loop calling autorep in your existing script but if you do need one for some reason then chances are you should be calling autorep in a loop and piping the loop output to awk

while IFS= read -r i; do
    autorep -j "$i" -q
done |
awk '...'

instead of calling autorep and piping it's output to awk inside the loop:

while IFS= read -r i; do
    autorep -j "$i" -q |
    awk '...'
done

The above was run on this input copy/pasted from your question:

$ cat file
/* ----------------- Test_A ----------------- */

insert_job: Test_A  job_type: CMD
command: sleep 3000
machine: machine1
owner: user1
permission:
date_conditions: 0
term_run_time: 1
alarm_if_fail: 1
alarm_if_terminated: 1


/* ----------------- Test_B ----------------- */

insert_job: Test_B    job_type: CMD
command: echo
machine: machine2
owner: user2
permission:
date_conditions: 0
description: "Test"
std_out_file: "/tmp/$AUTO_JOB_NAME.$AUTORUN.out"
std_err_file: "/tmp/$AUTO_JOB_NAME.$AUTORUN.err"
max_run_alarm: 1
alarm_if_fail: 0
alarm_if_terminated: 0
send_notification: 1



/* ----------------- Test_c ----------------- */

insert_job: Test_c   job_type: CMD
command: sleep 10
machine: machine3
owner: user3
permission:
date_conditions: 0
alarm_if_fail: 0
alarm_if_terminated: 0


/* ----------------- Test_d ----------------- */

insert_job: Test_d   job_type: CMD
command: ls
machine: machine4
owner: user4
permission:
date_conditions: 0
alarm_if_fail: 1
alarm_if_terminated: 1

Ed morton your script work very well, for the first `split( )` arg if the tags have space in them, then how to separate them. what i mean is if instead of `insert_job` i want to use `insert job` then how to make the script to take it as one string? — NecroCoder, Jan 25 '23 at 18:12
There's a few different things that could mean and a few different ways to implement a solution but in general if you want to use tags on the header line that are different from the tags in your data (e.g. `insert job` in the header but `insert_job` in the data) then you need to provide an algorithm or mapping between the 2 so accept an answer to the question you asked here when you're ready to do so and then ask a follow up question with sample input/output that demonstrates your requirements for that new issue. — Ed Morton, Jan 25 '23 at 18:20

dawg · Answer 2 · 2023-01-25T20:35:49.557

Here is a ruby to do that:

ruby -e '

scan1=/^\/\*[ \ta-zA-Z_-]+\*\/\R+([\s\S]*?)(?=(?:^\/\*[ \t-]+)|\z)/
scan2=/([^ \t\n\r:]+): ([^ \t\n\r:]+)/

headers=  "insert_job   job_type    box_name    command machine owner   
        date_conditions condition   run_calendar    exclude_calendar    
        days_of_week    run_window  start_times start_mins  resources   
        profile term_run_time   watch_file  watch_interval".
        split.map{|e| "\"#{e}\""}

data=$<.read.
    scan(scan1).
    map{|block| block[0].scan(scan2)}

hsh=Hash.new {|h,k| h[k] = [""]*data.length}

data.each_with_index{|r,i|
    r.to_h.each{|k,v| hsh[k][i]=v}
}

puts headers.join(",")
hsh.values.transpose.
   each{|r| puts r.map{|e| e.match(/^".*"$/) ? e : "\"#{e}\""}.join(",")}

' file

Prints:

"insert_job","job_type","box_name","command","machine","owner","date_conditions","condition","run_calendar","exclude_calendar","days_of_week","run_window","start_times","start_mins","resources","profile","term_run_time","watch_file","watch_interval"
"Test_A","CMD","sleep","machine1","user1","0","1","1","1","","","","",""
"Test_B","CMD","echo","machine2","user2","0","","0","0","Test","/tmp/$AUTO_JOB_NAME.$AUTORUN.out","/tmp/$AUTO_JOB_NAME.$AUTORUN.err","1","1"
"Test_c","CMD","sleep","machine3","user3","0","","0","0","","","","",""
"Test_d","CMD","ls","machine4","user4","0","","1","1","","","","",""

And here is a ruby to do the second one:

ruby -e '
scan1=/(\S[\s\S]*?)(?=(?:^$)|\z)/
scan2=/(^[^:]+):[ \t]+(.*)/

data=$<.read.
    scan(scan1).
    map{|block| block[0].scan(scan2)}.
    map{|s| s.to_h}

hsh=Hash.new{|h,k| h[k] = [""]*data.length}
data.each_with_index{|h, i| h.each{|k,v| hsh[k][i]<<v}}

puts hsh.keys.map{|e| "\"#{e}\""}.join(",")
hsh.values.transpose.
    each{|r| puts r.map{|e| e.match(/^".*"$/) ? e : "\"#{e}\""}.join(",")}
' file

Prints:

"Machine Name","Type","Node Name","Agent Name","Operating System","Agent Release","Agent Build"
"machine1machine2","aa","machine1.testmachine2.test","AGENTAGENT","Windows Server 2012 Windows Server 2012 for amd64","12.012.0","6181, Service Pack 00, Maintenance Level 006181, Service Pack 00, Maintenance Level 00"
"machine1machine2","aa","machine1.testmachine2.test","AGENTAGENT","Windows Server 2012 Windows Server 2012 for amd64","12.012.0","6181, Service Pack 00, Maintenance Level 006181, Service Pack 00, Maintenance Level 00"

Unwanted blank row in CSV file

2 Answers2