5

I am writing a function in which I am replacing the leading/trailing space from the column and if there is no value in the column replace it with null. Function is working fine for one column but how can i modify it for multiple columns.

Function :

#cat trimfunction
#!/bin/bash
function trim
{
vCol=$1                                                    ###input column name
vFile=$2                                                   ###input file name
var3=/home/vipin/temp                                      ###temp file
awk -v col="${vCol}" -f /home/vipin/colf.awk ${vFile} > $var3 ###operation
mv -f $var3 $vFile                                         ###Forcefully mv
}

AWK script :

#cat colf.awk
#!/bin/awk -f
BEGIN{FS=OFS="|"}{
  gsub(/^[ \t]+|[ \t]+$/, "", $col)                        ###replace space from 2nd column
}
{if ($col=="") {print $1,"NULL",$3} else print $0}         ###replace whitespace with NULL

Input file : leading/trailing/white space in 2nd column

#cat filename.txt
1| 2016-01|00000321|12
2|2016-02 |000000432|13
3|2017-03 |000004312|54
4|  |000005|32
5|2017-05|00000543|12

Script :

#cat script.sh

    . /home/vipin/trimfunction
    trim 2 filename.txt

Output file : leading/trailing/white space removed in 2nd column

#./script.sh
#cat filename.txt
1|2016-01|00000321|12
2|2016-02|000000432|13
3|2017-03|000004312|54
4|NULL|000005
5|2017-05|00000543|12

If input file is like below - ( white/leading/trailing space in 2nd and 5th column of file)

1|2016-01|00000321|12|2016-01 |00000
2|2016-02 |000000432|13| 2016-01|00000
3| 2017-03|000004312|54|  |00000
4|  |000005|2016-02|0000
5|2017-05 |00000543|12|2016-02 |0000

How to achive below output - (All leading/trailing space trimmed and white space replaced with NULL in 2nd and 5th col) something like trim 2 5 filename.txt trim 2 5 filename.txt ###passing two column name as input

1|2016-01|00000321|12|2016-01|00000
2|2016-02|000000432|13|2016-01|00000
3|2017-03|000004312|54|NULL|00000
4|NULL|000005|2016-02|0000
5|2017-05|00000543|12|2016-02|0000
VIPIN KUMAR
  • 3,019
  • 1
  • 23
  • 34
  • Do you REALLY want to do it for specific columns or just every column that's empty? Include in your example an empty column that you do NOT want to be modified if such can happen. Very well specified question btw! – Ed Morton Nov 11 '16 at 12:25
  • @EdMorton - Only for the specific column which i will pass like this : trim 2 5 filename.txt OR trim 4 6 119 filename.txt – VIPIN KUMAR Nov 11 '16 at 12:51

3 Answers3

4

This will do what you said you wanted:

$ cat tst.sh
file="${!#}"
cols=( "$@" )
unset cols[$(( $# - 1 ))]

awk -v cols="${cols[*]}" '
BEGIN {
    split(cols,c)
    FS=OFS="|"
}
{
    for (i in c) {
        gsub(/^[[:space:]]+|[[:space:]]+$/,"",$(c[i]))
        sub(/^$/,"NULL",$(c[i]))
    }
    print
}' "$file"

$ ./tst.sh 2 5 file
1|2016-01|00000321|12|2016-01|00000
2|2016-02|000000432|13|2016-01|00000
3|2017-03|000004312|54|NULL|00000
4|NULL|000005|2016-02|0000
5|2017-05|00000543|12|2016-02|0000

but if what you REALLY wanted was to operate on ALL fields instead of specific ones then of course there's a simpler solution.

Never do cmd file > tmp; mv tmp file by the way, always do cmd file > tmp && mv tmp file instead (note the &&) so you only overwrite your original file if the command succeeded. Also - always quote your shell variables unless you have a very specific purpose in mind by not doing so and fully understand all of the implications, so use "$file", not $file. Google it.

Ed Morton
  • 188,023
  • 17
  • 78
  • 185
2

You can pass a list of columns to modify as a parameter. Create files

$ cat trim.awk

BEGIN {
    split(c, a)
    FS = OFS = "|"
}

{
    for (i in a) {
        i = a[i]
        gsub(/^[ \t]+|[ \t]+$/, "", $i)
        if (!length($i)) $i = "NULL"
    }
    print
}

and

$ cat filename.txt

1|2016-01|00000321|12|2016-01 |00000
2|2016-02 |000000432|13| 2016-01|00000
3| 2017-03|000004312|54|  |00000
4|  |000005|2016-02|0000
5|2017-05 |00000543|12|2016-02 |0000

Usage:

awk -v c="2 5" -f trim.awk filename.txt 
slitvinov
  • 5,693
  • 20
  • 31
1

If managing leading/trailing spaces is all you want to do, you probably don't want to do all(AWK code) that.

cat q1.txt | tr -s ' ' | sed 's/|\ |/|NULL|/g' | sed 's/\ //g' should do.

Break-down
tr -s ' ' : Squeeze multiple spaces into one
sed 's/|\ |/|NULL|/g' : Replace all "| |" with "|NULL|"
sed 's/\ //g' : Replace all spaces with empty string.

alok
  • 502
  • 3
  • 15
  • Of course you'd want to do all awk code for that. What else would you do? Your command line has a UUOC, is escaping blank chars for no apparent reason, would fail when the spaces/empty fields are the first/last on a line, redundantly compresses chains of spaces to a single space and THEN removes that space, would undesirably remove all spaces in the middle of fields, would fail to replace empty fields (`||`) with NULL as desired, and doesn't focus on specific fields as the OP asked for. – Ed Morton Nov 11 '16 at 13:59
  • Agree with redundancy. OP has used cat, hence I used cat, started answer starts with a disclaimer that if white-space management is all that was required(achieving given output) only then the answer fits. And I tried with 7 lines, with 2 added ones SPECIFICALLY for the first and last spaces in the line, before posting the command here. – alok Nov 11 '16 at 14:06
  • It is not possible for that command to replace `||` (3 empty fields or any of them individually) with `NULL|NULL|NULL` as would be desired. Try it again. – Ed Morton Nov 11 '16 at 14:17