AWK replace null column with the previous row column value

Question

I have a series of field-delimited files (symbol | ) with varying column widths:

File 1:

2p              | 2P*    |  1/2 |    2 |                 10.19880615024                       |
2s              | 2S     |  1/2 |    2 |                 10.19881052514816                    |
2               |        |      |    8 |                 10.1988358                           |
2p              | 2P*    |  3/2 |    4 |                 10.19885151459                       |
3p              | 2P*    |  1/2 |    2 |                 12.0874936591                        |
3s              | 2S     |  1/2 |    2 |                 12.0874949611                        |
3               |        |      |   18 |                 12.0875052                           |
3d              | 2D     |  3/2 |    4 |                 12.0875070783                        |

File 2:

2s2.2p3            | 4S*    |  3/2 |    4 |             0.0000000               |
2s2.2p3            | 2D*    |  5/2 |    6 |             2.3835298               |
2s2.2p3            |        |  3/2 |    4 |             2.3846100               |
2s2.2p3            | 2P*    |  1/2 |    2 |             3.5755704               |
2s2.2p3            |        |  3/2 |    4 |             3.5756182               |
2s2.2p2.(3P).3s    | 4P     |  1/2 |    2 |            10.3259087               |
2s2.2p2.(3P).3s    |        |  3/2 |    4 |            10.3300944               |
2s2.2p2.(3P).3s    |        |  5/2 |    6 |            10.3358956               |
2s2.2p2.(3P).3s    | 2P     |  1/2 |    2 |            10.6796703               |
2s2.2p2.(3P).3s    |        |  3/2 |    4 |            10.6899808               |

and I would like to fill the blanks in the second column with the corresponding values in the last non-NULL column above: Example for file 2:

2s2.2p3            | 4S*    |  3/2 |    4 |             0.0000000               |
2s2.2p3            | 2D*    |  5/2 |    6 |             2.3835298               |
2s2.2p3            | 2D*    |  3/2 |    4 |             2.3846100               |
2s2.2p3            | 2P*    |  1/2 |    2 |             3.5755704               |
2s2.2p3            | 2P*    |  3/2 |    4 |             3.5756182               |
2s2.2p2.(3P).3s    | 4P     |  1/2 |    2 |            10.3259087               |
2s2.2p2.(3P).3s    | 4P     |  3/2 |    4 |            10.3300944               |
2s2.2p2.(3P).3s    | 4P     |  5/2 |    6 |            10.3358956               |
2s2.2p2.(3P).3s    | 2P     |  1/2 |    2 |            10.6796703               |
2s2.2p2.(3P).3s    | 2P     |  3/2 |    4 |            10.6899808               |

This question is similar to the one in here where I can do this in the first column using awk through the command:

awk -F"|" '/^ /{$0=(x)substr($0,length(x)+1)}{x=$1}1' "$file"

however I am not knowleadgeable enough to make it work in the second column (or any arbitrary colum for that matter).

Can someone help? Thanks in advance!

anubhava · Accepted Answer · 2021-01-27T03:38:51.943

3

Using your file2 as an example, you may use this awk command:

awk -v c=2 'BEGIN {FS=OFS="|"} $c ~ /^[ \t]*$/ {$c = p} {p = $c} 1' file

2s2.2p3            | 4S*    |  3/2 |    4 |             0.0000000               |
2s2.2p3            | 2D*    |  5/2 |    6 |             2.3835298               |
2s2.2p3            | 2D*    |  3/2 |    4 |             2.3846100               |
2s2.2p3            | 2P*    |  1/2 |    2 |             3.5755704               |
2s2.2p3            | 2P*    |  3/2 |    4 |             3.5756182               |
2s2.2p2.(3P).3s    | 4P     |  1/2 |    2 |            10.3259087               |
2s2.2p2.(3P).3s    | 4P     |  3/2 |    4 |            10.3300944               |
2s2.2p2.(3P).3s    | 4P     |  5/2 |    6 |            10.3358956               |
2s2.2p2.(3P).3s    | 2P     |  1/2 |    2 |            10.6796703               |
2s2.2p2.(3P).3s    | 2P     |  3/2 |    4 |            10.6899808               |

To populate 3rd column just pass c=3 in place of c=2.

edited Jan 27 '21 at 03:38

answered Jan 26 '21 at 17:28

anubhava

761,203
64
569
643

Thanks that worked for column 2, I'm testing with column 3 to check how I can generalize your solution – doktorcaligari Jan 26 '21 at 19:02
1

Check my more generic solution now. You can pass `c=3` to populate 3rd column instead of 2nd. – anubhava Jan 26 '21 at 19:41
1

That works too, simple and elegant, thanks – doktorcaligari Jan 26 '21 at 21:27
1

^1 That'll behave the same in any awk, there's nothing gawk-specific in it. – Ed Morton Jan 26 '21 at 22:22
Oh yes you are right, three is nothing specific to gnu aqk – anubhava Jan 27 '21 at 03:39
Thats a wonderful solution.. Is there a way I can apply this logic to multiple columns ? Like say for example columns 1 till 5 ? – Chetan SP Jul 18 '22 at 07:49
@ChetanSP: You will need to run a loop from 1 to 5 and store each value in an array indexed by column number. – anubhava Jul 18 '22 at 07:55
@anubhava Thank you for the idea. Out of the total 18 columns in my file, the below snippet replicates data for the first 5 only. awk ' BEGIN {FS = OFS = "|"} { for (i=1; i<=NF-13; i++) if ($i ~ /[^[:blank:]]/) f[i]=$i else $i = f[i] print } ' ' sourceFile.csv >> ProcessedFile.csv – Chetan SP Jul 22 '22 at 09:38

steffen · Answer 2 · 2021-01-28T08:51:54.797

0

Answer to a previous version saying the file was tab separated...

If that column doesn't have values like 0P or 0D*, this might be helpful for you:

awk 'BEGIN{FS=OFS="\t"}!$3{$3=e}{print;e=$3}' file

Explanation:

Set field separators to tab only
If $3 is empty, set $3 to variable e
print line
set variable e to $3 for next record

edited Jan 28 '21 at 08:51

answered Jan 26 '21 at 17:46

steffen

16,138
4
42
81

this runs, however it doesn't seem to be changing anything – doktorcaligari Jan 26 '21 at 19:14
@doktorcaligari Are you sure your input file is *tab* separated as you wrote in your question? I get the exact output you wrote. I had to convert your source file however, because you posted spaces instead of tabs. – steffen Jan 26 '21 at 20:13
Oh I'm sorry, you're right, I said tab delimited, that was a typo I meant "field delimited" (with |), my bad I will correct the question for future reference – doktorcaligari Jan 26 '21 at 21:24
That would fail if $3 evaluated numerically to 0 – Ed Morton Jan 26 '21 at 22:23

score 0 · Answer 3 · answered Jan 26 '21 at 20:18

This will take care of all fields iin the file

awk '
    BEGIN {FS = OFS = "|"}
    {
        for (i=1; i<=NF; i++)
            if ($i ~ /[^[:blank:]]/) f[i]=$i   # remember the value of this field
            else $i = f[i]                     # use the previous field value
        print
    }
' file ...

This of course can't do anything about empty fields in the first line of the file.

If you use GNU awk, you can write the results back to the file:

gawk -i inplace '...' file ...

AWK replace null column with the previous row column value

3 Answers3