I want to use awk to edit a column of a large file in place. If the process breaks or stops for any reason, I don't want to lose the work already done. I tried adding fflush, but it does not seem to work with inplace.

In order to simulate the desired result, here is a test file with 3 columns. The last column is all zeros.

paste -d '\t' <(seq 1 10) <(seq 11 20) | 
    awk 'BEGIN {FS="\t"; OFS=FS} {$(NF+1)=0; print}' > testfile

Then I want to replace the values in the last column. In this simple example, I'm just replacing them with the sum of the first and second columns. I'm adding a system sleep so that it is possible to abort the script in the middle and see the result.

awk -i inplace 'BEGIN {FS="\t"; OFS=FS} $3==0{$3=$1+$2; print; fflush(); system("sleep 1")}' testfile

If you run the script and abort it (ctrl+z) before it ends, the test file is unchanged.

Is it possible to achieve the desired result (get the partial result when the script breaks or stops)? How should I do it?

LEo
  • Try it without `-i inplace` and save the output by redirecting to a new file, i.e. `awk 'code' file > newFile`. (You can always rename your original file, but then you lose your process traceability.) – shellter Sep 12 '19 at 17:20

1 Answer

"In-place" editing is not really in place. A temporary file holds the output, and it replaces the input only after awk has finished reading it. That is why an aborted run leaves the original file unchanged.

Actual in-place editing would be slow: unless the output is the same length as the input, the file size needs to change, and awk would have to re-write the entire file (everything after the current line, at least) on every buffer flush. Note this caveat from the documentation:

If the program dies prematurely … a temporary file may be left behind.

You could script up some recovery code to merge that temporary file with your input after an abort.
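
One way to get a similar effect without depending on how the temporary file is named is to write the output to a file of your own and resume from it. The following is only a sketch: the function name `resume_column_sum` and both file arguments are hypothetical, and it assumes one output line per input line and that an interrupted run never leaves a half-written last line.

```shell
# resume_column_sum SRC DST
# SRC is the input file; DST (a hypothetical name you choose) holds the
# records finished so far. Both names are assumptions for illustration.
resume_column_sum() {
    src=$1
    dst=$2

    # Count the records already written by an earlier, interrupted run.
    finished=0
    if [ -f "$dst" ]; then
        finished=$(( $(wc -l < "$dst") ))
    fi

    # Skip finished records, append the rest, flushing after each one
    # so that an abort loses at most the record in progress.
    awk -v skip="$finished" 'BEGIN {FS="\t"; OFS=FS}
        NR <= skip {next}
        {$3 = $1 + $2; print; fflush()}' "$src" >> "$dst"

    # Replace the input only once every record has been processed.
    if [ "$(( $(wc -l < "$dst") ))" -eq "$(( $(wc -l < "$src") ))" ]; then
        mv "$dst" "$src"
    fi
}
```

Calling `resume_column_sum testfile testfile.partial` again after an abort picks up at the first unprocessed line instead of starting over.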

Or, you could adjust your script to only modify one line per run (and simply print every subsequent line, unmodified), and re-run it until there are no changes left to make. This would force awk to re-write the file on every change. It will be slow, but there just isn't any fast way to remove data from the middle of a file.
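
A minimal sketch of that one-change-per-run loop is shown here with plain awk and an explicit temporary file rather than `-i inplace`. The function names and the temp-file name are hypothetical, and, like the question, it treats `$3 == 0` as "not yet processed", so it assumes the real result is never 0 (otherwise the loop would not terminate).

```shell
# update_one_line FILE
# Rewrites FILE with at most one pending record updated.
# Returns 0 if a record was changed, 1 if nothing was left to do.
update_one_line() {
    tmp="$1.tmp.$$"    # hypothetical temp-file name next to FILE
    if awk 'BEGIN {FS="\t"; OFS=FS}
            !changed && $3 == 0 {$3 = $1 + $2; changed = 1}
            {print}
            END {exit !changed}' "$1" > "$tmp"
    then
        mv "$tmp" "$1"     # one record changed: commit the rewrite
    else
        rm -f "$tmp"       # nothing changed (or awk failed): keep FILE
        return 1
    fi
}

# update_all FILE: re-run until no record is pending.
update_all() {
    while update_one_line "$1"; do :; done
}
```

Running `update_all testfile` rewrites the whole file once per modified line, so an abort costs at most the line in progress, at the price of quadratic work overall.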

Oh My Goodness