
There is one old report file residing on a drive. Every time a new report is generated, it should be compared to the contents of this old file. If the new report contains any account row that is not already in the old file, that row should be added to the old file; otherwise it is just skipped. Both files will have the same title and headers. E.g., old report:

RUN DATE:xyz                FEE ASSESSMENT REPORT

fee calculator

ACCOUNT NUMBER      DELVRY DT     TOTAL FEES     
=======================================================

123456      2014-06-27      110.0   

The new report might be

RUN DATE:xyz                FEE ASSESSMENT REPORT

fee calculator

ACCOUNT NUMBER      DELVRY DT     TOTAL FEES     
=======================================================

898989      2014-06-26      11.0 

So the old report should now be merged to contain both rows under it, for account numbers 123456 and 898989.

I am new to shell scripting. I don't know whether I should use the diff command, a while read LINE loop, or awk.

Thanks!

Rimjhim
  • Rather than mess with comparing or skipping the header, I would trim that out using `head +n 8 oldReport > tempReport; head +n 8 newReport >> tempReport; cat headerFile tempReport > archiveReport`. If you want some sort of sorting, then you apply that to `tempReport` before 'merging' it into archiveReport. Good luck. – shellter Jun 01 '15 at 16:55
  • The diff will also see a difference in the run date, so the header must be skipped. Is the header a fixed number of lines, or should the script look for the line with `=========`? What do you want when an account number occurs in both files with a different date or fees? – Walter A Jun 01 '15 at 19:58
  • Walter, if an account with a different date/fees shows up, I want it in the output. Only completely identical rows should not be repeated. – Rimjhim Jun 02 '15 at 15:19
  • Yes, I didn't realize the run date will also differ! Thanks. The header is the same size: a fixed top 10 lines before we reach an account number row. – Rimjhim Jun 02 '15 at 15:20

1 Answer


This calls for several commands combined into an actual script, rather than a clever one-line command.

Assuming the number of lines in the header section of the report is consistent, you can use tail -n +7, which prints a file starting at line 7 (i.e., it skips the first six lines); adjust the number to match the header size in your reports.
If the header length is not fixed, but it always ends with the "==========" line you've shown above, you can use grep -n to find that line's number and start parsing the account rows after it.
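For example, a minimal sketch of that variable-length-header case (the separator pattern and file name here are placeholders, not part of the script below):

# find the line number of the '=====' separator, then print everything after it
sep=$(grep -n '^=\{5,\}' oldReport.log | head -n 1 | cut -d: -f1)
tail -n +"$((sep + 1))" oldReport.log

The script below assumes the fixed-size header case.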

#!/usr/bin/env bash
OLD_FILE="ancient_report.log"
NEW_FILE="latest_and_greatest.log"
tmp_ext=".tmp"
tail -n +7 ${OLD_FILE} > ${OLD_FILE}${tmp_ext}
tail -n +7 ${NEW_FILE} >> ${OLD_FILE}${tmp_ext}
sort -u ${OLD_FILE}${tmp_ext} > ${OLD_FILE}${tmp_ext}.unique
mv -f ${OLD_FILE}${tmp_ext}.unique ${OLD_FILE}

To illustrate this script:

#!/usr/bin/env bash

The shebang line above tells *nix how to run it.

OLD_FILE="ancient_report.log"
NEW_FILE="latest_and_greatest.log"
tmp_ext=".tmp"

Declare the starting variables. You could also take the file names as command-line arguments; for example, OLD_FILE=${1} uses the first argument on the command line, as sketched below.
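A minimal sketch of that variant, assuming you still want the hard-coded names as defaults when no arguments are given:

# hypothetical variant: take file names from the command line, with defaults
OLD_FILE="${1:-ancient_report.log}"
NEW_FILE="${2:-latest_and_greatest.log}"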

tail -n +7 ${OLD_FILE} > ${OLD_FILE}${tmp_ext}
tail -n +7 ${NEW_FILE} >> ${OLD_FILE}${tmp_ext}

Put the tail ends (everything after the header) of the two files into a single 'tmp' file. Note that the first tail uses > to create the temporary file and the second uses >> to append to it.

sort -u ${OLD_FILE}${tmp_ext} > ${OLD_FILE}${tmp_ext}.unique

Sort and retain only the 'unique' entries with -u. If your OS's version of sort does not have -u, you can get the same result by using: sort <filename> | uniq
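For example, that fallback applied to the same temporary file would be:

sort ${OLD_FILE}${tmp_ext} | uniq > ${OLD_FILE}${tmp_ext}.unique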

mv -f ${OLD_FILE}${tmp_ext}.unique ${OLD_FILE}

Replace the old file with the new uniq'd file.

There are of course many simpler ways to do this, but this one gets the job done with several commands in a sequence.
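For instance, since you mention awk: a minimal sketch of an awk alternative (separate from the script above), assuming the data rows start at line 8 as in your example; adjust the 7 to match your header size:

# keep the new report's header, then print each data row only once
awk 'FNR==NR && FNR<=7 { print; next }   # header lines of the first (new) file
     FNR>7 && !seen[$0]++                # any data row not seen before
' latest_and_greatest.log ancient_report.log > merged_report.log

Unlike the sort -u version, this keeps the original row order rather than sorting.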

Edit:
To preserve the header portion of the file with the latest report date, instead of mv-ing the new tmp file over the old one, do:

rm ${OLD_FILE};
head -n 7 ${NEW_FILE} > ${OLD_FILE}
cat ${OLD_FILE}${tmp_ext}.unique >> ${OLD_FILE}

This removes OLD_FILE, recreates it from the header of the new file (so it carries the latest run date), and appends the entire contents of the unique tmp file. After this you can do general file cleanup, such as removing any temporary files you've created. To preserve/debug any changes, you can add a datestamp to each 'uniqued' file name and keep them as an audit trail of all report additions, as sketched below.
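A minimal sketch of that datestamping step (the archive directory and file name pattern are made up for illustration):

# keep a datestamped copy of each uniqued file as an audit trail
stamp=$(date +%Y%m%d_%H%M%S)
mkdir -p report_archive
cp ${OLD_FILE}${tmp_ext}.unique report_archive/fee_report_${stamp}.log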

Cometsong
  • @Cometsong... thanks a lot! That's so helpful, especially the way you described every step! Really appreciate it! – Rimjhim Jun 03 '15 at 14:49
  • The only issue that I still have is the header portion. I need to extract the header, store it, and then prefix it to the sorted unique output. Should I do this: head -8 ${NEW_FILE} > header.tmp, and instead of your last command, do: cat header.tmp ${OLD_FILE}${tmp_ext}.unique > ${OLD_FILE}? – Rimjhim Jun 03 '15 at 14:55
  • @Rimjhim: see the latest addition I made to the answer. – Cometsong Jun 03 '15 at 15:02