Compare files in a shell script with md5sum and create a CSV for the changed file

Question

I am very new to shell scripting and found a way to compare files in a shell script using md5sum.

I want to compare the Options_old and Options_new files in a shell script and identify the new TICKER field values added in the new file. For these new ticker values I want to create a CSV file.

For example, comparing Options_old with Options_new shows that Options_new contains the new ticker values 510051 2 C2.50 and 510052 2 P2.50, and I want to write these values to a CSV file.

Options_new.out.gz file

START-OF-FILE
PROGRAMNAME=getdata
DATEFORMAT=yyyymmdd
START-OF-FIELDS
TICKER
EXCH_CODE
END-OF-FIELDS
TIMESTARTED=Wed Feb 12 19:30:38 JST 2020
START-OF-DATA
510051 CH 02/26/20 C2.5 Equity|0|75|510051 2 C2.50|CH
510052 CH 02/26/20 P2.5 Equity|0|75|510052 2 P2.50|CH
510050 CH 02/26/20 C2.55 Equity|0|75|510050 2 C2.55|CH
510050 CH 02/26/20 P2.55 Equity|0|75|510050 2 P2.55|CH
END-OF-DATA
DATARECORDS=1140
TIMEFINISHED=Wed Feb 12 19:32:50 JST 2020
END-OF-FILE

Options_old.out.gz file

START-OF-FILE
PROGRAMNAME=getdata
DATEFORMAT=yyyymmdd
START-OF-FIELDS
TICKER
EXCH_CODE
END-OF-FIELDS
TIMESTARTED=Wed Feb 12 19:30:38 JST 2020
START-OF-DATA
510050 CH 02/26/20 C2.5 Equity|0|75|510050 2 C2.50|CH
510050 CH 02/26/20 P2.5 Equity|0|75|510050 2 P2.50|CH
510050 CH 02/26/20 C2.55 Equity|0|75|510050 2 C2.55|CH
510050 CH 02/26/20 P2.55 Equity|0|75|510050 2 P2.55|CH
END-OF-DATA
DATARECORDS=1140
TIMEFINISHED=Wed Feb 12 19:32:50 JST 2020
END-OF-FILE

I have started the code but do not understand how to compare that particular field and then generate the CSV file:

#!/bin/sh

OLD_PATH="/opt/old"
NEW_PATH="/opt/new"
EXIT=0

FILES="${FILES} Options_new.out.gz Options_old.out.gz"

for FILE in ${FILES}
do
   # Hash the same file name under the new and the old directory
   MD5SUM_NEW=$(md5sum "${NEW_PATH}/${FILE}" | awk '{print $1}')
   MD5SUM_OLD=$(md5sum "${OLD_PATH}/${FILE}" | awk '{print $1}')

   if [ "${MD5SUM_NEW}" != "${MD5SUM_OLD}" ]; then
      echo "Found new Version of ${FILE}"
# Currently I am comparing the data from the whole file, but I want to compare only the Ticker value in both files

# Here, create a new CSV file with the new ticker values found in Options_new.out.gz

   fi
done

exit ${EXIT}

Why do you want to use md5sum? Your code tries to compare "/opt/old/Options_new.out.gz" with "/opt/new/Options_new.out.gz" and "/opt/old/Options_old.out.gz" with "/opt/new/Options_old.out.gz", but from the text it seems you want to compare Options_new.out.gz with Options_old.out.gz. Which is it? What is the "key" that is used for the comparison? – Sorin, Feb 12 '20 at 21:02
MD5 is a cryptographic hash. Among other things, that means that different files should have different hashes, but that comparing the hashes should give you *no information at all* about what the specific differences are. That makes it completely unsuitable for what you're trying to do. I'm a bit confused about your actual goal, but you'll need very different tools for it. – Gordon Davisson, Feb 12 '20 at 21:53
Sorry for the confusion. I want to compare /opt/new/Options_new.out.gz with /opt/old/Options_old.out.gz on the Ticker field, so that if I find a new Ticker value in Options_new.out.gz I create a CSV with those values. I don't have to use md5sum; I just found that the comparison can be done with it. The code is just a starting point and may be incorrect, as I am not sure how to do it :( – Andrew, Feb 13 '20 at 07:54
@Andrew, you still didn't answer my last question: what is the key used to compare the tickers? You say "510051 2 C2.50" is a new value; is that value new for "2 C2.50" or for the entire file? (Side note: if you want me to be notified of your reply, you need to mention me like this: @Sorin) – Sorin, Feb 14 '20 at 12:32
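
To illustrate the point about hashes made in the comments: a checksum can only answer "same or different", never "what changed". The sketch below is one hedged way to get that yes/no answer for gzipped files. It hashes the decompressed contents rather than the .gz files themselves, because the gzip format can store a file name and timestamp in its header, so two archives of identical data may still differ byte for byte. The function name same_content is made up for this example.

```shell
#!/bin/sh
# Sketch: yes/no check on the decompressed contents of two gzipped files.
# same_content is a hypothetical helper name, not part of any tool.
# Returns 0 when the contents have the same MD5 sum, 1 when they differ,
# and 2 when either file cannot be read.
same_content() {
    [ -r "$1" ] && [ -r "$2" ] || return 2
    sum_a=$(gzip -dc "$1" | md5sum | awk '{print $1}')
    sum_b=$(gzip -dc "$2" | md5sum | awk '{print $1}')
    [ "$sum_a" = "$sum_b" ]
}
```

A call such as same_content /opt/old/Options_old.out.gz /opt/new/Options_new.out.gz would only say whether the two files differ; finding the new ticker values still needs a field-level comparison.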

score 1 · Accepted Answer · answered Feb 20 '20 at 22:09

Food for thought: this checks whether the two files differ and, if so, prints the differing lines reduced to the fields you said you wanted to save to CSV.

#!/bin/bash

# Usage: script file1 file2
# Check whether the files differ, then grep for the word "differ";
# diff -q normally prints "Files file2 and file1 differ".
# grep flags: -F fixed string, -w match only full words,
# -q quiet, i.e. no output to stdout (screen)

if diff -q "$2" "$1" | grep -Fwq "differ"
then
    # Lines that diff prefixes with ">" exist only in "$1";
    # awk keeps those lines and prints the selected fields,
    # separated by commas
    changeSyn=$(diff "$2" "$1" | awk '$1 ~ /^>/ {print $2","$5","$7}')
    # Same again for lines prefixed with "<", which exist only in "$2"
    addedSyn=$(diff "$2" "$1" | awk '$1 ~ /^</ {print $2","$5","$7}')
    echo "$changeSyn"
    echo "$addedSyn"
else
    echo "No change"
fi
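
Another way to get at the stated goal (a CSV of ticker values that appear only in the new file) is to compare the TICKER column directly instead of whole lines. The following is a minimal sketch under stated assumptions, not a definitive implementation. It assumes that TICKER is the fourth pipe-delimited field of each row between START-OF-DATA and END-OF-DATA, as in the sample rows in the question, that the values contain no commas or quotes, and that a one-column CSV with a TICKER header is acceptable. The function names extract_tickers and new_tickers_csv are made up for illustration.

```shell
#!/bin/sh
# Sketch: write the TICKER values found only in the new file to a CSV.
# Assumption: TICKER is the 4th pipe-delimited field of each data row.

# Print the sorted, unique TICKER values of one gzipped file.
extract_tickers() {
    gzip -dc "$1" |
        awk -F'|' '
            /^START-OF-DATA/ { in_data = 1; next }  # data rows start after this marker
            /^END-OF-DATA/   { in_data = 0 }        # and end at this one
            in_data          { print $4 }           # 4th field is assumed to be TICKER
        ' | LC_ALL=C sort -u
}

# Usage: new_tickers_csv OLD.gz NEW.gz OUT.csv
new_tickers_csv() {
    old_list=$(mktemp) && new_list=$(mktemp) || return 1
    extract_tickers "$1" > "$old_list"
    extract_tickers "$2" > "$new_list"
    {
        echo "TICKER"
        # comm -13 prints only lines unique to the second (new) sorted list
        LC_ALL=C comm -13 "$old_list" "$new_list"
    } > "$3"
    rm -f "$old_list" "$new_list"
}
```

With the paths from the question, the call could look like new_tickers_csv /opt/old/Options_old.out.gz /opt/new/Options_new.out.gz new_tickers.csv, where the output file name is arbitrary. Temporary files are used instead of bash process substitution so that the sketch also runs under a plain /bin/sh.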

score -1 · Answer 2 · answered Feb 13 '20 at 00:53

Try meld for a visual comparison:

 meld file1 file2

or diff on the command line:

 diff file1 file2
 10,11c10,11
 < 510051 CH 02/26/20 C2.5 Equity|0|75|510051 2 C2.50|CH
 < 510052 CH 02/26/20 P2.5 Equity|0|75|510052 2 P2.50|CH
 ---
 > 510050 CH 02/26/20 C2.5 Equity|0|75|510050 2 C2.50|CH
 > 510050 CH 02/26/20 P2.5 Equity|0|75|510050 2 P2.50|CH

BobMonk


I want to try it on the command line. What does 10,11c10,11 mean? – Andrew Feb 13 '20 at 07:48
That is part of diff's output: the line numbers that have changed. Sorry for the slow response (hospital job). – BobMonk Feb 20 '20 at 21:25
So lines 10 and 11 of file1 have changed in comparison to lines 10 and 11 of file2. – BobMonk Feb 20 '20 at 21:33
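
To see how such a header comes about, the change can be reproduced on two small throwaway files. This is only an illustrative sketch: the file contents and the variable names demo_dir and diff_output are made up. In diff's default output format, the numbers before the letter refer to lines of the first file, the numbers after it to lines of the second file, and c stands for "change" (a and d mark added and deleted lines).

```shell
#!/bin/sh
# Sketch: produce a "change" header like 10,11c10,11 on two tiny sample files.
demo_dir=$(mktemp -d)
printf '%s\n' header A B footer > "$demo_dir/file1"
printf '%s\n' header X Y footer > "$demo_dir/file2"

# diff exits with status 1 when the files differ; "|| true" keeps that
# from being treated as a failure.
diff_output=$(diff "$demo_dir/file1" "$demo_dir/file2") || true
printf '%s\n' "$diff_output"
# Prints:
# 2,3c2,3
# < A
# < B
# ---
# > X
# > Y

rm -rf "$demo_dir"
```

Here lines 2 and 3 of the first file (marked with <) were replaced by lines 2 and 3 of the second file (marked with >).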
