I'm trying to create a Bash script that calculates the MD5 checksums of big files in separate processes. I learned that one should use & for that purpose.

At the same time, I wanted to capture the resulting checksums in separate variables and write them to a file so that I can read them later. So I wrote the following script, "test_base.sh", and executed it with the command "sh ./test_base.sh". The results were written to the file "test.txt", but the checksum values in it were empty.

My OS is Lubuntu 22.04 LTS.

Why are the values in "test.txt" empty?

Code of the "test_base.sh":

#!/bin/bash


md51=`md5sum -b ./source/test1.mp4|cut -b 1-32` &

md52=`md5sum -b ./source/test2.mp4|cut -b 1-32` &

wait
echo "md51=$md51">./test.txt
echo "md52=$md52">>./test.txt

Result of "test.txt":

md51=
md52=
Cyrus
TRANSISTOR
  • 5
    `&` at the end of a command makes it run in the background, so the variables `md51` and `md52` get defined in background subprocesses, not in the main shell process executing the script. [shellcheck.net](https://www.shellcheck.net) would've pointed this problem out; please use it! – Gordon Davisson Nov 12 '22 at 10:43
  • 2
    Why store the values in a variable at all? Use `cat <(md5sum ...) <(md5sum ...)` to execute in parallel while keeping the order of the output. For more files, use a dedicated tool like GNU parallel. – Socowi Nov 12 '22 at 11:50
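
The effect described in the first comment can be reproduced with a minimal sketch. The variable names `lost` and `kept` are illustrative only and do not appear in the question:

```bash
#!/bin/bash
# An assignment followed by & runs in a background subshell,
# so the parent shell never sees the value.
lost=$(echo "checksum") &
wait
echo "lost='$lost'"    # prints lost=''

# Without &, the assignment happens in the current shell.
kept=$(echo "checksum")
echo "kept='$kept'"    # prints kept='checksum'
```

`wait` does pause until the background job finishes, but the value assigned inside that job is discarded when its subshell exits.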

1 Answer

Updated Answer

If you really, really want to avoid intermediate files, you can use GNU Parallel as suggested by @Socowi in the comments. So, if you run this:

parallel -k md5sum {} ::: test1.mp4 test2.mp4

you will get something like this, where -k keeps the output in order regardless of which one finishes first:

d5494cafb551b56424d83889086bd128  test1.mp4
3955a4ddb985de2c99f3d7f7bc5235f8  test2.mp4

Now, if you translate each linefeed into a space, like this:

parallel -k md5sum {} ::: test1.mp4 test2.mp4 | tr '\n' ' '

You will get:

d5494cafb551b56424d83889086bd128  test1.mp4 3955a4ddb985de2c99f3d7f7bc5235f8  test2.mp4

You can then read this into bash variables, using _ for the interspersed parts you aren't interested in:

read -r md51 _ md52 _ < <(parallel -k md5sum {} ::: test1.mp4 test2.mp4 | tr '\n' ' ')
echo "$md51, $md52"
d5494cafb551b56424d83889086bd128, 3955a4ddb985de2c99f3d7f7bc5235f8

Yes, this will fail if there are spaces or linefeeds in your filenames. If required, you can build successively more complicated commands to handle cases your question doesn't mention, but doing so obscures the salient point of what I am suggesting.


Original Answer

Bash doesn't really have the concept of awaiting the result of a promise, so you could go with something like:

md5sum test1.mp4 > md51.txt &
md5sum test2.mp4 > md52.txt &

wait    # for both

md51=$(awk '{print $1}' md51.txt)
md52=$(awk '{print $1}' md52.txt)

rm md5?.txt
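
If neither result files nor GNU Parallel are wanted, one possible alternative is to attach each `md5sum` to its own file descriptor through process substitution. This is only a sketch under assumptions and not part of either answer above: the descriptor numbers 3 and 4 are arbitrary, and the two small sample files are hypothetical stand-ins created solely so that the demo runs on its own.

```bash
#!/usr/bin/env bash
# Hypothetical sample inputs, created only for this demo.
tmpdir=$(mktemp -d)
printf 'first\n'  > "$tmpdir/test1.bin"
printf 'second\n' > "$tmpdir/test2.bin"

# Each process substitution starts its md5sum immediately, so both run
# concurrently; exec attaches their output to descriptors 3 and 4.
exec 3< <(md5sum -b "$tmpdir/test1.bin")
exec 4< <(md5sum -b "$tmpdir/test2.bin")

# read blocks until the matching md5sum has written its line.
# The first field is the hash; the rest (the filename) goes to _.
read -r md51 _ <&3
read -r md52 _ <&4

# Close the descriptors and clean up the demo inputs.
exec 3<&- 4<&-
rm -r "$tmpdir"

echo "md51=$md51"
echo "md52=$md52"
```

Because the assignments are made by `read` in the main shell, the variables remain set afterwards. This relies on Bash features, so the script has to be run with `bash` rather than `sh`.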
Mark Setchell
  • Thanks for your feedback, but I wanted to avoid using intermediate text files from the beginning. From what you say about Bash, though, it seems I have no choice other than using text files. – TRANSISTOR Nov 12 '22 at 14:51
  • I have added a method without intermediate files - please have another look. – Mark Setchell Nov 12 '22 at 16:46