1

Question:

I want to untar a tar file that itself contains many tar files, remove the files from all of those tar files, and have all of these processes run in parallel in a Unix bash script.

Conditions:

  1. The script should return an error if any untar/remove process has any error.
  2. It should only return success after all N (untar and remove) processes complete successfully.

Proposed solution:

 mkdir a
 tar -C a -xvf b.tar
 cd a
 for i in *
 do
 rm -r $i &
 done
Jonathan Leffler
beck03076
  • Is my solution right? Right now, I'm not getting the exit status of the background processes. – beck03076 May 03 '12 at 21:00
  • I want to implement this: "As you launch each background process, save $!, which is the PID of the background process. After you launch all the processes you will have all the PIDs. Now wait for each PID, one by one, with 'wait $pid'." How do I implement that? – beck03076 May 03 '12 at 21:02
  • Please do not mangle a question beyond recognition after you get an answer. – Jonathan Leffler May 04 '12 at 02:10
  • What is the purpose of the exercise? Overall, it is pointless except as a homework style question, because you want to remove everything you extract, it seems. So, why bother with the extraction? – Jonathan Leffler May 04 '12 at 02:16
  • It was asked in an interview!! – beck03076 May 04 '12 at 09:03
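
The approach quoted in the comments above (save `$!` for each background job, then `wait` on each PID) could be sketched roughly as follows. This is only an illustration, not code from the thread: `remove_all_parallel` is a hypothetical helper name, and it assumes bash (for arrays) and that the tar file has already been extracted into the directory passed as the first argument.

```shell
#!/usr/bin/env bash
# Sketch: remove every entry of a directory in its own background job,
# then wait on each PID and report failure if any job failed.
# remove_all_parallel is a hypothetical helper name, not from the thread.
remove_all_parallel() {
    local dir=$1 entry pid status=0
    local pids=()

    for entry in "$dir"/*; do
        # Skip the literal pattern when the directory is empty
        [ -e "$entry" ] || [ -L "$entry" ] || continue
        rm -r -- "$entry" &
        pids+=("$!")          # $! is the PID of the job just started
    done

    for pid in "${pids[@]}"; do
        # wait PID returns that job's exit status
        wait "$pid" || status=1
    done
    return "$status"
}
```

Called as `remove_all_parallel a` after the `tar -C a -xvf b.tar` step, it returns 0 only if every `rm` exited successfully, which is what the two conditions in the question ask for.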

3 Answers

3

If you have GNU Parallel (http://www.gnu.org/software/parallel/) installed, you can do this:

tar xvf foo.tgz | perl -ne 'print $l;$l=$_;END{print $l}' | parallel rm

It is useful if you do not have space to extract the full tar.gz file, but you need to process files as you unpack them:

tar xvf foo.tgz | perl -ne 'print $l;$l=$_;END{print $l}' | parallel do_stuff {}\; rm {}
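
The Perl one-liner in both commands holds each file name back by one line, so a name is passed on only once tar has moved on to the next member and the file is presumably complete on disk. As a rough sketch of the same idea without Perl, assuming a POSIX `awk` is available (`delay_one_line` is a hypothetical name used only here):

```shell
# Sketch: pass lines through one line late, like the Perl one-liner above.
# delay_one_line is a hypothetical helper name, not part of GNU Parallel.
delay_one_line() {
    awk 'NR > 1 { print prev }      # emit the previous line once a new one arrives
         { prev = $0 }              # remember the current line
         END { if (NR > 0) print prev }   # flush the last line at end of input
    '
}
```

It would slot in where the Perl filter sits, for example `tar xvf foo.tgz | delay_one_line | parallel rm`.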

You can install GNU Parallel simply by:

wget http://git.savannah.gnu.org/cgit/parallel.git/plain/src/parallel
chmod 755 parallel
cp parallel sem

Watch the intro videos for GNU Parallel to learn more: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1

Ole Tange
1
 mkdir a
 tar -C a -xvf b.tar
 cd a || exit 1
 # Capture the word "failed" from every background job whose rm fails
 success=$(for i in *
 do
 rm -r "$i" || echo failed &
 done
 wait)
 # If any of the jobs failed, success is non-empty
 [[ -z "$success" ]] && exit 0 || exit 1
Burton Samograd
  • Thanks Burton! What sense does this make? Am I understanding this the right way? Instead of creating one process, "rm *", I'm creating N processes in parallel and removing the files to save time. Is that right? – beck03076 May 03 '12 at 21:12
  • Actually, now that I think about it this won't work. The variable will be set in a subshell. I'll do an edit that does work. – Burton Samograd May 03 '12 at 21:21
  • 1
    Yes, using & causes the process to run in the background, so you are removing every file in the directory a in parallel. This is not a good way to do it, though; I suggest looking at xargs with the -P option: cd a && { ls * | xargs -P 4 rm -r; } which will run 4 tasks in parallel. – Burton Samograd May 03 '12 at 21:27
  • Forgive my torturing, but can you alter that code to use xargs? Actually, that was my question. I proposed a solution and I wanted people at Stack Overflow to validate it. Unfortunately, you had to downvote my question. It's alright! If you can rewrite that code with xargs, go ahead. Thanks. – beck03076 May 03 '12 at 21:32
  • I gave the code in my last comment. Just replace the for loop in your original code with the code from my last comment. I would think that the return code of xargs is failure if any of the commands fail so that should give you that information directly. – Burton Samograd May 03 '12 at 21:53
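
The xargs suggestion from the comments above might be sketched like this. It is an illustration under assumptions rather than the commenter's exact code: `remove_with_xargs` is a hypothetical name, the `-print0`, `-0`, `-r`, `-n 1` and `-P 4` combination assumes GNU findutils, and `-n 1` is added here so that each `rm` gets a single argument (without it, xargs may hand all names to one `rm` and nothing runs in parallel). GNU xargs exits with status 123 if any invocation exits with a status from 1 to 125, so the function's return value reflects failures.

```shell
# Sketch: delete every entry of a directory with up to 4 rm processes at once.
# remove_with_xargs is a hypothetical helper name, not from the thread.
remove_with_xargs() {
    local dir=$1
    # -print0 / -0 : keep names containing spaces or newlines intact
    # -r           : do not run rm at all when the directory is empty
    # -n 1         : one name per rm, so -P 4 can run four jobs side by side
    find "$dir" -mindepth 1 -maxdepth 1 -print0 |
        xargs -0 -r -n 1 -P 4 rm -r --
    # The function returns the exit status of xargs (last command in the pipeline)
}
```

Used as `remove_with_xargs a || echo "at least one rm failed"`, it replaces the for loop while still reporting an overall failure.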
1

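The answer, tar xvf a.tar | tac | xargs -P 4 rm -rv, is inspired by Burton Samograd's comment about xargs -P. The tac step reverses tar's listing so that files are removed before the directories that contain them: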

$ mkdir -pv a/b/c/d
mkdir: created directory `a'
mkdir: created directory `a/b'
mkdir: created directory `a/b/c'
mkdir: created directory `a/b/c/d'

$ touch a/1 a/2 a/3 a/b/4 a/b/5

$ tar cf a.tar a

$ rm -rfv a
removed directory: `a/b/c/d'
removed directory: `a/b/c'
removed `a/b/4'
removed `a/b/5'
removed directory: `a/b'
removed `a/3'
removed `a/1'
removed `a/2'
removed directory: `a'

$ tar xvf a.tar | tac | xargs -P 4 rm -rv
removed `a/2'
removed `a/1'
removed `a/3'
removed `a/b/5'
removed `a/b/4'
removed directory: `a/b/c/d'
removed directory: `a/b/c'
removed directory: `a/b'
removed directory: `a'
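
One thing the pipeline above does not show is its exit status: by default bash reports only the status of the last command, so a failing tar would go unnoticed. A possible sketch that covers the question's error condition, assuming bash and GNU coreutils/findutils (`extract_and_remove` is a hypothetical name, and `-r` is added so that rm is skipped when tar lists nothing):

```shell
#!/usr/bin/env bash
# Sketch: run the tar | tac | xargs pipeline and fail if any stage fails.
# extract_and_remove is a hypothetical helper name, not from the answer.
extract_and_remove() {
    local archive=$1
    (
        # With pipefail, the pipeline's status is that of the rightmost
        # command that failed, or 0 if every command succeeded.
        set -o pipefail
        tar xvf "$archive" | tac | xargs -r -P 4 rm -rv
    )   # the subshell keeps pipefail from leaking into the caller
}
```

For example, `extract_and_remove a.tar && echo OK || echo FAILED` prints FAILED if the archive cannot be read or if any rm reports an error.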
oHo