
To save space on my backup disk, I want to "mothball" the data files that can easily be regenerated and thus don't need to be backed up.

Currently, I'm using the GNU "parallel" command to split a large nested for-loop over many cores, with each process working on a different combination of input arguments.

# PARALLEL COMMAND CALLING mothballer.sh WITH INPUT ARGUMENTS
time parallel -j +0 --max-procs 8 "./mothballer.sh {1} {2} {3} {4} {5}" ::: {date1,date2} ::: {exp1,exp2} ::: {2,4,8} ::: {16,32,64} ::: {1,2,3,4,5}

...which expands the argument lists into every combination and passes each one to the following script, "mothballer.sh":

# reading command line arguments
date=$1
experiment=$2
parameter1=$3
parameter2=$4
trial=$5

# paths to original directory and a mirror directory in the backup server
WORK_DIR=/$WORK_MACHINE/${date}/${experiment}/${parameter1}/${parameter2}/${trial}/results
BACKUP_DIR=/$BACKUP_SERVER/${date}/${experiment}/${parameter1}/${parameter2}/${trial}/results

# create the mirror directory in the backup server
mkdir -p "$BACKUP_DIR"

# do the backup ("rsync" is similar to "cp")
rsync -avP "$WORK_DIR"/*.csv "$BACKUP_DIR"
# run rsync again to verify it worked; "rm" old files.

Is there a better way to do this? For example, using "find"?


EDIT: Also, it would be nice to be able to use the '*' wildcard, because not all experiments have the same parameter combinations (i.e. the directories are equally deep but have different folder names). This is the biggest limitation of my current method (above).

yunque
  • *"Better"* in what way? Faster? More selective? Smaller? – Mark Setchell Sep 03 '15 at 10:59
  • Maybe this way is still ok, but I'm mostly wondering if I can do the same thing with "find". It seems more appropriate for searching through directories. – yunque Sep 03 '15 at 11:04
  • @MarkSetchell actually, "more selective" is what I'm after... see my EDIT at the bottom of the OP. – yunque Sep 03 '15 at 15:27

1 Answer


If the expanded list of paths fits on one command line:

time parallel ./mothballer.sh ::: */*/*/*/*

Inside mothballer.sh, the whole path '${date}/${experiment}/${parameter1}/${parameter2}/${trial}' then arrives as the single argument $1, instead of as five separate arguments.
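If the script still needs the individual components, they can be recovered from $1. The following is a minimal sketch, assuming bash and exactly five path components; the function name split_run_path is hypothetical and not part of the original script:

```shell
#!/usr/bin/env bash
# Sketch only: split_run_path is a hypothetical helper, not part of the original mothballer.sh.
# It splits a relative path such as "date1/exp1/2/16/3" into the five
# variables that the original script read from $1..$5.
split_run_path() {
    local rel=${1%/}    # drop a trailing slash, if the glob or caller added one
    # Split on "/" only for this one read; the last variable takes any remainder.
    IFS=/ read -r date experiment parameter1 parameter2 trial <<< "$rel"
}

split_run_path "date1/exp1/2/16/3"
echo "date=$date experiment=$experiment p1=$parameter1 p2=$parameter2 trial=$trial"
```

Note that if the path has more than five components, the remainder ends up in the last variable (trial), so this only fits the equally deep layout described in the question.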

If the depth varies, use a recursive glob (zsh, or bash 4.0 or newer; the shopt line is only needed in bash):

shopt -s globstar
time parallel ./mothballer.sh ::: **/results

Inside mothballer.sh, the whole path '${date}/${experiment}/${parameter1}/${parameter2}/${trial}/results' then arrives as the single argument $1.
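Since the question asks about "find": the same list of directories can also be produced with find instead of a shell glob, which avoids the command-line length limit because the paths are streamed through a pipe. This is a minimal sketch, assuming a find that supports -print0 (GNU and BSD find do); the function name list_results_dirs is hypothetical:

```shell
#!/usr/bin/env bash
# Sketch only: list_results_dirs is a hypothetical helper, not part of the original scripts.
# Print every directory named "results" below the given root (default: current
# directory), NUL-separated so that unusual characters in names survive.
list_results_dirs() {
    local root=${1:-.}
    find "$root" -type d -name results -print0
}

# Preview what would be processed, one path per line.
list_results_dirs . | tr '\0' '\n'
```

To run the script on each match, the NUL-separated list can be piped to GNU parallel with its -0 option: list_results_dirs . | parallel -0 ./mothballer.sh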

Ole Tange