2

So I need to build a shell script (a skill I am rubbish at; I think too linearly and make everything a pipe) that will connect to a remote machine, go into a specific directory, slurp up all the files older than 5 minutes, extract information from each file's name (encoding details below), and scatter the files into the relevant directories on the local backup host based on that, creating the directories if they don't already exist.

On a dozen machines I have a directory (let us call it /Prod/Data/) full of thousands of files named data-HOST-v.7.mmddyy.csv

example: date-web2-v.7.052509.csv

Files older than 5 minutes need to be pulled from the remote machines to a local folder /backup/archive/host/year/month/day/csvs

example: /backup/archive/web2/2009/05/25/csvs

I'm sure I can do something like ls -1 | cut -d"." -f3 to extract the date section of the file name, then use sed or awk to isolate each piece and produce the date variables that decide which directories to dump the files in, and do something similar to grab the host. What I am not sure about is how to correlate those values back to the file they came from so I can actually move it, or how to execute that remotely. Perhaps it is better to scp all the files over from the remote machine first (short of any file younger than 5 minutes; perhaps a find -mmin +5 statement can be used to suss that out?) and then do the sorting once everything is on the backup machine.
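
Roughly, the pieces I have in mind look something like this (untested; user@web2 is just a placeholder):

ssh user@web2 'find /Prod/Data -mmin +5 -type f'    # list the old files remotely
echo "date-web2-v.7.052509.csv" | cut -d"." -f3     # -> 052509, the date chunk to split up

but I don't know how to tie it all together.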

Would someone be so kind as to point me in the direction of an example script that may provide similar functionality? Everything I write tends to be command | command | command | etc... and I imagine this task will require some dimensionality.

Thank you for your time.

Nathan Milford
  • I ought to mention, my main brain-block is how to go about getting a list of the files (older than 5 minutes) remotely, then iterating through them and applying the logic to move them, without creating a tempfile... – Nathan Milford May 26 '09 at 20:30

3 Answers

2

A pure Bash solution, using parameter expansion. See the Bash documentation on parameter expansion for an explanation.

foo='date-web2-v.7.052509.csv'
file=${foo%.csv}     # strip the .csv extension      -> date-web2-v.7.052509
date=${file##*.}     # keep what follows the last dot -> 052509

month=${date:0:2}    # 05
day=${date:2:2}      # 25
year=${date:4:2}     # 09
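
If you need the host as well, parameter expansion can pull that out too (a quick sketch along the same lines, assuming the date-HOST-v.N naming shown above):

host=${foo#*-}       # strip up to the first '-'   -> web2-v.7.052509.csv
host=${host%%-*}     # strip from the next '-'     -> web2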

I would probably use Perl for this and use parentheses to capture the groups I want from a regular expression.
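
For example, something along these lines (just a sketch; the pattern assumes the file-name layout shown above):

echo 'date-web2-v.7.052509.csv' | \
  perl -ne 'print "$1 $2 $3 $4\n" if /-(\w+)-v\.\d+\.(\d\d)(\d\d)(\d\d)\.csv$/'
# -> web2 05 25 09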

Kyle Brandt
  • I won't write a new answer as this one has mostly what I would have wanted to say: 1) in a bash script, use bash if you can; there's no point forking out awk/sed/other processes when you don't need to; 2) if it does get more complicated, then use something more sophisticated like Perl/Python. – khosrow May 27 '09 at 10:05
0

The find command has options to select files based on their age. See the -amin, -atime, -cmin, -ctime, -mmin, and -mtime options.

You could use find to build a list of the files you need moved, save that list to a file, and then feed it to an rsync command with the --include-from= and --remove-source-files options.
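
A rough sketch of that idea (untested; the user, host, and paths are placeholders, and it uses rsync's --files-from, which takes a plain list of paths, rather than --include-from):

ssh user@web2 'cd /Prod/Data && find . -mmin +5 -type f -printf "%P\n"' > /tmp/oldfiles
rsync -av --files-from=/tmp/oldfiles --remove-source-files user@web2:/Prod/Data/ /backup/incoming/

You would still need to sort the pulled files into the per-host/per-date directories afterwards.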

Zoredache
  • This looks like a winner: http://www.go2linux.org/find-copy-files-remote-server-ssh-scp-how-to-script-bash I suppose I'll have to use a tmp file. I just need to add sed/awk statements to grab the host and individual date portions from the file name. – Nathan Milford May 26 '09 at 21:02
0

For future reference, this is the script I came up with:

#!/bin/bash
if [ $# -ne 1 ]; then
   echo "usage: slurp_vote_files.sh [user@server]"
   exit 1
fi

# List remote files older than 5 minutes, then pull each one into a
# host/year/month directory derived from its name and remove the original.
ssh "$1" "find /Prod/Data/Votes/ -mmin +5 -type f" | while read -r line; do
   vote_host=$(echo "$line" | cut -d"_" -f3)                        # host field of the file name
   vote_year=$(echo "$line" | cut -d"." -f3 | sed 's/^..../20/')    # mmddyy -> 20yy
   vote_month=$(echo "$line" | cut -d"." -f3 | sed 's/.\{4\}$//')   # mmddyy -> mm
   mkdir -p "/bkup/archive/finalized/$vote_host/$vote_year/$vote_month/votes/"
   scp -q "$1:$line" "/bkup/archive/finalized/$vote_host/$vote_year/$vote_month/votes/"
   ssh -n "$1" "rm -f $line"
done
exit 0

It may not line up with the goal/specs in the original post, but it works in my specific case.

Nathan Milford