0

Here is a script I wrote that I need help with. In the script I run a find for any file that has not been accessed for over 30, 60, 90, 180, 270 and 365 days.

This works just fine. However, it takes a few days just to finish the 30 day portion, because it is scanning a NAS (millions and millions of files). As you can see, the 30 day information really holds all the data needed for the rest of the script. The 60, 90, etc. portions of the script are just redoing the same effort as the 30 day portion, except for an extended time frame. In this case it would save weeks' worth of re-scanning if somehow the 60, 90, 180, etc. portions could just get their data from the 30 day output.

This is where I am asking for help. The output is just like that of an ls -l command, and as you can also see from the output below, there are multiple years in this output. The script is attached and printed below.

total 24
-rw-r--r-- 1 root bin 60 Apr 12 13:07 config_file
-rw-r--r-- 1 root bin 9 Apr 12 13:07 config_file.InProgress
-rw-r--r-- 1 root bin 0 Apr 12 13:07 config_file.sids
-rw-r--r-- 1 root bin 1284 Apr 19 10:41 rpt_file
-rw-r--r-- 1 16074 5003 20083 Apr 26 2002 /nas/quota/slot_2/CR_APP002/eb_ora_bin1/sun8/product/9.2s/oem_webstage/oracle/sysman/qtour/console/dat1_01.gif
-rw-r--r-- 1 16074 5003 20088 Apr 26 2002 /nas/quota/slot_2/CR_APP002/eb_ora_bin1/sun8/product/9.2s/oem_webstage/oracle/sysman/qtour/console/set1_04.gif
-rw-r--r-- 1 16074 5003 2008 Apr 26 2002 /nas/quota/slot_2/CR_APP002/eb_ora_bin1/sun8/product/9.2s/oem_webstage/oracle/sysman/qtour/oapps/get2_03.htm
-rw-r--r-- 1 16074 5003 20083 Apr 26 2002 /nas/quota/slot_2/CR_APP002/eb_ora_bin1/sun8/product/9.2s/oem_webstage/oracle/sysman/qtour/oapps/per1_01.gif

Any help is appreciated. These are Linux distro boxes, so I am sure Perl is on there too if needed.

#!/bin/ksh
############################################
# search shares for files                  #
# that have not been accessed              #
# for a certain time.                      #
# NOTE:                                    #
#    $IN = input search                    #
#    $OUT = output directory for text file #
##########################################################
# TESTS                                                  #
#     Numeric arguments can be specified as              #
#                                                        #
#     +n     for greater than n,                         #
#     -n     for less than n,                            #
#     n      for exactly n.                              #
#                                                        #
#     -atime n                                           #
#            File was last accessed n*24 hours ago.      #
#                                                        #
##########################################################


IN1=/nas/quota/slot_2/CR*
IN2=/nas/quota/slot_3/CR*
IN3=/nas/quota/slot_4/CR*
IN4=/nas/quota/slot_5/CR*
OUT=/nas/quota/slot_3/CR_PRJ144/steve
mkdir ${OUT}
for dir in ${IN1}; do find $dir -atime +30 -exec ls -l '{}' \; >>${OUT}/30days.txt; done
for dir in ${IN2}; do find $dir -atime +30 -exec ls -l '{}' \; >>${OUT}/30days.txt; done
for dir in ${IN3}; do find $dir -atime +30 -exec ls -l '{}' \; >>${OUT}/30days.txt; done
for dir in ${IN4}; do find $dir -atime +30 -exec ls -l '{}' \; >>${OUT}/30days.txt; done
for dir in ${IN1}; do find $dir -atime +60 -exec ls -l '{}' \; >>${OUT}/60days.txt; done
for dir in ${IN2}; do find $dir -atime +60 -exec ls -l '{}' \; >>${OUT}/60days.txt; done
for dir in ${IN3}; do find $dir -atime +60 -exec ls -l '{}' \; >>${OUT}/60days.txt; done
for dir in ${IN4}; do find $dir -atime +60 -exec ls -l '{}' \; >>${OUT}/60days.txt; done
for dir in ${IN1}; do find $dir -atime +90 -exec ls -l '{}' \; >>${OUT}/90days.txt; done
for dir in ${IN2}; do find $dir -atime +90 -exec ls -l '{}' \; >>${OUT}/90days.txt; done
for dir in ${IN3}; do find $dir -atime +90 -exec ls -l '{}' \; >>${OUT}/90days.txt; done
for dir in ${IN4}; do find $dir -atime +90 -exec ls -l '{}' \; >>${OUT}/90days.txt; done
for dir in ${IN1}; do find $dir -atime +180 -exec ls -l '{}' \; >>${OUT}/180days.txt; done
for dir in ${IN2}; do find $dir -atime +180 -exec ls -l '{}' \; >>${OUT}/180days.txt; done
for dir in ${IN3}; do find $dir -atime +180 -exec ls -l '{}' \; >>${OUT}/180days.txt; done
for dir in ${IN4}; do find $dir -atime +180 -exec ls -l '{}' \; >>${OUT}/180days.txt; done
for dir in ${IN1}; do find $dir -atime +270 -exec ls -l '{}' \; >>${OUT}/270days.txt; done
for dir in ${IN2}; do find $dir -atime +270 -exec ls -l '{}' \; >>${OUT}/270days.txt; done
for dir in ${IN3}; do find $dir -atime +270 -exec ls -l '{}' \; >>${OUT}/270days.txt; done
for dir in ${IN4}; do find $dir -atime +270 -exec ls -l '{}' \; >>${OUT}/270days.txt; done
for dir in ${IN1}; do find $dir -atime +365 -exec ls -l '{}' \; >>${OUT}/365days.txt; done
for dir in ${IN2}; do find $dir -atime +365 -exec ls -l '{}' \; >>${OUT}/365days.txt; done
for dir in ${IN3}; do find $dir -atime +365 -exec ls -l '{}' \; >>${OUT}/365days.txt; done
for dir in ${IN4}; do find $dir -atime +365 -exec ls -l '{}' \; >>${OUT}/365days.txt; done
Kyle Brandt
user41612

3 Answers

1

You need a fundamental redesign. You should run the find command only once over the entire system to create an index file that contains something like 'file:atime'. You can do this by using the -printf argument to find to print the atime with the filename (see man find). You can then perform your operations based on that index. The reason is that the biggest penalty is going to be stat-ing every file on disk, so you only want to do this once. This is the idea behind the locate and updatedb commands on Linux. Basically you want to recreate those with the addition of atime.

I also think looping over ls is crappy. You probably want to loop over the index line by line with a while loop. You are going to have to replicate turning those times into 'x days ago'. The easiest way will probably be to use the epoch. So you will end up with something like:

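# Build the index once: one "path:atime" line per file, atime in epoch seconds.
find ~/scrap -printf "%p:%A@\n" > index
now=$(date +%s)
while IFS=: read -r name atime; do
   age=$(( (now - ${atime%.*}) / 86400 ))   # whole days since last access
   if [ "$age" -gt 30 ]; then               # same cutoff as -atime +30
      echo "$name"                          # do something to $name
   fi
done < index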

If you want, instead of the above you can have the find command pipe straight into the while loop and redirect to different files based on the if statements. Also, keep in mind that : is a bad delimiter if it might be used in a filename.
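As a rough sketch of that one-pass idea, the loop below reads an index and appends each path to every age bucket it qualifies for. It assumes the index was written with the atime first, for example by GNU find's -printf '%A@:%p\n', so that a colon inside a path cannot split the record. The function name bucket_index, its arguments, and the NNdays.txt file names are made up for illustration.

```shell
# Hypothetical helper: bucket_index INDEX OUTDIR [NOW_EPOCH]
# INDEX holds one "atime:path" line per file, atime in epoch seconds
# (assumed layout; paths containing newlines are not handled).
bucket_index() {
    index=$1
    outdir=$2
    now=${3:-$(date +%s)}
    mkdir -p "$outdir"
    while IFS=: read -r atime name; do
        # %A@ prints fractional seconds; drop the fraction for integer math.
        age=$(( (now - ${atime%.*}) / 86400 ))
        for limit in 30 60 90 180 270 365; do
            # Mirror find's -atime +N: more than N whole days since access.
            if [ "$age" -gt "$limit" ]; then
                printf '%s\n' "$name" >> "$outdir/${limit}days.txt"
            fi
        done
    done < "$index"
}
```

It could be called as bucket_index index /some/output/dir. The reports are appended to, so old ones should be removed before a new run.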

You might want to generate the index in SQL if you want to get fancier.

Kyle Brandt
1

There are a couple of issues with the script that are causing it to run so slowly. First off, your for loops are unnecessary the way they are written, with each variable holding only one value. To use a loop the way you want to, you'd change the structure to something like:

IN_PATH="
/nas/quota/slot_2/CR*
/nas/quota/slot_3/CR*
/nas/quota/slot_4/CR*
/nas/quota/slot_5/CR*
"
OUT=/nas/quota/slot_3/CR_PRJ144/steve
mkdir ${OUT}
for dir in ${IN_PATH}; do find $dir -atime +30; done
for dir in ${IN_PATH}; do find $dir -atime +60; done
for dir in ${IN_PATH}; do find $dir -atime +90; done

etc..
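Taken one step further, the thresholds can be looped over as well, which collapses the repeated lines into a single nested loop. This is only a sketch: the function name report_by_age is made up, it overwrites each report rather than appending, and it batches the ls calls with -exec ... {} + instead of running one ls per file.

```shell
# Hypothetical helper: report_by_age OUTDIR DIR...
# Writes OUTDIR/<N>days.txt for each threshold, overwriting old reports.
report_by_age() {
    out=$1
    shift
    mkdir -p "$out"
    for days in 30 60 90 180 270 365; do
        for dir in "$@"; do
            # -d lists matching directories themselves, not their contents.
            find "$dir" -atime +"$days" -exec ls -ld {} +
        done > "$out/${days}days.txt"
    done
}
```

A call might look like report_by_age "$OUT" /nas/quota/slot_2/CR* /nas/quota/slot_3/CR*, leaving the glob unquoted so the shell expands it.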

But that still makes find loop over the entire NAS filesystem, stat-ing every file... SLOW! Since we're checking atime we have to stat the files, but why not do it only once? Assuming you have a Linux machine with standard GNU find, you should be able to do something like this:

find /nas/quota/ \( -atime +365 -fls /root/365.txt \) , \( -atime +180 -fls /root/180.txt \) , etc...

Now, I'm doing this from memory, so it may need a bit of tweaking to work exactly right. Test it on a web root or your home directory, something that runs fast, to help troubleshoot. Find will accept multiple expressions, and if you read through the precedence sections of the man page you can make it do some nifty things. Depending on what you want to do with this info, you can also add limits to the atime checking, for instance:

\( -atime +180 -a -atime -365 -fls /root/more_than_180_but_less_than_365.txt \)
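
Spelled out in full, a single traversal that feeds all six reports might look like the sketch below. It assumes GNU find and uses -fprintf to record the atime next to each path instead of -fls. The function name scan_once, the output format, and the report names are illustrative only.

```shell
# Hypothetical helper: scan_once ROOT OUTDIR (needs GNU find).
# OUTDIR should sit outside ROOT so the reports are not scanned themselves.
scan_once() {
    root=$1
    out=$2
    fmt='%A@:%p\n'     # atime in epoch seconds, then the path
    mkdir -p "$out"
    # The comma operator evaluates every group for each file; it must be
    # an argument of its own, separated by spaces from the parentheses.
    find "$root" -type f \( \
        \( -atime +30  -fprintf "$out/30days.txt"  "$fmt" \) , \
        \( -atime +60  -fprintf "$out/60days.txt"  "$fmt" \) , \
        \( -atime +90  -fprintf "$out/90days.txt"  "$fmt" \) , \
        \( -atime +180 -fprintf "$out/180days.txt" "$fmt" \) , \
        \( -atime +270 -fprintf "$out/270days.txt" "$fmt" \) , \
        \( -atime +365 -fprintf "$out/365days.txt" "$fmt" \) \
    \)
}
```

The outer parentheses matter: the comma binds more loosely than the implied -a, so without them -type f would apply to the first group only.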
Ali Chehab
  • The precedence with file output method looks like the way to go, especially combined with **Kyle's** `-printf` idea (using `-fprintf` instead). On my system, the commas caused errors, leaving them out worked as expected. – Dennis Williamson Apr 27 '10 at 18:34
  • Yeah, I wasn't 100% sure about the commas; they're needed for some things and not for others. The -fls option is rudimentary, just to get things going. -printf lets you do a bunch of nifty things, especially when teamed with grep and xargs to do all sorts of cleanup and tracking tasks on your data store. – Ali Chehab Apr 27 '10 at 19:30
0

If it takes days just to finish the first portion of your search, it means your script is looping somewhere it's not supposed to. On top of that, you are doing a recursive find over and over and over again...

Also, please try to format your post. It looks like an unreadable block of text right now.

solefald