GOAL : compare the size of our directory structure day over day. The data folder has over 990tb in it, so I had to run a bunch of parallel du's in order for it to finish in a reasonable amount of time. Occassionally we see large data growth very quickly, and currently don't have a good way of seeing where the data was added.
PROBLEM : The $1 and $2 of my awk aren't output anything, and the single quotes that should surround them also aren't showing up.
PRE-EMPTIVE STRIKE : I know there are WAY better tools to find this information, and we are working on implementing them. This is meant to be a quick and dirty band-aid to allow us to address any rapid growth that happens until the proper monitoring software is in place. Also, the delimiter between the two values in the log file is a tab. Pasting it into this WYSIWYG editor converted the tab to spaces.
Thanks in advance for any help you can provide me!
Snarf
I am attempting to do the following (psuedo code)
- Find folders two levels deep in our data folder
- Create a directory structure in a temp folder that mirrors the data folder structure
- Create log files in the temp folder structure for each of the folders found in the "find folders"
- Fill those log files with the output of a du -s of the folders from the "find folders"
- awk the log files and build sql inserts
- once the sql inserts look right, I will pipe the awk to mysql
- once the data exists in mysql, it'll be easy to query day over day stats
Script -
DT=`date +"%Y%m%d"`
BASE=/mnt/data/test/
find /mnt/data -maxdepth 2 -mindepth 2 -type d -exec sh -c 'mkdir -p "$(dirname '"$BASE$DT"'{}.log)";touch '"$BASE$DT"'{}.log; du -S {} > '"$BASE$DT"'{}.log; awk -F'\''\t'\'' '\''{print "INSERT INTO DATE'"$DT"'(folder_size, folder_location) VALUES('\''$1'\'', '\''$2'\'');"}'\'' '"$BASE$DT"'{}.log' \;
Example of log file -
0 /mnt/data/apps/bog/minio.production-config/.minio/certs/CAs
12 /mnt/data/apps/bog/minio.production-config/.minio/certs
1 /mnt/data/apps/bog/minio.production-config/.minio
1 /mnt/data/apps/bog/minio.production-config
Example of output from the script for this log file -
INSERT INTO DATE20220508(folder_size, folder_location) VALUES(, );
INSERT INTO DATE20220508(folder_size, folder_location) VALUES(, );
INSERT INTO DATE20220508(folder_size, folder_location) VALUES(, );
INSERT INTO DATE20220508(folder_size, folder_location) VALUES(, );