
I have more than 300 tables in Redshift.

Data is getting updated on a daily basis. I just want to know whether I can create a UDF in Redshift to automate the VACUUM process.

I found a link to automate this using Python, but I'm not that great a Python coder, so I'm looking for a solution in a SQL script.

5 Answers


Unfortunately, you can't use a UDF for something like this; UDFs are simple input/output functions meant to be used in queries.
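
To illustrate the limitation (the function and table names below are just placeholders): a UDF is a scalar expression evaluated per row inside a query, whereas VACUUM is a standalone maintenance statement, so there is no place in a UDF body to run it.

-- A scalar SQL UDF can only compute and return a value from its inputs:
CREATE OR REPLACE FUNCTION f_double (x float)
  RETURNS float
STABLE
AS $$
  SELECT $1 * 2
$$ LANGUAGE sql;

-- VACUUM is a standalone maintenance command; it cannot appear inside
-- a query, and therefore not inside a UDF body either:
VACUUM FULL my_schema.my_table;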

Your best bet is to use this open source tool from AWS Labs: AnalyzeVacuumUtility. The great thing about using this tool is that it is very smart about only running VACUUM on tables that need it, and it will also run ANALYZE on tables that need it.
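
For a rough idea of the kind of check involved (the 10% thresholds below are just examples; the utility's own logic and defaults may differ), you can see which tables have many unsorted rows or stale statistics from the svv_table_info system view:

-- Tables with many unsorted rows are candidates for VACUUM;
-- tables whose statistics are badly off are candidates for ANALYZE.
SELECT "schema", "table", unsorted, stats_off
FROM svv_table_info
WHERE unsorted > 10
   OR stats_off > 10
ORDER BY unsorted DESC;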

It's pretty easy to set up as a cron job. Here is an example of how it can be done:

Clone the amazon-redshift-utils repo from GitHub:

git clone https://github.com/awslabs/amazon-redshift-utils
cd amazon-redshift-utils

Create a script that can be run by cron. In your text editor, create a file called run_vacuum_analyze.sh with the following, and fill in the values for your environment:

#!/bin/bash

export REDSHIFT_USER=<your db user name>
export REDSHIFT_PASSWORD=<your db password>
export REDSHIFT_DB=<your db>
export REDSHIFT_HOST=<your redshift host>
export REDSHIFT_PORT=<your redshift port>
export WORKSPACE=$PWD/src/AnalyzeVacuumUtility 

#
# VIRTUALENV
#

rm -rf $WORKSPACE/ve1
virtualenv -p python2.6 "$WORKSPACE/ve1"
# enter virtualenv
source $WORKSPACE/ve1/bin/activate

#
# DEPENDENCIES
#
pip install PyGreSQL

cd $WORKSPACE/run

#
# RUN IT
#
python analyze-vacuum-schema.py  --db $REDSHIFT_DB --db-user $REDSHIFT_USER --db-pwd $REDSHIFT_PASSWORD --db-port $REDSHIFT_PORT --db-host $REDSHIFT_HOST

Then create a cron job that will run this script (in this example, I run it daily at 2:30 AM):

chmod +x run_vacuum_analyze.sh
crontab -e

Add the following entry:

30 2 * * * <path-to-the-cloned-repo>/run_vacuum_analyze.sh
michael_erasmus

You CANNOT use a UDF for this; UDFs cannot run commands that update data.

Jon Scott

Yes. I have created an AWS Lambda function in Java and used a CloudWatch Events rule to schedule it with a cron expression. An AWS Lambda function in Java expects a shaded JAR to be uploaded. I have created environment variables in the Lambda function for the Redshift connection properties, which are passed into the Java handler.

Mallikarjun
  • You can download the shaded Java project from here and build it with mvn to generate the shaded JAR. The Main class has a list that takes all the table names that need VACUUM and ANALYZE. [link](https://drive.google.com/open?id=1iF5i3cBPEJT6hIdCpkmy3C_KruxsoGw7) – Mallikarjun Nov 01 '18 at 18:18

Now you can use auto vacuum; Redshift now provides this option.
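
Note that auto vacuum runs in the background on Redshift's own schedule. If a particular table needs attention sooner, manual maintenance statements continue to work alongside it, for example (the table name is just a placeholder, and the TO ... PERCENT sort threshold is optional):

-- One-off manual maintenance on a specific table.
VACUUM FULL my_schema.my_table TO 99 PERCENT;
ANALYZE my_schema.my_table;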


Here is my shell script utility to automate this with better control over the table filters.

https://thedataguy.in/automate-redshift-vacuum-analyze-using-shell-script-utility/

Example Commands:

Run VACUUM and ANALYZE on all the tables.

./vacuum-analyze-utility.sh -h endpoint -u bhuvi -d dev 

Run VACUUM and ANALYZE on the schemas sc1, sc2.

./vacuum-analyze-utility.sh -h endpoint -u bhuvi -d dev -s 'sc1,sc2'

Run VACUUM FULL on all the tables in all the schemas except the schema sc1, but don't run ANALYZE.

./vacuum-analyze-utility.sh -h endpoint -u bhuvi -d dev -k sc1 -o FULL -a 0 -v 1

or

./vacuum-analyze-utility.sh -h endpoint -u bhuvi -d dev -k sc1 -o FULL -a 0

Run ANALYZE only on all the tables except the tables tbl1, tbl3.

./vacuum-analyze-utility.sh -h endpoint -u bhuvi -d dev -b 'tbl1,tbl3' -a 1 -v 0

or 

./vacuum-analyze-utility.sh -h endpoint -u bhuvi -d dev -b 'tbl1,tbl3' -v 0

Use a password on the command line.

./vacuum-analyze-utility.sh -h endpoint -u bhuvi -d dev -P bhuvipassword

Run VACUUM and ANALYZE on the tables where unsorted rows are greater than 10%.

./vacuum-analyze-utility.sh -h endpoint -u bhuvi -d dev -v 1 -a 1 -x 10

or

./vacuum-analyze-utility.sh -h endpoint -u bhuvi -d dev -x 10

Run ANALYZE on all the tables in schema sc1 where stats_off is greater than 5.

./vacuum-analyze-utility.sh -h endpoint -u bhuvi -d dev -v 0 -a 1 -f 5

Run VACUUM only on the table tbl1, which is in the schema sc1, with the vacuum threshold 90%.

./vacuum-analyze-utility.sh -h endpoint -u bhuvi -d dev -s sc1 -t tbl1 -a 0 -c 90

Run ANALYZE only on the schema sc1, but set analyze_threshold_percent=0.01.

./vacuum-analyze-utility.sh -h endpoint -u bhuvi -d dev -s sc1 -t tbl1 -a 1 -v 0 -r 0.01

Do a dry run (generate SQL queries) for analyzing all the tables in the schema sc2.

./vacuum-analyze-utility.sh -h endpoint -u bhuvi -d dev -s sc2 -z 1
TheDataGuy