
I have an existing Jenkins job that kicks off a shell script to copy my prod environment into QA.

We added a lot of data to prod (the gzipped dump went from 2 GB to 15 GB), and all of a sudden my Jenkins jobs started failing.

We are running Postgres 9.5 in AWS and Jenkins 2.171. All Jenkins jobs are executed on master, which is the same server and has 6 executors. There are no memory, CPU, or disk space issues.

I've tried a few things. statement_timeout on the Postgres instance is already 0. Switching from bash to sh helped on some scripts, for some reason, but not on others; in particular, this one still has various psql statements Killed. The script works fine when run from an interactive shell.

I also tried disabling the Process Tree Killer (https://wiki.jenkins.io/display/JENKINS/ProcessTreeKiller). No go.

Here's the code from two of the more innocuous commands that should run pretty quickly. $POSTGRES_HOST_OPTS only has the db name and port:

echo -e "Running POSTGIS command"
psql $POSTGRES_HOST_OPTS -U $POSTGRES_ENV_POSTGRES_USER_PROD -d postgres -c "CREATE EXTENSION postgis;"

echo -e "Creating temporary user dv3_qa_tmp so we can rename the $POSTGRES_ENV_POSTGRES_USER_PROD user\n"
psql $POSTGRES_HOST_OPTS -U $POSTGRES_ENV_POSTGRES_USER_PROD -d postgres -c "create role dv3_qa_tmp password '$PGPASSWORD_QA' createdb createrole inherit login;"
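One way to narrow down what is ending each command is to capture its exit status right after it runs. The sketch below is only an illustration, not part of the original script: run_and_report is a hypothetical helper name, and it relies on the common shell convention that a status above 128 usually means the command was terminated by signal (status - 128), so a SIGKILL shows up as 137.

```shell
#!/bin/sh
# Hypothetical helper (not in the original script): run a command and
# report on stderr how it ended, then pass the exit status through.
run_and_report() {
    status=0
    "$@" || status=$?
    if [ "$status" -gt 128 ]; then
        # By shell convention, 128 + N usually means "terminated by signal N".
        echo "'$1' looks like it was terminated by signal $((status - 128)) (exit status $status)" >&2
    elif [ "$status" -ne 0 ]; then
        echo "'$1' exited with status $status" >&2
    fi
    return "$status"
}

# Example usage with the first command shown above (left commented out here):
# run_and_report psql $POSTGRES_HOST_OPTS -U "$POSTGRES_ENV_POSTGRES_USER_PROD" -d postgres -c "CREATE EXTENSION postgis;"
```

If the status reported under Jenkins is 137 while the same command returns 0 from an interactive shell, that points at something outside the script sending SIGKILL rather than at psql or the database.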

Here's the output from the Jenkins console:

Waiting for new instance to be available...
-e Renaming database dv3_prod to dv3_qa 

Killed
-e Running POSTGIS command
Killed
-e Creating temporary user dv3_qa_tmp so we can rename the dv3_prod_user user

Killed
-e Renaming user dv3_prod_user to dv3_qa_user 

Killed
Killed
-e 
All done

The jenkins.log has a warning about leaked file descriptors, but I'm not sure how that is related. I've also tried redirecting stderr, which gets rid of this message but doesn't stop the commands from being killed.

Apr 10, 2019 4:23:31 PM hudson.Proc$LocalProc join
WARNING: Process leaked file descriptors. See https://jenkins.io/redirect/troubleshooting/process-leaked-file-descriptors for more information
java.lang.Exception
        at hudson.Proc$LocalProc.join(Proc.java:334)
        at hudson.tasks.CommandInterpreter.join(CommandInterpreter.java:155)
        at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:109)
        at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:66)
        at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
        at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:741)
        at hudson.model.Build$BuildExecution.build(Build.java:206)
        at hudson.model.Build$BuildExecution.doRun(Build.java:163)
        at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:504)
        at hudson.model.Run.execute(Run.java:1818)
        at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
        at hudson.model.ResourceController.execute(ResourceController.java:97)
        at hudson.model.Executor.run(Executor.java:429)
Ted
OK, I somewhat solved it. It looks like the issue is that the jenkins user cannot run Perl scripts: when /usr/bin/perl is invoked, it immediately gets "Killed". This is relevant because psql is a symlink to a Perl script. Now the question is why the jenkins user can't run perl. – Ted Apr 12 '19 at 19:57
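The observation in this comment can be checked with a small sketch like the one below. It is only an illustration: describe_cmd is a hypothetical helper name, and it assumes a Linux host where readlink -f and head -c are available. It prints what a command name really resolves to and, if that file is a script, its interpreter line.

```shell
#!/bin/sh
# Hypothetical helper: show the real file behind a command name and,
# if that file is a script, the interpreter named on its first line.
describe_cmd() {
    path=$(command -v "$1") || { echo "$1: not found" >&2; return 1; }
    # Assumption: readlink -f is available (GNU coreutils on Linux).
    real=$(readlink -f "$path")
    echo "resolved: $real"
    # Only print the first line when the file starts with "#!".
    if [ "$(head -c 2 "$real" 2>/dev/null)" = '#!' ]; then
        echo "interpreter: $(head -n 1 "$real")"
    fi
}

# Example usage (left commented out here):
# describe_cmd psql
# Then try the interpreter directly as the jenkins user:
# sudo -u jenkins perl -e 'print "perl ran\n"'
```

If describe_cmd psql reports a Perl interpreter line and the one-line perl test is also "Killed" for the jenkins user but not for an interactive user, the problem is independent of psql and of the database.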
