3

This will be a quite long post, detailing my problems and the results of my research at the moment.

Context:

Currently I have a Rundeck on a GCP server with python scripts running on it. Python scripts are used to interact with other services using rest-API.

Problem:

Theses scripts are running well and fast (under 20sec) but sometimes it takes 4 or 5 minutes to run. This problem happen in every scripts.

When a slow script happen, it touch every script on the period of time:

Script9 20/03/2019 16:50 en 3 s
Script8 20/03/2019 16:49 en 4 m50 s
Script7 20/03/2019 16:49 en 4 m45 s
Script5 20/03/2019 16:49 en 4 m43 s
Script4 20/03/2019 16:49 en 4 m33 s
Script3 20/03/2019 16:48 en 2 m1 s
Script2 20/03/2019 16:48 en 2 m1 s
Script1 20/03/2019 16:48 en 2 m1 s
Script8 20/03/2019 16:44 en 3 s

In order to have more information about what slows the script, I've done a -m cProfile -s tottime and it shows that the slowest function is {built-in method _openssl.ENGINE_init} cProfile result

Ordered by: internal time

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
1       216.560  216.560  216.560  216.560 {built-in method _openssl.ENGINE_init}
1       10.005   10.005   10.005   10.005 {built-in method time.sleep}
43       2.121    0.049    2.121    0.049 {method 'read' of '_ssl._SSLSocket' objects}
12       0.449    0.037    0.449    0.037 {method 'do_handshake' of '_ssl._SSLSocket' objects}
488      0.267    0.001    0.267    0.001 {built-in method marshal.loads}
18       0.206    0.011    0.206    0.011 {method 'connect' of '_socket.socket' objects}
12       0.110    0.009    0.110    0.009 {method 'load_verify_locations' of '_ssl._SSLContext' objects}
18       0.064    0.004    0.064    0.004 {built-in method _socket.getaddrinfo}
1630/1617 0.045    0.000    0.154    0.000 {built-in method builtins.__build_class__}
3799     0.041    0.000    0.041    0.000 {built-in method posix.stat}
529/1    0.035    0.000  230.937  230.937 {built-in method builtins.exec}

It looks like the problem comes from _openssl, used by the request library.

What I tried:

In order to solve this problem I've looked on google and found nothing solving my problem. I've tried few things:

  • Upgrading the GCP server (new config in other information)
  • Upgrading the rundeck JVM memory
  • Changing the seconds on the jobs time to avoid script starting at the exact same second
  • Monitoring the rundeck and found that the tcp_time_wait increase when the problem happens graph

None of theses things solved the problem.

I also tried to reproduce the problem by hand but even when I launch multiples scripts from the rundeck in parallel the problem didn't shows up.

Your time to shine:

If you already faced this problem, or if you have an idea of what I can try, please let me know.

Thank you for reading this.

Other information:

Rundeck launch scripts using job with this:

#!/bin/bash
set -e
source /root/venv/global/bin/activate
export PYTHONPATH=${PYTHONPATH}:/root/repo/Script
python /root/repo/Script/folder/scriptname.py 
deactivate

Python (3.5) libs and their versions in the venv:

argh==0.26.2
asn1crypto==0.24.0
bcrypt==3.1.6
cachetools==3.0.0
certifi==2018.11.29
cffi==1.11.5
chardet==3.0.4
Click==7.0
cryptography==2.5
Deprecated==1.2.4
emoji==0.5.1
et-xmlfile==1.0.1
future==0.17.1
get==2018.11.19
google-api-core==1.7.0
google-api-python-client==1.7.7
google-auth==1.6.2
google-auth-httplib2==0.0.3
google-cloud-core==0.29.1
google-cloud-logging==1.10.0
googleapis-common-protos==1.5.6
grpcio==1.18.0
httplib2==0.12.0
idna==2.8
jdcal==1.4
mysql-connector==2.1.4
numpy==1.16.0
oauth2client==4.1.3
openpyxl==2.5.14
pandas==0.23.4
paramiko==2.4.2
pathtools==0.1.2
Pillow==5.4.1
post==2018.11.20
protobuf==3.6.1
public==2018.11.20
pyasn1==0.4.5
pyasn1-modules==0.2.3
pycparser==2.19
PyGithub==1.43.4
PyJWT==1.7.1
PyNaCl==1.3.0
pyodbc==4.0.25
python-dateutil==2.7.5
python-resize-image==1.1.18
pytz==2018.9
PyYAML==3.13
query-string==2018.11.20
request==2018.11.20
requests==2.21.0
rsa==4.0
salesforce-bulk-api==1.2.0
simple-salesforce==0.74.2
six==1.12.0
unicodecsv==0.14.1
uritemplate==3.0.0
urllib3==1.24.1
watchdog==0.9.0
wrapt==1.11.1
xmltodict==0.11.0

GCP serveur config:

  • name: n1-standard-4
  • vCPUs: 4
  • Memory: 15
  • Max number of persistent disks: 128
  • Max total PD size: 64
Nathan A
  • 31
  • 2

0 Answers0