I have two Google Cloud Run services communicating via REST API calls.

One micro-service is in Node.js (let's call it MS1) and the other is in Python (let's call it MS2).
MS1 uses an axios POST call to send data to MS2; upon receiving the data, MS2 performs its operations and sends the result back to MS1 using a requests POST call.

Whenever we need to send a larger number of requests from MS1 to MS2, we see a significant delay before the data is received at MS2.
However, we have not been able to pinpoint where this delay comes from.

Some of the things we have investigated so far:

  1. The cold start time for MS2 is around 10 seconds, but in some cases we are seeing delays of up to 150 seconds.
  2. We also increased the maximum number of instances to make sure no limit set by us was preventing new instances from spinning up under high load.
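One way to localize the delay is to stamp each request on the MS1 side and log the transit time when MS2 receives it. Below is a minimal sketch of the MS2 side, assuming MS1 sets a hypothetical `X-Sent-At` header (epoch seconds) immediately before the axios POST; the header name and endpoint are illustrative, not from the original code:

```python
import time
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route('/start', methods=['POST'])
def start():
    # "X-Sent-At" is a hypothetical header that MS1 would set to
    # time.time() right before issuing the axios POST.
    sent_at = request.headers.get('X-Sent-At')
    transit_ms = None
    if sent_at is not None:
        # Send-to-receive gap: network time + queueing + any
        # cold-start wait in front of this instance.
        transit_ms = (time.time() - float(sent_at)) * 1000.0
        app.logger.info("transit latency: %.1f ms", transit_ms)
    return jsonify({"transit_ms": transit_ms})
```

Note that clock skew between the two services can bias this measurement; Cloud Run's request logs also record a per-request latency that is worth comparing against.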

It would be really great if you could guide me in the right direction.

Update 1:

Here is the Dockerfile for the Python micro-service (MS2).

FROM ubuntu:20.04

RUN apt-get update && apt-get install -y \
    python3-dev \
    python3-pip \
    python3-venv \
    python3-six

RUN python3 -m pip install \
    Flask==2.2.3 \
    waitress==2.1.2 \
    numpy==1.24.2 \
    pandas==1.5.3 \
    matplotlib==3.7.1 \
    requests==2.28.2 \
    protobuf==3.19.0

# Create and change to the app directory.
WORKDIR /usr/src/app

# Copy local code to the container image.
COPY . ./

# Run the web service on container startup.
CMD [ "python3", "run.py" ]
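Image size itself rarely dominates cold-start time on Cloud Run, but a slimmer base image and a pinned requirements file keep builds smaller and cacheable. A hedged alternative sketch, assuming a requirements.txt listing the same pinned packages (not part of the original setup):

```dockerfile
# Sketch only: assumes a requirements.txt with the same pinned packages.
FROM python:3.10-slim

WORKDIR /usr/src/app

# Install dependencies first so this layer is cached across code-only changes.
COPY requirements.txt ./
RUN python3 -m pip install --no-cache-dir -r requirements.txt

COPY . ./

CMD [ "python3", "run.py" ]
```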

Update 2:

Here is the code snippet from the Python micro-service (MS2).

from flask import Flask, request, jsonify
from waitress import serve
import pandas as pd
import time
import logging
import json
import internal_function_1
import internal_function_2
import internal_function_3
import os
import requests

# Flask constructor takes the name of
# current module (__name__) as argument.
app = Flask(__name__)

@app.route('/start', methods=['POST'])
# '/start' URL is bound to the start() function.
def start():
    # process the data received
    json_data = process_input()

    target_endpoint = "MS1/endpoint"
    headers = {'Content-Type': 'application/json'}
    response = requests.post(url=target_endpoint,
                             data=json_data, headers=headers)

    return json_data

if __name__ == "__main__":
    # Serve via waitress on the port Cloud Run provides.
    serve(app, host="0.0.0.0", port=int(os.environ.get("PORT", 8080)))
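If the cold start itself is slow, note that the module-level imports above (pandas, plus whatever the internal_function modules pull in) all run before the container can accept its first request. One common mitigation is to defer heavy imports until first use; a minimal sketch with a hypothetical `lazy_import` helper (not part of the original code):

```python
import importlib

_module_cache = {}

def lazy_import(name):
    # Hypothetical helper: import a module on first use and cache it,
    # so heavy imports (pandas, matplotlib, ...) are paid on the first
    # request instead of during the container's cold start.
    if name not in _module_cache:
        _module_cache[name] = importlib.import_module(name)
    return _module_cache[name]
```

For example, inside start() one could call `pd = lazy_import("pandas")` instead of a module-level `import pandas as pd`; the trade-off is a slower first request in exchange for a faster startup.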

Thank you,
KK

KK2491
  • Are you able to measure the latency? (out of cold start situation) – guillaume blaquiere May 13 '23 at 08:00
  • I measured the difference between the timestamp at which the data was posted, and the timestamp at which the data was received. From cold start metrics, I could see that the latency was around 10 seconds. – KK2491 May 13 '23 at 08:23
  • Significantly long cold starts are usually caused by your container's behavior. Since you have not provided details on what your container performs during startup, we can only guess. Edit your post. Include the deployment command, Dockerfile, and relevant sections of your code involved in application initialization plus global space. Include enough so that your problem can be reproduced. – John Hanley May 13 '23 at 08:50
  • I have updated the `Dockerfile` details; a few package installations are done in the Python micro-service. – KK2491 May 13 '23 at 11:00
  • Containers do not (usually) affect cold start time. The actions your container performs during the start are the issue. Please post the details from my last comment. – John Hanley May 13 '23 at 11:07
  • I have updated the code snippet; mainly it contains the initialization and import steps. Regarding the deployment, we use a very basic procedure: deploy and update the traffic. Please let me know if you need more details. – KK2491 May 13 '23 at 11:22
  • Can you share your configuration such as minimum & maximum number of instances and concurrent request per instance? – Roopa M May 14 '23 at 09:01

0 Answers