Generic question
My Python Cloud Function raises about 0.05 memory errors per second while being invoked about 150 times per second. I suspect the function leaves memory residue behind, which causes its instances to crash once they have handled many requests. What should you do, or avoid doing, so that a function instance doesn't consume a bit more of its allocated memory each time it is called? I've been pointed to the docs, which say I should delete all temporary files because they are written to memory, but I don't think I've written any.
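To rule out the temporary-files explanation, a check along these lines could be logged from inside the function. This is only a minimal sketch: `tmp_usage_bytes` is a hypothetical helper name, and it assumes the temporary directory reported by `tempfile.gettempdir()` is the one the docs refer to.

```python
import os
import tempfile


def tmp_usage_bytes(root=None):
    """Return the total size in bytes of all files found under `root`.

    `root` defaults to the platform temp directory (an assumption about
    where stray temporary files would end up).
    """
    root = root or tempfile.gettempdir()
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                total += os.path.getsize(path)
            except OSError:
                # The file vanished or is unreadable; ignore it.
                continue
    return total


if __name__ == "__main__":
    print("bytes currently held in temp dir:", tmp_usage_bytes())
```

If this number stays flat across invocations, temporary files are probably not what fills the memory.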
More context
The code of my function can be summed up as the following.
- Global context: Grab a file on Google Cloud Storage containing a list of known User-Agents of bots. Instantiate an Error Reporting client.
- If User-Agent identifies a bot, return a 200 code. Else parse the arguments of the request, rename them, format them, timestamp the reception of the request.
- Send the resulting message to Pub/Sub in a JSON string.
- Return a 200 code
I believe my instances are gradually consuming all available memory because of this graph I built in Stackdriver:
It is a heat map of memory usage across my Cloud Function's instances, with red and yellow indicating the memory range that most instances are consuming. Because of the cycle that seems to appear, I interpreted it as a gradual fill-up of my instances' memory until they crash and new instances are spawned. The cycle remains if I raise the memory allocated to the function; doing so only raises the upper bound of memory usage that the cycle follows.
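The gradual fill-up hypothesis could also be tested locally by calling the handler many times and comparing traced memory before and after. The sketch below uses the standard-library `tracemalloc` module. `measure_growth` and the two example handlers are hypothetical names, and the real function would have to be wrapped in a zero-argument callable that passes it a fake request.

```python
import tracemalloc


def measure_growth(call, calls=1000):
    """Return the change in traced Python memory (bytes) over `calls` calls.

    `call` is any zero-argument callable, e.g. a wrapper around the
    Cloud Function that feeds it a fake request (an assumption here).
    """
    tracemalloc.start()
    call()  # warm-up so one-time allocations are not counted as growth
    before, _peak = tracemalloc.get_traced_memory()
    for _ in range(calls):
        call()
    after, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before


# Two toy handlers to illustrate the difference.
_retained = []


def leaky_handler():
    # Keeps a reference alive across calls, so memory grows every time.
    _retained.append(bytearray(1024))


def clean_handler():
    # The buffer is released as soon as the call returns.
    bytearray(1024)


if __name__ == "__main__":
    print("leaky growth:", measure_growth(leaky_handler))
    print("clean growth:", measure_growth(clean_handler))
```

Note that `tracemalloc` only sees allocations made through Python's allocator, so growth inside native libraries would not show up in this measurement.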
Edit: Code excerpt and more context
The requests contain parameters that help implement tracking on an ecommerce website. Now that I copy it, I notice there might be an anti-pattern where I modify form['products'] while iterating over it, but I don't think that would have anything to do with wasted memory?
from json import dumps
from datetime import datetime
from pytz import timezone
from google.cloud import storage
from google.cloud import pubsub
from google.cloud import error_reporting
from unidecode import unidecode
# this is done in global context because I only want to load the BOTS_LIST at
# cold start
PROJECT_ID = '...'
TOPIC_NAME = '...'
BUCKET_NAME = '...'
BOTS_PATH = '.../bots.txt'
gcs_client = storage.Client()
cf_bucket = gcs_client.bucket(BUCKET_NAME)
bots_blob = cf_bucket.blob(BOTS_PATH)
BOTS_LIST = bots_blob.download_as_string().decode('utf-8').split('\r\n')
del cf_bucket
del gcs_client
del bots_blob
err_client = error_reporting.Client()
def detect_nb_products(parameters):
    '''
    Detects number of products in the fields of the request.
    '''
    # ...
def remove_accents(d):
    '''
    Takes a dictionary and recursively transforms its strings into ASCII
    encodable ones
    '''
    # ...
def safe_float_int(x):
    '''
    Custom converter to float / int
    '''
    # ...
def build_hit_id(d):
    '''concatenate specific parameters from a dictionary'''
    # ...
def cloud_function(request):
    """Actual Cloud Function"""
    try:
        time_received = datetime.now().timestamp()
        # filtering bots
        user_agent = request.headers.get('User-Agent')
        if all([bot not in user_agent for bot in BOTS_LIST]):
            form = request.form.to_dict()
            # setting the products field
            nb_prods = detect_nb_products(form.keys())
            if nb_prods:
                form['products'] = [{'product_name': form['product_name%d' % i],
                                     'product_price': form['product_price%d' % i],
                                     'product_id': form['product_id%d' % i],
                                     'product_quantity': form['product_quantity%d' % i]}
                                    for i in range(1, nb_prods + 1)]
            useful_fields = []  # list of keys I'll keep from the form
            unwanted = set(form.keys()) - set(useful_fields)
            for key in unwanted:
                del form[key]
            # float conversion
            if nb_prods:
                for prod in form['products']:
                    prod['product_price'] = safe_float_int(
                        prod['product_price'])
            # adding timestamp/hour/minute, user agent and date to the hit
            form['time'] = int(time_received)
            form['user_agent'] = user_agent
            dt = datetime.fromtimestamp(time_received)
            form['date'] = dt.strftime('%Y-%m-%d')
            remove_accents(form)
            friendly_names = {}  # dict to translate the keys I originally
            # receive to human friendly ones
            new_form = {}
            for key in form.keys():
                if key in friendly_names.keys():
                    new_form[friendly_names[key]] = form[key]
                else:
                    new_form[key] = form[key]
            form = new_form
            del new_form
            # logging
            print(form)
            # setting up Pub/Sub
            publisher = pubsub.PublisherClient()
            topic_path = publisher.topic_path(PROJECT_ID, TOPIC_NAME)
            # sending
            hit_id = build_hit_id(form)
            message_future = publisher.publish(topic_path,
                                               dumps(form).encode('utf-8'),
                                               time=str(int(time_received * 1000)),
                                               hit_id=hit_id)
            print(message_future.result())
            return ('OK',
                    200,
                    {'Access-Control-Allow-Origin': '*'})
        else:
            # do nothing for bots
            return ('OK',
                    200,
                    {'Access-Control-Allow-Origin': '*'})
    except KeyError:
        err_client.report_exception()
        return ('err',
                200,
                {'Access-Control-Allow-Origin': '*'})
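On the possible anti-pattern mentioned before the excerpt: as far as I can tell, the excerpt only updates values inside each product dict and deletes keys while looping over a separate set, and neither resizes the container being iterated. The sketch below contrasts those two patterns with the one that actually fails. The function names are hypothetical, and `float` stands in for `safe_float_int`.

```python
def convert_prices_in_place(products):
    """Update a value inside each dict while iterating the list: safe,
    because the list itself is never resized."""
    for prod in products:
        prod['product_price'] = float(prod['product_price'])
    return products


def drop_unwanted_keys(form, useful_fields):
    """Delete keys while iterating over a separate set: safe, because the
    loop does not iterate over the dict being modified."""
    unwanted = set(form.keys()) - set(useful_fields)
    for key in unwanted:
        del form[key]
    return form


def drop_unwanted_keys_unsafe(form, useful_fields):
    """Delete keys while iterating over the dict itself: raises
    RuntimeError as soon as the dict changes size."""
    for key in form:
        if key not in useful_fields:
            del form[key]
    return form


if __name__ == "__main__":
    print(convert_prices_in_place([{'product_price': '9.5'}]))
    print(drop_unwanted_keys({'a': 1, 'keep': 2}, ['keep']))
    try:
        drop_unwanted_keys_unsafe({'a': 1, 'b': 2, 'keep': 3}, ['keep'])
    except RuntimeError as exc:
        print("unsafe variant failed:", exc)
```

Either way, this pattern concerns correctness rather than memory retention, so it is unlikely to explain the growth on its own.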