
Generic question

My Python Cloud Function raises about 0.05 memory errors per second while being invoked about 150 times per second. I suspect that it leaves memory residue behind, which causes its instances to crash once they have handled many requests. What should you do, or avoid doing, so that a function instance doesn't eat "a bit more of its allocated memory" each time it's called? I've been pointed to the docs, which say I should delete all temporary files because they are written to memory, but I don't think I've written any.

More context

The code of my function can be summed up as the following.

  • Global context: Grab a file on Google Cloud Storage containing a list of known User-Agents of bots. Instantiate an Error Reporting client.
  • If User-Agent identifies a bot, return a 200 code. Else parse the arguments of the request, rename them, format them, timestamp the reception of the request.
  • Send the resulting message to Pub/Sub in a JSON string.
  • Return a 200 code

I believe my instances are gradually consuming all available memory because of this graph I built in Stackdriver:

Memory usage of Cloud Function instances

It is a heat map of memory usage across my Cloud Function's instances; red and yellow indicate the memory range that most instances are consuming. Because of the cycle that seems to appear, I interpreted it as a gradual fill-up of my instances' memory until they crash and new instances are spawned. The cycle remains if I raise the memory allocated to the function; that just raises the upper bound of memory usage the cycle reaches.
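
As a rough local check of this interpretation, the sketch below uses the standard-library tracemalloc module to compare traced memory before and after repeated calls to a handler. The names measure_growth, leaky_handler and clean_handler are hypothetical stand-ins and not part of the Cloud Function itself. tracemalloc only sees allocations made through Python's allocator, so memory held by native code would not show up here.

```python
import tracemalloc

_retained = []  # simulates state that survives between invocations


def leaky_handler():
    """Hypothetical handler that keeps about 10 kB alive per call."""
    _retained.append(bytes(10_000))


def clean_handler():
    """Hypothetical handler whose temporary data is dropped on return."""
    payload = bytes(10_000)
    return len(payload)


def measure_growth(handler, calls=100):
    """Return the change in traced bytes across `calls` invocations."""
    tracemalloc.start()
    try:
        handler()  # warm-up call so one-time allocations are not counted
        before, _peak = tracemalloc.get_traced_memory()
        for _ in range(calls):
            handler()
        after, _peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return after - before


if __name__ == '__main__':
    print('leaky:', measure_growth(leaky_handler))
    print('clean:', measure_growth(clean_handler))
```

A growth figure that scales with the number of calls would support the "gradual fill-up" reading; a figure near zero would point towards memory held outside Python objects.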

Edit: Code excerpt and more context

The requests contain parameters that help implement tracking on an e-commerce website. Now that I copy it, there might be an anti-pattern where I modify form['products'] while iterating over it, but I don't think it has anything to do with wasted memory?

from json import dumps
from datetime import datetime
from pytz import timezone

from google.cloud import storage
from google.cloud import pubsub
from google.cloud import error_reporting

from unidecode import unidecode

# this is done in global context because I only want to load the BOTS_LIST at
# cold start
PROJECT_ID = '...'
TOPIC_NAME = '...'
BUCKET_NAME = '...'
BOTS_PATH = '.../bots.txt'
gcs_client = storage.Client()
cf_bucket = gcs_client.bucket(BUCKET_NAME)
bots_blob = cf_bucket.blob(BOTS_PATH)
BOTS_LIST = bots_blob.download_as_string().decode('utf-8').split('\r\n')
del cf_bucket
del gcs_client
del bots_blob

err_client = error_reporting.Client()


def detect_nb_products(parameters):
    '''
    Detects number of products in the fields of the request.
    '''
    # ...


def remove_accents(d):
    '''
    Takes a dictionary and recursively transforms its strings into ASCII
    encodable ones
    '''
    # ...


def safe_float_int(x):
    '''
    Custom converter to float / int
    '''
    # ...


def build_hit_id(d):
    '''concatenate specific parameters from a dictionary'''
    # ...


def cloud_function(request):
    """Actual Cloud Function"""
    try:
        time_received = datetime.now().timestamp()
        # filtering bots
        # default to '' so a missing header does not raise TypeError below
        user_agent = request.headers.get('User-Agent', '')
        if all([bot not in user_agent for bot in BOTS_LIST]):
            form = request.form.to_dict()
            # setting the products field
            nb_prods = detect_nb_products(form.keys())
            if nb_prods:
                form['products'] = [{'product_name': form['product_name%d' % i],
                                     'product_price': form['product_price%d' % i],
                                     'product_id': form['product_id%d' % i],
                                     'product_quantity': form['product_quantity%d' % i]}
                                    for i in range(1, nb_prods + 1)]

            useful_fields = [] # list of keys I'll keep from the form
            unwanted = set(form.keys()) - set(useful_fields)
            for key in unwanted:
                del form[key]

            # float conversion
            if nb_prods:
                for prod in form['products']:
                    prod['product_price'] = safe_float_int(
                        prod['product_price'])

            # adding timestamp/hour/minute, user agent and date to the hit
            form['time'] = int(time_received)
            form['user_agent'] = user_agent
            dt = datetime.fromtimestamp(time_received)
            form['date'] = dt.strftime('%Y-%m-%d')

            remove_accents(form)

            friendly_names = {} # dict to translate the keys I originally
            # receive to human friendly ones
            new_form = {}
            for key in form.keys():
                if key in friendly_names.keys():
                    new_form[friendly_names[key]] = form[key]
                else:
                    new_form[key] = form[key]
            form = new_form
            del new_form

            # logging
            print(form)

            # setting up Pub/Sub
            publisher = pubsub.PublisherClient()
            topic_path = publisher.topic_path(PROJECT_ID, TOPIC_NAME)
            # sending
            hit_id = build_hit_id(form)
            message_future = publisher.publish(topic_path,
                                               dumps(form).encode('utf-8'),
                                               time=str(int(time_received * 1000)),
                                               hit_id=hit_id)
            print(message_future.result())

            return ('OK',
                    200,
                    {'Access-Control-Allow-Origin': '*'})
        else:
            # do nothing for bots
            return ('OK',
                    200,
                    {'Access-Control-Allow-Origin': '*'})
    except KeyError:
        err_client.report_exception()
        return ('err',
                200,
                {'Access-Control-Allow-Origin': '*'})
totooooo
  • Are you facing any issue because of the crashes? Cloud Functions should automatically spawn up new instances. – Kannappan Sirchabesan Jan 20 '19 at 22:27
  • Also, how are you grabbing the file and opening it? Are you closing the file? – Kannappan Sirchabesan Jan 20 '19 at 22:29
  • Without seeing all the relevant code, it's not really possible to say anything specific. All that can be offered is to delete temp files and don't store anything at all in global memory, anywhere, unless it's under your full control in terms of size. – Doug Stevenson Jan 20 '19 at 23:05
  • Are you using tempfile.mkstemp? if so, please, take a look at this: https://www.logilab.org/blogentry/17873. Also, take a look at these best practices: https://cloud.google.com/functions/docs/bestpractices/tips – Alex Riquelme Jan 21 '19 at 10:34
  • Just added the code. @Kannappan As the file is fetched over a connection in global context, I don't think I have to close anything? This isn't really an "issue" for me, but I'm curious whether it's possible to have no memory errors at all, and I don't like the idea that some of my events might not get processed. – totooooo Jan 21 '19 at 20:30
  • @DougStevenson does this mean that loading my list of bots in global context might be a bad idea? – totooooo Jan 22 '19 at 10:03
  • Unless you are in full control of its size and have allocated enough memory for it with respect to the needs of the rest of your function, it's a bad idea. It's probably better to lazy init any globals within the function body rather than try to write arbitrary code in the global space. – Doug Stevenson Jan 22 '19 at 19:36
  • @DougStevenson just tried without any global context (except for imports), placing all of the BOTS_LIST loading inside the function. Unfortunately this didn't change the situation – totooooo Jan 25 '19 at 10:18
  • If there is no way to optimize your code, you can divide your code into different Cloud Functions, or select more RAM for your Cloud Function (you can choose up to 2GB). – Pablo Almécija Rodríguez Feb 05 '19 at 09:02
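
The lazy initialization suggested in the comments above could be sketched roughly as follows. This is only an illustration: _download_bots is a hypothetical stand-in for the Cloud Storage download in the question, and get_bots_list and is_bot are names introduced here, not part of the original function.

```python
_bots_list = None  # filled on first use, then reused by later invocations


def _download_bots():
    """Hypothetical stand-in for fetching bots.txt from Cloud Storage."""
    return ['Googlebot', 'bingbot']


def get_bots_list(loader=_download_bots):
    """Load the list once per instance, on the first request that needs it."""
    global _bots_list
    if _bots_list is None:
        _bots_list = loader()
    return _bots_list


def is_bot(user_agent):
    """True if any known bot marker appears in the User-Agent string."""
    ua = user_agent or ''  # tolerate a missing header
    return any(bot in ua for bot in get_bots_list())
```

The same pattern could in principle hold any object that is expensive to create, so that it is built once per instance rather than once per request; whether that helps with the memory growth described here is untested.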

1 Answer


There are a few things you could try (a theoretical answer; I haven't played with CFs yet):

  • Explicitly delete the temporary variables that you allocate on the request-processing path, for example nb_prods, unwanted, form, new_form and friendly_names. They may be referencing each other, which can prevent the garbage collector from freeing them (see https://stackoverflow.com/a/33091796/4495081).

  • If unwanted is always the same, make it a global instead.

  • Delete form before re-assigning it to new_form (otherwise the old form object remains); also, deleting new_form won't actually save much, since the object remains referenced by form. I.e. change:

        form = new_form
        del new_form
    

    into

        del form
        form = new_form
    
  • Explicitly invoke the garbage collector after you publish your message and before returning. I'm unsure whether that's applicable to CFs, or whether the invocation takes effect immediately (in GAE, for example, it doesn't; see "When will memory get freed after completing the request on App Engine Backend Instances?"). This may also be overkill and could hurt your CF's performance, so see if/how it works for you.

    import gc  # standard library; normally placed with the other imports
    gc.collect()
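
To illustrate why del alone may not be enough when objects reference each other, here is a small standalone sketch. Node and demo_cycle are hypothetical names, unrelated to the Cloud Function code, and the behavior shown is that of CPython's reference counting plus its cycle collector.

```python
import gc
import weakref


class Node:
    """Minimal object that can point at another Node."""

    def __init__(self):
        self.other = None


def demo_cycle():
    """Return (alive_after_del, alive_after_collect) for a two-node cycle."""
    was_enabled = gc.isenabled()
    gc.disable()  # keep automatic collection from interfering with the demo
    try:
        a, b = Node(), Node()
        a.other, b.other = b, a      # a and b now reference each other
        probe = weakref.ref(a)       # lets us observe a without keeping it alive
        del a, b                     # names are gone, but the cycle remains
        alive_after_del = probe() is not None
        gc.collect()                 # the cycle collector reclaims both nodes
        alive_after_collect = probe() is not None
    finally:
        if was_enabled:
            gc.enable()
    return alive_after_del, alive_after_collect
```

In the function from the question, the dictionaries do not obviously form such cycles, so this shows the mechanism rather than a confirmed cause.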
    
Dan Cornilescu