0

I have a kubernete microservice that can access Postgres, both in Azure. If I run the microservice container in my local and access the same Postgres in Azure, it has no problem. But if run the same container in AKS vm, the fetchall just hangs for hours or never return. It doesn't error out and no exception happens. If calling another endpoint in the microservice, I get expected response so the microservice is still running. However, the fetch works sometimes! But hanging happens more often. Below is the code I tried to use named cursor but not help.

For the Azure Postgres, I already bumped up to use 8-core, 100G, Optimized memory option. Also removed the kubernete POD memory limitation. The data records to retrieve from Postgres is around 8000 and each record is about couple hundreds bytes. When it hangs, I use psql inside the POD to do the same "select * tablename;" query, it returns immediately. Use fetchmany or fetch not working either. The psycopg2 is the 2.9.3

    select_query = "select * from database_table;"
    try:
        with connection:
            with connection.cursor("my_cursor_to_the_server_select_all") as cursor:
                cursor.itersize = 2000
                logger.debug("Before execute")
                cursor.execute(select_query)
                logger.debug("After execute")
                while True:
                    rows = cursor.fetchmany(200)  **<-- always stuck here for very long time**
                    logger.debug("fetch here")
                    if not rows:
                        break
                    count += 1
                    records.extend(rows)
                logger.debug(f"After fetch: {len(records)}")
    except Exception as err:
        logger.error(f"[{request_id}]: Exception in executing DB select_all: {err}")
        raise Exception(err)
    finally:
        connection.close()

Addition info: used htop to monitor performance, the CPU is memory usage is fine.

Edit:

  • I used the fetchmany and result is same. Sometimes works but mostly not.
  • The CPU usage from the container is about 2% and MEM is about 10%
  • When it "hangs", I use pgadmin to do the same query, it returns very quickly so the postgres is ok.
  • Found out that The "hanging" only happens when the REST endpoint is called from Azure Data Factory's Web Activity. If I call the REST API using REST client from local command line, the query can be performed normally. Wonder why this can happen?

select * from pg_stat_activity enter image description here

thsieh
  • 620
  • 8
  • 24
  • It looks like you are creating a server-side cursor - shouldn't you be using `cursor.fetchone` or `cursor.fetchmany` rather than `fetchall`? – snakecharmerb Aug 18 '22 at 19:46
  • 1
    Yeah `fetchall()` does not consider `itersize` or `arraysize` at all. `fetchmany()` would use [arraysize](https://www.psycopg.org/docs/cursor.html#cursor.arraysize). Or you could iterate directly over the cursor which would use `itersize`. It is spelled out in more detail here [Server side cursors](https://www.psycopg.org/docs/usage.html#server-side-cursors). – Adrian Klaver Aug 18 '22 at 20:45
  • I have tried with fetchmany and fetchone with server-side cursor. It is still same. Sometimes works but sometimes not. With client-side cursor (without the cursor name specified), it hangs in execute(). – thsieh Aug 19 '22 at 00:27
  • Can you use system tools to monitor if there is any CPU or network usage going on? (You said CPU usage is "fine", but what does that mean? Idle?) – jjanes Aug 19 '22 at 01:21
  • Anything in the db server's log file? What is the longest you have waited, or does it eventually timeout? – jjanes Aug 19 '22 at 01:25
  • I updated the question. The CPU is around 1~2% while MEM is like 10%. Kind of idle. Postgres doesn't seems to have issue as I can still perform queries. Also the Azure Postgres portal shows light activity. The only problem I found so far is the hanging happens when the RESTAPI endpoint is called from Azure Data Factory's Web activity. – thsieh Aug 19 '22 at 03:38
  • What is the state of the connection if you query `pg_stat_activity` while the code is hanging? – snakecharmerb Aug 20 '22 at 09:14
  • @snakecharmerb not sure if this is relevant: or what info I need to look into? datId datname query ===== =========== ====== "16384" "azure_sys" "" – thsieh Aug 26 '22 at 23:28

0 Answers0