
We use docker-compose to run some API tests. In the background, the API performs CRUD operations on a Cosmos DB. The tests are supposed to run without creating and using a real Cosmos DB, so I use the Cosmos DB emulator as a Docker image.

version: "3.7"
services:
  cosmosdb:
    image: mcr.microsoft.com/cosmosdb/linux/azure-cosmos-emulator
    container_name: cosmosdb
    environment:
      - AZURE_COSMOS_EMULATOR_PARTITION_COUNT=10
      - AZURE_COSMOS_EMULATOR_ENABLE_DATA_PERSISTENCE=true
    healthcheck:
      test:
        [
          "CMD",
          "curl",
          "-f",
          "-k",
          "https://localhost:8081/_explorer/emulator.pem",
        ]
      interval: 10s
      timeout: 1s
      retries: 5
    ports:
      - "8081:8081"
  init:
    build:
      context: init
    depends_on:
      cosmosdb:
        condition: service_healthy

The script even has a loop that checks whether the DB is ready before writing anything. It works roughly like this:

import { CosmosClient } from '@azure/cosmos';

// "cosmosdb" is the service name from the docker-compose file above.
const client = new CosmosClient({
  endpoint: 'https://cosmosdb:8081/',
  key: 'C2y6yDjf5/R+ob0N8A7Cgv30VRDJIWEHLM+4QDU5DE2nQ9nDuVTqobD4b8mGGyPMbIZnqyMsEcaGQy67XIw/Jw==',
  connectionPolicy: { requestTimeout: 10000 },
});

// Returns true as soon as listing the databases succeeds.
async function isDbReady(): Promise<boolean> {
  try {
    await client.databases.readAll().fetchAll();
    return true;
  } catch (err) {
    console.log('database not ready', (err as Error).message);
    return false;
  }
}

// Polls every 10 seconds until the emulator answers.
async function waitForDb(): Promise<void> {
  while (!(await isDbReady())) {
    await new Promise((resolve) => setTimeout(resolve, 10000));
  }
}
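
The loop above never gives up, so a hung emulator makes the whole run hang with it. A minimal sketch of a bounded variant follows, assuming a fixed overall time budget is acceptable. `waitUntilReady`, `WaitOptions` and `sleep` are hypothetical names that are not part of @azure/cosmos, and the probe is passed in so that `isDbReady` could be used unchanged:

```typescript
// Hypothetical helper: polls `probe` until it resolves to true or the time budget runs out.
type Probe = () => Promise<boolean>;

interface WaitOptions {
  timeoutMs: number;  // overall budget before giving up
  intervalMs: number; // pause between two attempts
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Resolves with the number of attempts needed; rejects once the budget is exhausted.
async function waitUntilReady(probe: Probe, opts: WaitOptions): Promise<number> {
  const deadline = Date.now() + opts.timeoutMs;
  let attempts = 0;
  for (;;) {
    attempts += 1;
    if (await probe()) {
      return attempts;
    }
    // Stop if another sleep would take us past the deadline.
    if (Date.now() + opts.intervalMs > deadline) {
      throw new Error(`not ready after ${attempts} attempts (${opts.timeoutMs} ms)`);
    }
    await sleep(opts.intervalMs);
  }
}
```

With this in place, the script could call something like `await waitUntilReady(isDbReady, { timeoutMs: 120000, intervalMs: 10000 })` and fail fast with a clear error instead of waiting forever. The concrete numbers are placeholders, not recommendations.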

The problem is that when our script (JavaScript with @azure/cosmos) tries to create a database and a couple of collections and then insert a couple of items, the Cosmos DB emulator sometimes (in about 20% of the test runs) just stops responding and runs into timeouts. This persists until we run docker-compose down and then docker-compose up again for the next try.

We run a slightly modified version of the image in which we only installed curl so that the healthcheck can run (the same issue happens when using mcr.microsoft.com/cosmosdb/linux/azure-cosmos-emulator directly, which is why I simply put that image in the docker-compose snippet). We use the healthcheck for the Cosmos DB emulator as suggested here: How to check if the Cosmos DB emulator in a Docker container is done booting?

Defining a docker-compose volume and mounting it as /tmp/cosmos/appdata did not improve the situation either.

We are also not sure how to set AZURE_COSMOS_EMULATOR_ENABLE_DATA_PERSISTENCE, as we would actually like to start each test run with a clean database and have our script insert the data.
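
Independent of how persistence is configured, one way to get a clean start would be for the script to drop whatever databases exist before it creates its own. This is only a sketch, not a fix for the instability. `dropAllDatabases` and the `CosmosLike` interface are hypothetical names. The interface is a minimal structural view that is assumed to match what `CosmosClient` from @azure/cosmos exposes (`databases.readAll().fetchAll()` and `database(id).delete()`):

```typescript
// Minimal structural view of the client (assumption: CosmosClient satisfies it).
interface CosmosLike {
  databases: {
    readAll(): { fetchAll(): Promise<{ resources: { id: string }[] }> };
  };
  database(id: string): { delete(): Promise<unknown> };
}

// Hypothetical helper: deletes every database the client can see and returns their ids.
async function dropAllDatabases(client: CosmosLike): Promise<string[]> {
  const { resources } = await client.databases.readAll().fetchAll();
  const dropped: string[] = [];
  for (const db of resources) {
    // One delete at a time, to avoid putting extra load on the emulator.
    await client.database(db.id).delete();
    dropped.push(db.id);
  }
  return dropped;
}
```

Calling `await dropAllDatabases(client)` right after the readiness check would leave an empty account for the insert step, at the cost of a few extra requests per run.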

How can we make the Cosmos DB emulator more stable?

Woozar
  • Have you been using this pipeline for some time? Was it always unstable or did it recently become so? – Matias Quaranta Feb 03 '23 at 18:08
  • I was using it on my local system to develop the stuff, and there it worked "ok". I had to restart it from time to time. After increasing the number of requests, the problem became more severe. When running the whole docker-compose on a "not so strong" build bot during pipeline execution, it failed from the very beginning. – Woozar Feb 06 '23 at 11:40
  • Can you monitor the CPU usage of the machine that is running it? This sounds like resource exhaustion on the machine running the Emulator. – Matias Quaranta Feb 06 '23 at 14:27
  • That is an Azure pipeline with Azure pipeline runners. Either it cannot be monitored or I don't have the permissions. So sadly, this is a black box except for the CLI commands in the pipeline script. I ran df -h after the execution to check whether the docker volumes had become too large, but the disks do not seem to be full. – Woozar Feb 07 '23 at 15:51

0 Answers