I am experiencing a sporadic issue running containers on ACI that seems to cause Azure to "lose track" of my container instance and result in an orphaned container. My containers always run successfully, but every now and then I get this weird issue. Some peculiarities:

  • the container instance will still succeed internally (the code in it runs successfully), and the parent container group even says "Succeeded", but Azure never tells me the container instance itself has been created. It just says "Started". Typically the events you see are Pulling-->Pulled-->Created-->Started. Why is "Created" missing?
  • I can't view logs of the container without hooking up Azure Log Analytics. The "Logs" tab on the container blade in the Azure portal just says "No logs available". Normally you can see the logs of a successful container.
  • when this issue occurs, Azure tries to pull the image twice (and appears to succeed twice - see image below).
  • sometimes a 4th event, "Killed", is displayed in the portal.
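The symptoms in the list above can be checked mechanically once the event names are in hand. The following is a minimal Python sketch, assuming the events have already been collected into an ordered list of names (for example, copied from the portal or from Log Analytics). `find_event_anomalies` is a hypothetical helper, not part of any Azure SDK.

```python
from collections import Counter

# Normal lifecycle as described above: Pulling --> Pulled --> Created --> Started
EXPECTED = ["Pulling", "Pulled", "Created", "Started"]


def find_event_anomalies(event_names):
    """Return human-readable anomalies for an ordered list of event names."""
    counts = Counter(event_names)
    anomalies = []
    # Any lifecycle event that never appeared (e.g. the missing "Created")
    for name in EXPECTED:
        if counts[name] == 0:
            anomalies.append(f"missing event: {name}")
    # More than one "Pulling" event means the image was pulled again
    if counts["Pulling"] > 1:
        anomalies.append(f"image pulled {counts['Pulling']} times")
    if counts["Killed"] > 0:
        anomalies.append("container was killed")
    return anomalies


# Event names resembling the failing case described above
failing = ["Pulling", "Pulled", "Pulling", "Pulled", "Started", "Killed"]
for anomaly in find_event_anomalies(failing):
    print(anomaly)
```

A healthy run with the four expected events in order yields an empty list.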

[Screenshot: container events in the Azure portal, showing the image being pulled twice]

I am creating a single-container container group via the Logic Apps Azure Container Instance connector - I do this reliably for many automated workflows. The logic app monitors the container group's state, pulls the instance's logs, and then deletes the group when done. All of my images are hosted on Azure Container Registry. The Python code inside the container pulls data from SQL, generates a PDF report, and uploads it to Azure Blob Storage. I know the code is running and succeeding because I can see the report being posted! I have also hooked up Log Analytics to the container, so I can see my internal Python logging. Log Analytics reports NO other errors. The logic app does fail, though, when it tries to pull the container logs and can't find them (see bullet point 2 above).

Here's output from log analytics on container events (a more detailed version of above screenshot) - so bizarre that the container REPULLS 10 seconds after the first one successfully pulled. You can then see my first container actually runs successfully and exits with 0, and we then have this orphan container left over that is killed.

[Screenshot: Log Analytics output of the container events]

I have noticed one thing that is VERY consistent when this issue occurs. Typically when I look at a successful container creation event in Azure, the event message specifies that it is pulling my image via its tag: myregistry.azurecr.io/riptuskimage:1.2.5. When this issue occurs, the event message specifies that the image is being pulled by its digest instead: myregistry.azurecr.io/riptuskimage@sha256:d98fja.... I've noticed this EVERY time the issue has occurred. I have no idea why Azure is doing this. I most certainly specify the tag in my creation request.
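Since the digest pull is the consistent marker, it can be detected automatically from the event messages. This is a rough Python sketch, assuming each event message contains the full image reference as text. `reference_kind` and `digest_pulls` are hypothetical helper names, and `<digest>` is a placeholder rather than a real digest.

```python
def reference_kind(image_ref):
    """Classify an image reference as 'digest', 'tag', or 'untagged'."""
    if "@" in image_ref:
        return "digest"
    # Look for a tag only after the last "/", so a registry port
    # such as localhost:5000 is not mistaken for a tag
    last_segment = image_ref.rsplit("/", 1)[-1]
    return "tag" if ":" in last_segment else "untagged"


def digest_pulls(event_messages):
    """Return the event messages that reference an image by sha256 digest."""
    return [message for message in event_messages if "@sha256:" in message]


print(reference_kind("myregistry.azurecr.io/riptuskimage:1.2.5"))
# "<digest>" is a placeholder, not an actual digest value
print(reference_kind("myregistry.azurecr.io/riptuskimage@sha256:<digest>"))
```

Running `digest_pulls` over the messages of a container group would flag the runs that hit this issue, assuming the correlation described above holds.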

I have viewed this post and this post, and neither really helps.

I've been scratching my head for a while on this one. The fact that it's sporadic (doesn't always happen), and when it does the images pull twice gives me the suspicion it has something to do with my container registry. The image I'm pulling is large - about 1.6GB. I checked the container registry's throttle limits and I don't think a single pull of a 1.6GB image should end up throttling - but the ACI container creation doesn't really give me a way to see if the registry is returning a 429 HTTP error. I'm not pulling anything else at that time.

Anyone have any ideas? Thanks!

Edit: This is a recent phenomenon! I have logic apps in place that have been creating containers for over a year, and this issue only started occurring in the last few weeks (as of this posting, 9/24/2021).

riptusk331

1 Answer

When your container is not working properly in Azure Container Instances, start by viewing its logs with `az container logs`, and stream its standard out and standard error with `az container attach`.

The az container attach command provides diagnostic information during container startup.

Also view the diagnostic information provided by the Azure Container Instances resource provider. To view the events for your container, run the az container show command.
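As an illustration, the events can be pulled out of the `az container show` output with a short Python script. This is a sketch under assumptions: it expects the Azure CLI to be installed and logged in, and it assumes the JSON output lists events under `containers[].instanceView.events` with `name`, `count`, and `message` fields. Verify those field names against your own output. Both function names are hypothetical.

```python
import json
import subprocess


def show_container_group(resource_group, name):
    """Run `az container show` and return its JSON output as a dict."""
    result = subprocess.run(
        ["az", "container", "show",
         "--resource-group", resource_group,
         "--name", name,
         "--output", "json"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def list_events(group):
    """Collect (container, event name, count, message) tuples.

    Assumes events live under containers[].instanceView.events;
    missing or null sections are treated as empty.
    """
    rows = []
    for container in group.get("containers") or []:
        view = container.get("instanceView") or {}
        for event in view.get("events") or []:
            rows.append((container.get("name"), event.get("name"),
                         event.get("count"), event.get("message")))
    return rows
```

For example, `list_events(show_container_group("my-resource-group", "my-container-group"))`, where both names are placeholders for your own resources. As noted in the comments below, the events show that a re-pull happened but not why.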

This should address your first problem. See this document for more information.

Azure is pulling the container image twice from the Azure Container Registry because the container is taking a long time to start. Azure Container Instances pulls your container image on demand, so the startup time you see is directly related to the image size.

Check this document for more information.

You can solve this problem by adding a delay after pulling your image from the registry in your logic app.
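In a logic app the delay would normally be a built-in Delay or Until action. The Python below only illustrates the underlying wait-and-retry logic, as a sketch. `wait_for` and its parameters are hypothetical, and `check` stands in for whatever call reports that the container group is ready.

```python
import time


def wait_for(check, delay_seconds=30, max_attempts=10, sleep=time.sleep):
    """Call check() until it returns a truthy value, pausing between attempts.

    Returns the truthy value, or None if every attempt came back falsy.
    """
    for attempt in range(max_attempts):
        result = check()
        if result:
            return result
        # Do not sleep after the final attempt
        if attempt < max_attempts - 1:
            sleep(delay_seconds)
    return None
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting. The delay and attempt count are arbitrary defaults and would need tuning to the image size.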

SauravDas-MT
  • You were correct. The heart of the issue was the **image size**. I reduced it and the issue abated. It would be nice if Azure specified this as the reason it's re-pulling, rather than leave you guessing. `az container show` doesn't tell you why it's re-pulling (just that it did). `az container logs` does not work. The az cli returns `'ContainerInstanceManagementClient' object has no attribute 'container'`. This echoes my experience in the Azure portal Logs blade displaying `No logs available`. Moot point though with the smaller image size resolving the double pull. – riptusk331 Sep 28 '21 at 03:08
  • Please check this discussion related to [az container logs doesn't work](https://github.com/Azure/azure-cli/issues/19475) for more insights. – SauravDas-MT Sep 28 '21 at 03:53