1

So I am running a Django app in a Kubernetes pod, and when trying to save an image file:

img_obj.image.save(img_file, File(img_file_org))

I am getting a "no space left on device" error:

  File "/code/ocr_client/management/commands/pdf_to_image.py", line 126, in handle
    img_obj.image.save(img_file, File(img_file_org))
  File "/opt/conda/lib/python3.7/site-packages/django/db/models/fields/files.py", line 88, in save
    self.name = self.storage.save(name, content, max_length=self.field.max_length)
  File "/opt/conda/lib/python3.7/site-packages/django/core/files/storage.py", line 54, in save
    return self._save(name, content)
  File "/opt/conda/lib/python3.7/site-packages/django/core/files/storage.py", line 274, in _save
    fd = os.open(full_path, self.OS_OPEN_FLAGS, 0o666)
OSError: [Errno 28] No space left on device: '/code/pca_back_data/media/file1.png'

I already ran

kubectl exec <my-pod> -- df -ah

And there is still 20% of the space left (100GB)

I also ran, as suggested in another thread:

kubectl exec <my-pod> -- df -hi

and the usage of inodes was only 5%

I am not sure what else might be the issue here. Is there some config in Kubernetes that restricts storage usage for a pod/process?

Alex T
  • 3,529
  • 12
  • 56
  • 105
  • Are you saving the file in the container temporary filesystem, or somewhere else? Kubernetes calls this "ephemeral storage" and it can be limited with `resources:`, though IME that's a little unusual. – David Maze Jun 28 '23 at 10:31

2 Answers

2

If you are getting the "No space left on device" error even though disk usage and inode usage are low, it might be that the disk resources for your specific pod are limited. Kubernetes can set limits on resources like CPU, memory, and ephemeral (disk) storage.

So start by checking the Kubernetes resource limits and requests for your pod: run kubectl describe pod <my-pod> to see whether resource limits or requests are set. Look for something like:

Resources:
  Limits:
    ephemeral-storage: 1Gi
  Requests:
    ephemeral-storage: 500Mi

The ephemeral-storage represents the storage available for your pod to use. If it is set too low, you might need to adjust it.

Also try setting these resource requests and limits yourself: you can specify the resources available to your pod by adding the following to your pod or deployment configuration:

resources:
  requests:
    ephemeral-storage: "1Gi"
  limits:
    ephemeral-storage: "2Gi"

That allows your pod to request 1 GiB of ephemeral storage and limit it to using 2 GiB. Adjust these values as needed based on the size of the images you are dealing with.


Another approach is to use a Persistent Volume (PV): if your application needs to store a lot of data (like many large image files), consider using a Persistent Volume (PV) with a Persistent Volume Claim (PVC). PVs represent durable storage in the cluster whose lifecycle is independent of any single pod. You then mount the PV at the path your application writes to (or adjust your application's configuration to point at the mount path).

Define a PV and PVC:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: my-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/mnt/data"

---

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi

And in your pod spec, you would add:

volumes:
  - name: my-storage
    persistentVolumeClaim:
      claimName: my-pvc

And mount it into your container:

volumeMounts:
  - mountPath: "/code/pca_back_data/media"
    name: my-storage

When you create the Persistent Volume (PV) and mount it into your pod at the same location (/code/pca_back_data/media), your application will continue to write to the same directory without needing to change the Django settings.

The only difference is that the storage will now be backed by a Persistent Volume which is designed to handle larger amounts of data and will not be subject to the same restrictions as the pod's ephemeral storage.

In that case, no changes would be required in your Django settings. The application would continue to write to the same path but the underlying storage mechanism would have changed.
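
For illustration only, here is a minimal sketch of the relevant Django settings, assuming the default FileSystemStorage backend (MEDIA_ROOT is a standard Django setting; the value is simply the media path from your traceback, which is also the PV mount path):

# settings.py (sketch) - nothing here needs to change when the directory
# becomes a PV mount, because Django only sees a filesystem path.
MEDIA_ROOT = "/code/pca_back_data/media"  # same path the PVC is mounted at
MEDIA_URL = "/media/"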

However, do note that hostPath should be used only for development or testing. For production, consider using a networked storage like an NFS server, or a cloud provider's storage service.


The OP comments:

I am already using a PVC that is attached to this pod, and it has more than enough storage. What is stranger is that not all files are failing this…

As I commented, it could be a concurrency issue: if multiple processes or threads are trying to write to the same file or location simultaneously, that might cause some operations to fail with "No space left on device" errors.
Also, although the PVC has enough available space, individual filesystems on the PVC might have quotas that limit how much space they can use. Check whether any such quotas are set on your filesystem.

The OP confirms:

There is something like this happening: multiple processes are using the same PVC directory; maybe not exactly the same file, but the same parent directory can be accessed by those processes.

Multiple processes using the same PVC directory or parent directory should generally not be a problem, as long as they are not trying to write to the same file at the same time. But if these processes are creating a large number of files or very large files, and if your PVC or underlying filesystem has a limit on the number of files (inodes) or the total size of files it can handle, that could potentially lead to the "No space left on device" error.
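
(Django's FileSystemStorage already generates a unique name when the target file exists, so plain concurrent saves into the same directory are normally safe.) As an illustrative sketch, not the OP's code, this is the usual pattern when several processes must write into a shared directory without clobbering each other: write to a unique temporary file, then atomically rename it into place.

import os
import tempfile

def write_atomically(directory: str, final_name: str, data: bytes) -> str:
    """Write data under a unique temporary name, then atomically rename it,
    so concurrent writers in the same directory never overwrite each other."""
    fd, tmp_path = tempfile.mkstemp(dir=directory)  # unique name per process
    try:
        with os.fdopen(fd, "wb") as tmp_file:
            tmp_file.write(data)
        final_path = os.path.join(directory, final_name)
        os.replace(tmp_path, final_path)  # atomic rename on POSIX filesystems
        return final_path
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)  # clean up the temporary file on failure
        raise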

You can check for filesystem quotas on a PVC:

  • Connect to your pod: kubectl exec -it <your-pod> -- /bin/bash

  • Install the quota package: This can usually be done with apt-get install quota on Debian/Ubuntu systems or yum install quota on CentOS/RHEL systems. If these commands do not work, you may need to look up how to install quota for your specific container's operating system.

  • Check quotas: Run quota -v to view quota information. If quotas are enabled and you are nearing or at the limit, you will see that here.

If your filesystem does not support quotas or they are not enabled, you will not get useful output from quota -v. In that case, or if you are unable to install the quota package, you might need to check for quotas from outside the pod, which would depend on your Kubernetes setup and cloud provider.
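
If installing packages in the container is not practical, a quick alternative is to ask the kernel directly how much space and how many inodes each mount has left, exactly as the pod sees it. Here is a small sketch using only the Python standard library (the two paths are the ones from the question):

import os

def report(path: str) -> None:
    """Print free space and free inodes for the filesystem containing path."""
    st = os.statvfs(path)
    free_gib = st.f_bavail * st.f_frsize / 1024**3  # space usable by non-root processes
    print(f"{path}: {free_gib:.1f} GiB free, {st.f_favail} of {st.f_files} inodes free")

for mount in ("/code/pca_back_data/media", "/tmp"):
    report(mount)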

If you are still having trouble, another possible culprit is the Linux kernel parameter fs.inotify.max_user_watches, which limits how many files the system can watch for changes. When that limit is exhausted (usually by file-watching tools such as auto-reloaders rather than by ordinary file writes), the kernel also reports errno 28, "No space left on device", even though no disk is actually full. You can check the current value with cat /proc/sys/fs/inotify/max_user_watches and increase it if necessary.


The OP adds:

I think the issue in my case is that the /tmp folder inside the pod is running out of space (in Django, /tmp is used for temporary files when saving to the database, if I understand correctly); not sure how to expand the size of it?

Yes, you're correct. Django uses the system temporary directory (via Python's tempfile module) for things like large file uploads, so by default that is /tmp. If the /tmp directory is running out of space, you can consider the following options (there is also a Django-side sketch after this list):

  • Increase the Pod ephemeral storage limit: as mentioned above, you can adjust the ephemeral storage requests and limits in your pod or deployment configuration, like so:
resources:
  requests:
    ephemeral-storage: "2Gi"  # Request 2Gi of ephemeral storage
  limits:
    ephemeral-storage: "4Gi"  # Limit ephemeral storage usage to 4Gi

Remember to adjust these values according to your needs.

  • Or use an emptyDir volume for /tmp: mount a Kubernetes emptyDir volume at /tmp. When a Pod is assigned to a node, Kubernetes creates an emptyDir volume for that Pod, and it exists as long as the Pod is running on that node. The emptyDir volume uses the node's storage, and you can specify a size limit.

Here is how you might define an emptyDir volume for /tmp:

apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  containers:
  - name: my-container
    image: my-image
    volumeMounts:
    - name: tmp-storage
      mountPath: /tmp
  volumes:
  - name: tmp-storage
    emptyDir:
      medium: "Memory"
      sizeLimit: "2Gi"  # Set a size limit for the volume

The medium: "Memory" means that the emptyDir volume is backed by memory (tmpfs) instead of disk storage; note that files written to a memory-backed emptyDir count against the container's memory limit. If you remove this line, the emptyDir volume uses the node's disk storage. The sizeLimit is optional.

  • Or use a dedicated PVC for /tmp: if the above options are not feasible, or if you need more control over the storage for /tmp, mount a dedicated PVC at /tmp, similar to the one you're using for /code/pca_back_data/media.

Remember that changes to your pod or deployment configuration need to be applied with kubectl apply -f <configuration-file>, and you may need to recreate your pod or deployment for the changes to take effect.
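
If it turns out to be Django's own temporary upload handling that fills /tmp, you can also redirect it from the Python side. A hedged sketch: FILE_UPLOAD_TEMP_DIR is a real Django setting, and Python's tempfile module honours the TMPDIR environment variable; the /scratch paths below are only examples and would have to be writable, mounted directories.

# settings.py (sketch) - send Django's temporary upload files somewhere other
# than the default system temp directory.
FILE_UPLOAD_TEMP_DIR = "/scratch/django-tmp"  # example path; must exist and be writable

# Anything that goes through Python's tempfile module (directly or via a
# library) instead follows the TMPDIR environment variable, which you could
# set in the container spec, e.g.:
#   env:
#   - name: TMPDIR
#     value: /scratch/tmp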


The OP concludes in the comments:

I managed to solve this issue: it looks like the GCP storage disk was somehow corrupted; we changed to another one and it seems to be fine now.

VonC
  • 1,262,500
  • 529
  • 4,410
  • 5,250
  • Thanks for the thorough explanation – I am already using a PVC that is attached to this pod, and it has more than enough storage. What is stranger is that not all files are failing this… – Alex T Jul 10 '23 at 07:01
  • 1
    @AlexT did you check for a concurrency issue? If multiple processes or threads are trying to write to the same file or location simultaneously, that might cause some operations to fail with "No space left on device" errors. Also, although the PVC has enough available space, individual filesystems on the PVC might have quotas that limit how much space they can use. Check if there are any such quotas set on your filesystem. – VonC Jul 10 '23 at 07:24
  • There is something like this happening - multiple processes are using the same PVC directory, maybe not exactly the same file but the same parent directory can be accessed by those processes. Would this also pose an issue? Do you know how I can check individual filesystem quotas on the PVC? – Alex T Jul 10 '23 at 08:53
  • 1
    @AlexT I have edited the answer to address your comment. – VonC Jul 10 '23 at 12:04
  • I think the issue in my case is that `/tmp` folder inside the pod is running out of space (in Django /tmp is used for the files when saving to database if I understand correctly), not sure how to expand size of it? – Alex T Jul 11 '23 at 12:40
  • 1
    @AlexT I have edited the answer to address your comment. – VonC Jul 11 '23 at 13:50
  • I did exactly as you suggested: updated the ephemeral storage and also created a PVC that points to the /tmp directory, which works, and I can see that files are being saved there temporarily, but they don't take more than a few MBs. Still getting the same error. Not sure if this is related to a storage issue or there is something wrong with Django at this point. – Alex T Jul 12 '23 at 13:13
  • @AlexT OK. Consider checking other common locations for temporary files, like `/var/tmp`. Try uploading *smaller* files to see if the error still occurs. Try a different storage backend to see if the issue persists. Looking at [Django issues](https://code.djangoproject.com/query?status=assigned&status=closed&status=new&summary=~tmp&desc=1&order=id), try and use `/var/cache` instead of `/var/tmp`. – VonC Jul 12 '23 at 13:25
  • Ok, so I managed to solve this issue: looks like the GCP storage disk was somehow corrupted; we changed to another and it seems to be fine now. – Alex T Jul 26 '23 at 13:41
  • @AlexT Great, well done! I have included your comment in the answer for more visibility. – VonC Jul 26 '23 at 13:42
0

Increasing the pod's resources should help, because an application's resource consumption spikes while it is starting up. Can you also share how many resources you have currently allocated to this pod? As an example of where to set them:

resources:
  limits:
    cpu: 50m
    memory: 450Mi
  requests:
    cpu: 30m
    memory: 350Mi

Edit this according to your requirements.

Sujay_ks
  • 47
  • 7