I am running samtools on a Google Cloud VM with 8 CPUs. When the process finishes, the program appears to crash with the error below. At the same time, the bucket becomes inaccessible and `ls` fails in the bucket directory. Any ideas? Could this be a problem with saving the file?

Error:

username@instance-1:~/my_bucket$ /usr/local/bin/bin/samtools view -@20  -O sam -f 4 file_dedup.realigned.cram > file.unmapped.sam
samtools view: error closing standard output: -1

Also, this comes up when typing ls in the bucket directory:

ls: cannot open directory '.': Transport endpoint is not connected
verdier
user2300940
  • Please provide more details about your use case. Are you using FUSE to mount your bucket to the VM instance? – Serhii Rohoza Mar 29 '21 at 08:14
  • bucket is mounted using: /usr/bin/gcsfuse username_bucket_1 my_bucket – user2300940 Mar 29 '21 at 08:15
  • Have you tried to save your data at the VM's disk instead of bucket? – Serhii Rohoza Mar 29 '21 at 08:18
  • I don't think I have enough space on the VM. The output file is about 6 GB. How would I specify that the file is saved to the VM and not the bucket? I think it's related to big files, because copying small files within the bucket is OK. – user2300940 Mar 29 '21 at 08:20
  • This issue could be related to the [differences between FUSE and a POSIX file system](https://cloud.google.com/storage/docs/gcs-fuse#notes). You can try setting a different path for file.unmapped.sam. – Serhii Rohoza Mar 29 '21 at 08:26
  • should I set the path to the bucket? – user2300940 Mar 29 '21 at 08:44
  • No, the idea is to check whether it works with the VM's disk instead of the bucket. If there's no issue there, then the problem is FUSE. – Serhii Rohoza Mar 29 '21 at 09:46
  • It worked when writing to the VM disk instead. Is there a reason why I should use a bucket at all? Can I just increase the size of the VM disk and use that? Are the costs the same, and is there a maximum size limit? – user2300940 Mar 29 '21 at 09:59

1 Answer

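As we discovered in the comments, this issue is related to the differences between a FUSE-mounted bucket and a POSIX file system: the same command succeeded when its output was written to the VM's disk instead of the bucket.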

You can solve this issue in two ways:

  1. Increase the disk space on your VM instance (by following the documentation Resize the disk and Resize the file system and partitions) and stop using the Google Cloud Storage bucket mounted via FUSE.
  2. Save the output of samtools to the VM's disk first, and then move it to the Google Cloud Storage bucket mounted via FUSE.
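
Option 2 can be sketched as a small shell helper. This is only a minimal sketch under stated assumptions, not a verified recipe for gcsfuse: the function name `stage_then_copy` is hypothetical, and it assumes that the temporary directory (`$TMPDIR`, falling back to `/tmp`) lives on the VM's own disk and has room for the whole output file.

```shell
# Hypothetical helper: run a command, capture its stdout in a temporary file
# on local disk, and copy the finished file to the destination only if the
# command succeeded.
# Usage: stage_then_copy DEST COMMAND [ARGS...]
stage_then_copy() {
  dest=$1
  shift
  # Assumption: this directory is on the VM's disk, not on the gcsfuse mount.
  tmp=$(mktemp "${TMPDIR:-/tmp}/staged.XXXXXX") || return 1
  if "$@" > "$tmp"; then
    # The command succeeded: copy the complete file to the destination.
    cp -- "$tmp" "$dest"
    status=$?
  else
    # The command failed: keep its exit status and write nothing to DEST.
    status=$?
  fi
  rm -f -- "$tmp"
  return "$status"
}
```

With the paths from the question, a call might look like `stage_then_copy ~/my_bucket/file.unmapped.sam /usr/local/bin/bin/samtools view -@20 -O sam -f 4 ~/my_bucket/file_dedup.realigned.cram`. The input is still read through the mount in that case; only the write is staged on local disk.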

You can estimate the cost of each scenario with the Google Cloud Pricing Calculator.

Keep in mind that persistent disks have restrictions, among them:

  • Each persistent disk can be up to 64 TB in size, so there is no need to manage arrays of disks to create large logical volumes.
  • Most instances can have up to 128 persistent disks and up to 257 TB of total persistent disk space attached. Total persistent disk space for an instance includes the size of the boot persistent disk.

In addition, please have a look at Quotas & limits for Google Cloud Storage.

Serhii Rohoza