Questions tagged [lustre]

Lustre is a highly scalable distributed file system. http://lustre.org/

44 questions
1
vote
2 answers

Error happens when trying to umount the Lustre file system

When I unmount the Lustre file system, it displays: [root@cn17663-ens4 mnt]# umount /mnt/lustre umount: /mnt/lustre: target is busy. (In some cases useful info about processes that use the device is found by lsof(8) or fuser(1)) And if I add the…
Li Hongbo
  • 11
  • 5
1
vote
1 answer

MongoDB "Unable to establish lock" error on mounted drive using Docker and Lustre

I am trying to use Docker to containerize a MongoDB instance using a drive mounted on the host. Using the mongo:latest image: [user@dcos-master ~]$ docker run -d --name mongo -v /local/cluster/drive:/data/db mongo:latest But it constantly fails…
1
vote
1 answer

CentOS 7 - boot order needs to be changed in order for SGE to start automatically

It seems that SGE tries to start before Lustre is mounted when the server boots, so it fails to start automatically on reboot. Can somebody tell me how to change the boot order so that SGE starts after Lustre is mounted? Error…
Jason
  • 25
  • 1
  • 9
1
vote
2 answers

What happens if my stripe count is set to more than my number of stripes

I have a question about the Lustre file system. If I have a 64 GB file and set the stripe size to 1 GB, the number of stripes becomes 64. But if I set the stripe count to 128, what does Lustre do in that case?
user1439690
  • 659
  • 1
  • 11
  • 26
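
The arithmetic in the question above can be sketched in a few lines. This is only an illustration, assuming a plain RAID-0 style round-robin layout in which chunk i of the file is placed on stripe index i modulo the stripe count; the function name stripe_layout is hypothetical and is not part of any Lustre tool or API.

```python
def stripe_layout(file_size, stripe_size, stripe_count):
    """Return (number of chunks, sorted stripe indices that receive data).

    Assumes a simple round-robin (RAID-0 style) layout; hypothetical helper,
    not a Lustre API.
    """
    # Number of stripe-sized chunks needed to hold the file (ceiling division).
    num_chunks = -(-file_size // stripe_size)
    # Chunk i lands on stripe index i % stripe_count.
    used = sorted({i % stripe_count for i in range(num_chunks)})
    return num_chunks, used


GIB = 1024 ** 3
chunks, used = stripe_layout(64 * GIB, 1 * GIB, 128)
print(chunks, len(used))
```

Under this assumption, a 64 GB file with a 1 GB stripe size has 64 chunks, so with a stripe count of 128 only the first 64 stripe indices would hold any data.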
1
vote
1 answer

openmpi: MPI_recv hangs for specific number of processes

I am running an HPC benchmark (IOR - http://sourceforge.net/projects/ior-sio/) on Lustre. I compiled IOR from source and am running it with openmpi 1.5.3. The problem is that it hangs when the number of processes (-np) is less than 6, which is odd.…
P.P
  • 117,907
  • 20
  • 175
  • 238
1
vote
2 answers

Lustre: Sending different write requests to different OSTs

I have a typical scenario where there can be write requests in parallel, and each file is a few hundred GB in size. My test system, a Lustre file system, has 4 OSTs (3 TB each) and 1 MDS. What I observed in practice is that with striping disabled,…
hrs
  • 487
  • 5
  • 18
0
votes
1 answer

How to mount FSx Lustre in read only mode?

I have S3 data I'd like to mount via AWS FSx Lustre, and this data should only be read. Which additional flags, or changes, do I need beyond the default mount command sudo mount -t lustre -o noatime,flock DNS_NAME@tcp:/MOUNT_POINT /fsx
Sash
  • 4,448
  • 1
  • 17
  • 31
0
votes
1 answer

How to Train SageMaker job with data coming from FSx for Lustre

I am trying to implement the following example: https://medium.com/@sayons/transfer-learning-with-amazon-sagemaker-and-fsx-for-lustre-378fa8977cc1 but I am getting the following error: UnexpectedStatusException: Error for Training job…
0
votes
0 answers

How do I transfer files between FSx for Lustre using AWS DataSync in two different VPCs?

I have tried asking this on reddit and also on another AWS forum and have had no luck getting an answer. I'm hoping I can find an answer here. I have an AWS account that was using the default VPC. We are moving away from it and have a new one…
0
votes
1 answer

How to install the Lustre client on Ubuntu nodes?

I am trying to install the Lustre client on Ubuntu 20.04 nodes I have in GCP. I'm using Linux kernel version 5.15.0-1021-gcp. I'm trying to install the client with the following code: cd /home/apps/ mkdir lustre git clone…
0
votes
1 answer

cannot run a host binary inside alpine ubuntu container

I need some hints about what the reason could be. I am trying to run the lctl binary in an Alpine container, but I am unable to run the binary, which I mounted from a host running Ubuntu. The same binary runs fine on the host. I am not sure why I can't…
AhmFM
  • 1,552
  • 3
  • 23
  • 53
0
votes
1 answer

How to debug a hanging job resulting from reading from Lustre?

I have a job in interruptible sleep state (S) that has been hanging for a few hours. I can't use gdb (gdb hangs when attaching to the PID). I can't use strace either, because strace resumes the hanging job. The WCHAN field shows the PID is waiting for ptlrpc. After some…
llodds
  • 153
  • 1
  • 2
  • 11
0
votes
1 answer

Can we propagate Amazon S3 IAM policies to an FSx for Lustre file system?

I want to create an FSx for Lustre file system backed by an Amazon S3 bucket and mount it on EC2. Suppose I have created IAM policies on Amazon S3 that define who can do what with the bucket's content. For example, not allowing write…
0
votes
2 answers

collectl says "system does not have lustre modules installed"

I want to use collectl (V4.1.0-1) to get Lustre-specific stats (version=2.12.2_178_ga0680fe_dirty). But it says "-sl disabled because this system does not have lustre modules installed"! The system does have the necessary Lustre modules. Can…
user3488903
  • 131
  • 4
0
votes
1 answer

SQLite "disk I/O error" with multiple readers on Lustre filesystem

I'm aware that SQLite is not ideal on a shared filesystem with multiple clients. However, the documentation implies that multiple readers should be fine. My SQLite database resides on a Lustre volume and the database is "partitioned" -- albeit as a…
Xophmeister
  • 8,884
  • 4
  • 44
  • 87