
We are using Lustre in a cluster with approximately 200 TB of storage, 12 Object Storage Targets (connected to a DDN storage system over QDR InfiniBand), and roughly 160 quad- and 8-core compute nodes. Most of the users of this system have no problems at all, but my tasks are I/O intensive. When I run an array job with 250-500 processes simultaneously pounding the file system, typically between 10 and 20 of my processes fail. The log files indicate that the load on the OSTs is going over 2 and that the Lustre client is returning either bad data or failed read() calls.

Currently the only way we have of resolving my problem is to run fewer simultaneous jobs. This is unsatisfactory because there is no way to know in advance whether my workload will be CPU-heavy or I/O-heavy. Besides, simply turning down the load isn't the way to run a supercomputer: we would like it to run slower under load, not produce incorrect answers.

I'd like to know how to configure Lustre so that clients block when the load on the OSTs goes too high, rather than having the clients get bad data.

How do I configure Lustre to make the clients block?

vy32

1 Answer


Have you thought of adding more OSSs and spreading the OSTs across them? That should decrease the load. In that vein, what kind of I/O pattern do you have? Do you have many large files, and if so, are they striped? The default stripe count is 1, which means each file resides on only one OST; that can be changed per file (at creation time) or per directory (for new files), as in the sketch below.
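
For illustration, a minimal sketch of inspecting and changing striping with the standard lfs tool (the path and stripe values are assumptions made up for this example, not taken from the question):

```
# Show the current layout of a directory or file
lfs getstripe /lustre/scratch/mydata

# Make new files created in this directory stripe across 4 OSTs with a
# 1 MB stripe size (on older Lustre releases the stripe-size flag is -s)
lfs setstripe -c 4 -S 1M /lustre/scratch/mydata

# Stripe a single new file across all available OSTs (-c -1); the file
# must not already exist, since the layout is fixed at creation time
lfs setstripe -c -1 /lustre/scratch/mydata/bigobject.dat
```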

You could also try increasing the timeouts in Lustre (via lctl get_param/set_param), namely the two parameters below; a usage sketch follows the list:

  • timeout
  • ldlm_timeout
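
A minimal sketch of checking and raising those values with lctl (the numbers are illustrative assumptions, not recommendations):

```
# On the servers (MDS/OSS), check the current values
lctl get_param timeout ldlm_timeout

# Raise them temporarily; these values are only examples
lctl set_param timeout=300
lctl set_param ldlm_timeout=60

# set_param changes are lost on reboot; to persist a value you would
# typically set it from the MGS with conf_param, e.g. for a filesystem
# named "lustre":
#   lctl conf_param lustre.sys.timeout=300
```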
utopiabound
  • I don't have the ability to add more OSSs, unfortunately. Our I/O pattern is that we have objects in the 20G to 100G range, and each process reads an object from beginning to end. I don't think that striping would help. We really just need to make it block. – vy32 Sep 25 '13 at 01:57
  • If you're just doing sequential I/O, you're just going to be limited by your backend storage. Actually, if you do sequential I/O with a block size larger than the stripe size, you will see an improvement in performance, since multiple I/Os will be in flight to the various OSTs. – utopiabound Oct 02 '13 at 13:39
  • Is that the user-specified I/O size or the system I/O size? – vy32 Oct 02 '13 at 15:12