Doubts regarding CEPH RGW

Question

I am new to CEPH and have some questions about it.

What are the default values of rgw_stripe_size and max_chunk_size?

What is the default size of an object stored in a Ceph Storage Cluster (I believe it is 4 MB)? Does it depend on the stripe size or the chunk size?

What is a bucket with respect to RGW? How should one decide the name of a bucket? Does creating too many buckets (a different bucket per request) create a performance problem?

Why does CEPH first stripe the data into a series of stripes and then divide these stripes again into smaller chunks? Is striping the data into stripes alone not enough?

If an object is divided into a series of smaller units (for performance benefit), how does CEPH return the complete object when a GET request is made?

Where does it store the IDs/numbers of the subsequent stripes needed to form a complete object from its smaller chunks?

Does striping a small object (e.g. 100 KB to 4 MB) create a performance overhead, since CEPH has to read all the chunks related to this object and combine them into a single object before returning it? Isn't that too much optimization for handling smaller objects?

Does librados (the Ceph native API) also perform data striping when used to store data in a CEPH cluster?

I googled but could not find any concrete resource that explains how RGW implements this.

score 1 · Answer 1 · answered Nov 29 '17 at 09:59

I got answers to all of the above questions from Yehuda Sadeh-Weinraub, a developer at Red Hat. Thanks a lot, Yehuda, for the detailed response. I am pasting his response as is.

What are the default values of rgw_stripe_size and max_chunk_size?

Ans: The default stripe size for rgw is 4MB, and the default chunk size is currently also 4MB. It used to be 512k.

What is the default size of an object stored in a Ceph Storage Cluster (I believe it is 4 MB)? Does it depend on the stripe size or the chunk size?

Ans: Not sure I understand the question. The stripe size is the size of the objects stored in RADOS.

What is a bucket with respect to RGW? How should one decide the name of a bucket? Does creating too many buckets (a different bucket per request) create a performance problem?

Ans: A bucket is a placeholder for objects. In rgw it means that we have 3 different entities: bucket instance info object that holds the bucket instance metadata (e.g., acls), bucket entry point that links between the bucket name and the bucket instance, and the bucket index that holds the list of all the objects. There is also an entry in the user's list of buckets for each bucket that it owns.

Why does CEPH first stripe the data into a series of stripes and then divide these stripes again into smaller chunks? Is striping the data into stripes alone not enough?

Ans: The chunking is done to separate the read/write IO logic from the actual data representation. There are different requirements and ramifications that we can have at different levels of the stack. For example, the chunk size also determines the amount of memory you will need to keep for each IO operation.
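To make the memory point a bit more concrete, here is some rough arithmetic in Python. It is only a toy model, not RGW code: the function name `layout` and the plain ceiling division are my own assumptions, and the sizes are the 4MB defaults and the old 512k chunk size mentioned above.

```python
MIB = 1024 * 1024

def layout(object_size, stripe_size=4 * MIB, chunk_size=4 * MIB):
    """Toy model: how many stripe-sized pieces an object occupies and how
    many chunk-sized IO operations are needed to move it.

    Assumes chunk_size divides stripe_size evenly, so no IO crosses a
    stripe boundary. Each in-flight IO buffers at most chunk_size bytes.
    """
    stripes = -(-object_size // stripe_size)  # ceiling division
    io_ops = -(-object_size // chunk_size)
    return stripes, io_ops

# A 10 MiB object with the 4MB defaults quoted above.
print(layout(10 * MIB))                         # (3, 3)
# Same object with the older 512k chunk size: same stripes, smaller IOs.
print(layout(10 * MIB, chunk_size=512 * 1024))  # (3, 20)
```

The number of stored pieces does not change with the chunk size; only the size (and count) of the individual IO buffers does, which is the separation the answer describes.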

If an object is divided into a series of smaller units (for performance benefit), how does CEPH return the complete object when a GET request is made?

Ans: The rados gateway sends concurrent requests to fetch the object's data and reassembles them in memory. It has a sliding operation window.
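The idea of concurrent reads behind a sliding window can be sketched in Python as below. This is not how RGW is implemented internally; `fetch_object`, `read_part`, the `window` parameter and the in-memory `store` dict are all made up for illustration. It only shows the general pattern: keep a bounded number of reads in flight and hand the results back in order.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor

def fetch_object(parts, read_part, window=4):
    """Read `parts` concurrently with at most `window` reads in flight,
    yielding each part's data in the original order."""
    in_flight = deque()
    with ThreadPoolExecutor(max_workers=window) as pool:
        for name in parts:
            in_flight.append(pool.submit(read_part, name))
            if len(in_flight) == window:
                # Window is full: wait for the oldest read before issuing more.
                yield in_flight.popleft().result()
        while in_flight:
            # Drain the reads that are still outstanding.
            yield in_flight.popleft().result()

# Hypothetical stand-in for the object store: six small parts.
store = {f"obj.{i}": bytes([65 + i]) * 3 for i in range(6)}
data = b"".join(fetch_object(sorted(store), store.__getitem__, window=2))
print(data)  # b'AAABBBCCCDDDEEEFFF'
```

The window bounds how much data is buffered in memory at once, regardless of how large the whole object is.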

Where does it store the IDs/numbers of the subsequent stripes needed to form a complete object from its smaller chunks?

Ans: The object's head has a manifest that describes the object's layout in rados. When reading an object, the head is read first. The head might also contain data, so for small enough objects there isn't a need to read further. For larger objects, rgw will use the manifest to determine where to find the pieces.
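A small Python model of the head-plus-manifest idea may help. Everything here is an assumption for illustration: the real manifest is an encoded structure with its own naming rules, whereas this sketch uses a plain dict as the store, a list of made-up key names like `name.tail.0` as the manifest, and tiny byte sizes so the example is easy to follow.

```python
def put_object(store, name, data, head_size=4, stripe_size=4):
    """Store `data` as a head (manifest + first bytes) plus tail pieces."""
    head_data, tail = data[:head_size], data[head_size:]
    tail_names = []
    for offset in range(0, len(tail), stripe_size):
        part = f"{name}.tail.{len(tail_names)}"  # hypothetical naming scheme
        store[part] = tail[offset:offset + stripe_size]
        tail_names.append(part)
    # The head records where the remaining pieces live.
    store[name] = {"manifest": tail_names, "data": head_data}

def get_object(store, name):
    """Read the head first, then follow the manifest if there is a tail."""
    head = store[name]
    return head["data"] + b"".join(store[p] for p in head["manifest"])

store = {}
put_object(store, "small", b"abc")         # fits entirely in the head
put_object(store, "big", b"abcdefghij")    # head + two tail pieces

print(store["small"]["manifest"])  # []
print(store["big"]["manifest"])    # ['big.tail.0', 'big.tail.1']
print(get_object(store, "big"))    # b'abcdefghij'
```

For the small object the manifest is empty, so a single read of the head returns everything, which matches the "no need to read further" case in the answer.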

Does striping a small object (e.g. 100 KB to 4 MB) create a performance overhead, since CEPH has to read all the chunks related to this object and combine them into a single object before returning it? Isn't that too much optimization for handling smaller objects?

Ans: Objects created in the current version are not striped if they are up to 4MB in size.

Does librados (the Ceph native API) also perform data striping when used to store data in a CEPH cluster?

Ans: No, there's another library you can use for that (libradosstriper).
