I got the answers to all of the above questions from Yehuda Sadeh-Weinraub, a developer at Red Hat. Thanks a lot, Yehuda, for the detailed response. I am pasting his response as is.
What are the default values of rgw_stripe_size and max_chunk_size?
Ans: The default stripe size for rgw is 4MB, and the default chunk size is currently also 4MB. It used to be 512k.
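To make the two sizes concrete, here is a minimal sketch that maps an object's bytes onto stripes (each one a RADOS object) and chunks (the individual IO units). It assumes both sizes default to 4MB as stated above and that a chunk never crosses a stripe boundary; the function name `layout` is hypothetical, and this is a simplified model rather than rgw's actual code.

```python
MB = 1024 * 1024
KB = 1024

def layout(obj_size, stripe_size=4 * MB, chunk_size=4 * MB):
    """Return a list of (stripe_index, offset_in_stripe, length) chunk IOs.

    Simplified model (hypothetical): the object is cut into stripe_size
    pieces, each stored as one RADOS object, and every stripe is written
    in chunk_size IO units.
    """
    ios = []
    offset = 0
    while offset < obj_size:
        stripe = offset // stripe_size
        off_in_stripe = offset % stripe_size
        # A chunk never crosses a stripe boundary or the end of the object.
        length = min(chunk_size, stripe_size - off_in_stripe, obj_size - offset)
        ios.append((stripe, off_in_stripe, length))
        offset += length
    return ios
```

With the current defaults, a 10MB object becomes three stripes of 4MB, 4MB and 2MB, each written in a single chunk. With the old 512k chunk size, `layout(1 * MB, chunk_size=512 * KB)` needs two chunk IOs inside one stripe.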
What is the default size of an object stored in the Ceph Storage Cluster (I believe it is 4 MB)? Does it depend on the stripe size or the chunk size?
Ans: Not sure I understand the question. The stripe size is the size of the objects stored in RADOS.
What is a bucket with respect to RGW? How should one decide the name of a bucket?
Does creating too many buckets (a different bucket per request) create a performance problem?
Ans: A bucket is a placeholder for objects. In rgw it means that we have 3 different entities: the bucket instance info object, which holds the bucket instance metadata (e.g., ACLs); the bucket entry point, which links the bucket name to the bucket instance; and the bucket index, which holds the list of all the objects. There is also an entry in the user's list of buckets for each bucket that the user owns.
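The relationships between these entities can be sketched as a toy in-memory model. Everything here is hypothetical: the class names, the `name.N` instance-id format and the dict-based storage are illustrative only and are not rgw's on-disk format.

```python
from dataclasses import dataclass, field

@dataclass
class BucketInstanceInfo:
    # Holds the bucket instance metadata, e.g. ACLs.
    instance_id: str
    acls: dict = field(default_factory=dict)

@dataclass
class BucketEntryPoint:
    # Links the bucket name to the current bucket instance.
    bucket_name: str
    instance_id: str

@dataclass
class BucketIndex:
    # Holds the list of all objects in the bucket instance.
    instance_id: str
    objects: set = field(default_factory=set)

class ToyGateway:
    """Hypothetical in-memory model of the per-bucket metadata."""

    def __init__(self):
        self.instances = {}     # instance_id -> BucketInstanceInfo
        self.entry_points = {}  # bucket name -> BucketEntryPoint
        self.indexes = {}       # instance_id -> BucketIndex
        self.user_buckets = {}  # user -> set of owned bucket names

    def create_bucket(self, user, name):
        instance_id = f"{name}.{len(self.instances) + 1}"  # made-up id format
        self.instances[instance_id] = BucketInstanceInfo(instance_id, {"owner": user})
        self.entry_points[name] = BucketEntryPoint(name, instance_id)
        self.indexes[instance_id] = BucketIndex(instance_id)
        self.user_buckets.setdefault(user, set()).add(name)

    def put_object(self, bucket, key):
        # Resolve name -> instance via the entry point, then update the index.
        instance_id = self.entry_points[bucket].instance_id
        self.indexes[instance_id].objects.add(key)

    def list_objects(self, bucket):
        instance_id = self.entry_points[bucket].instance_id
        return sorted(self.indexes[instance_id].objects)
```

In this model, creating one bucket writes four separate records: the instance info, the entry point, the index and the user's bucket-list entry.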
Why does Ceph first stripe the data into a series of stripes and then divide these stripes again into smaller chunks?
Isn't striping the data into stripes alone enough?
Ans: The chunking is done to separate the read/write IO logic from the actual data representation. There are different requirements and ramifications that we can have at different levels of the stack. For example, the chunk size also determines the amount of memory you will need to keep for each IO operation.
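A back-of-the-envelope sketch of that memory cost, assuming each in-flight IO buffers one full chunk and ignoring all other overhead. The count of 16 concurrent operations is an illustrative number, not an rgw default.

```python
MB = 1024 * 1024
KB = 1024

def io_buffer_bytes(chunk_size, in_flight_ops):
    # Rough upper bound: every in-flight IO holds one chunk-sized buffer.
    return chunk_size * in_flight_ops
```

With 16 operations in flight, 4MB chunks need about 64MB of buffers, while the old 512k chunks need about 8MB.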
If an object is divided into a series of smaller units (for performance benefit), how does Ceph return the complete object when a GET request is made?
Ans: The RADOS gateway sends concurrent requests to fetch the object's data and reassembles it in memory. It uses a sliding operation window.
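A minimal sketch of a sliding-window read, assuming a thread pool stands in for rgw's asynchronous RADOS requests and a hypothetical `fetch_chunk(i)` callable returns the bytes of piece `i`. rgw itself is written in C++; this only illustrates the idea of keeping a bounded number of requests in flight and reassembling the results in order.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor

def read_object(fetch_chunk, num_chunks, window=4):
    """Fetch num_chunks pieces with at most `window` requests in flight,
    then join them in order in memory."""
    parts = []
    pending = deque()  # futures, oldest (lowest chunk index) first
    next_chunk = 0
    with ThreadPoolExecutor(max_workers=window) as pool:
        while next_chunk < num_chunks or pending:
            # Fill the window with new concurrent requests.
            while next_chunk < num_chunks and len(pending) < window:
                pending.append(pool.submit(fetch_chunk, next_chunk))
                next_chunk += 1
            # Wait for the oldest request so data is appended in order;
            # this frees one slot and slides the window forward.
            parts.append(pending.popleft().result())
    return b"".join(parts)
```

Waiting on the oldest request keeps reassembly simple and bounds memory to roughly `window` chunks, at the cost of occasionally idling while a slow early piece finishes.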
Where does it store the IDs of the subsequent stripes needed to rebuild the complete object from its smaller chunks?
Ans: The object's head has a manifest that describes the object's layout in RADOS. When reading an object, the head is read first. The head might also contain data, so for small enough objects there is no need to read further. For larger objects, rgw will use the manifest to determine where to find the pieces.
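The head-first read path can be sketched with a plain dict standing in for RADOS. The layout is hypothetical: the head is a dict with inline `data` and a `manifest` listing tail object names, and names such as `big.1` are made up; real rgw manifests and tail object names are more elaborate.

```python
def read_rgw_object(store, name):
    """Read the head first, then any tail pieces listed in its manifest.

    store: hypothetical dict mapping RADOS object names to contents.
    """
    head = store[name]          # the head is always read first
    data = head["data"]         # the head may carry data inline
    tail = head["manifest"]     # ordered names of the remaining pieces
    if not tail:
        return data             # small object: nothing more to read
    # Larger object: the manifest says where the other pieces live.
    return data + b"".join(store[part] for part in tail)
```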
Does striping a small object (e.g. 100 KB to 4 MB) create a performance overhead, since Ceph has to read all the chunks related to this object and then combine them into one single object before returning it? Isn't that too much optimization for handling smaller objects?
Ans: Objects created in the current version are not striped if they are up to 4MB.
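As a quick check of what that means for object counts, here is a sketch under a simplified model: assume an object of up to 4MB fits entirely in its head, larger objects need one RADOS object per 4MB, and even an empty object still has a head. The function name is hypothetical.

```python
MB = 1024 * 1024
KB = 1024

def rados_objects_needed(obj_size, stripe_size=4 * MB):
    # Ceiling division without floats; at least one object (the head).
    return max(1, -(-obj_size // stripe_size))
```

Under these assumptions a 100KB or a 4MB upload is a single RADOS object, so a GET needs no reassembly, while a 10MB upload spans three.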
Does librados (Ceph's native API) also perform data striping when it is used to store data in the Ceph cluster?
Ans: No, there's another library you can use for that (libradosstriper).
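To show what "striping on top of librados" means, here is a client-side sketch against a hypothetical in-memory `FakeIoctx` with librados-like `write_full`/`read` methods, so it runs without a cluster. It is not libradosstriper: the real library uses a richer layout (stripe unit, stripe count, object size) and its own object naming and metadata; the `name.<16 hex digits>` naming here is only illustrative.

```python
class FakeIoctx:
    """Hypothetical in-memory stand-in for a librados IO context."""

    def __init__(self):
        self.objects = {}

    def write_full(self, key, data):
        self.objects[key] = bytes(data)

    def read(self, key):
        return self.objects[key]

def striped_write(ioctx, name, data, stripe_size):
    """Split one logical object into several RADOS objects; return the count."""
    count = max(1, -(-len(data) // stripe_size))
    for i in range(count):
        piece = data[i * stripe_size:(i + 1) * stripe_size]
        ioctx.write_full(f"{name}.{i:016x}", piece)  # illustrative naming
    return count

def striped_read(ioctx, name, count):
    # Reassemble the logical object from its pieces, in order.
    return b"".join(ioctx.read(f"{name}.{i:016x}") for i in range(count))
```

Plain librados would store the whole buffer as a single RADOS object with one `write_full` call; the striping layer is what turns it into several.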