I have some HAR files (Hadoop archive files) on my HDFS-based storage. They hold archived data that is not frequently used.

We now plan to move to Ceph-based storage, so I have two questions:

  1. Can I somehow use my existing HAR files on Ceph?
  2. Does Ceph have an archive utility comparable to the Hadoop Archive utility in HDFS?

Thanks

pri

1 Answer

It has been a while since I used Hadoop, but I can answer both questions:

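  • Can I somehow use my existing HAR files on Ceph?

    I am fairly sure there is no official support for HAR in Ceph, but I think it is still possible. HAR files are read by Hadoop itself through its har:// file system layer, which sits on top of whatever underlying file system Hadoop is configured to use, and the Ceph file system (CephFS) can be used as a drop-in replacement for the Hadoop File System (HDFS) via its Hadoop bindings. So if you keep running Hadoop against CephFS, the existing archives should in principle stay readable after you copy them over. I would test this on a sample archive first.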

  • Does Ceph have an archive utility comparable to the Hadoop Archive utility in HDFS?

    Although I use Ceph on a daily basis, I have not come across any archive utility in Ceph similar to HAR. Note that a HAR is not a tar file: it is a Hadoop-specific directory with a .har suffix that holds index files and part files. What I do instead is use compressed tarballs. When I work with block storage, I keep the tarballs on Ceph RBD (RADOS Block Device) volumes, and when I work with objects, I archive the tarballs as RGW objects.
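
The tarball approach above can be sketched in Python as follows. This is only a minimal sketch, not something Ceph provides: `pack_directory` and `read_member` are hypothetical helper names, and it uses nothing but the standard `tarfile` module. It bundles many small files into one gzip-compressed tarball and reads a single member back. Getting the resulting file onto Ceph (copying it to a CephFS mount or an RBD-backed file system, or uploading it to RGW with an S3-compatible client) is left out.

```python
import tarfile
import tempfile
from pathlib import Path


def pack_directory(src_dir: Path, archive_path: Path) -> list[str]:
    """Bundle every regular file under src_dir into one gzip-compressed tarball.

    Hypothetical helper for illustration; returns the member names stored.
    """
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in sorted(src_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to src_dir so the archive is relocatable.
                tar.add(path, arcname=path.relative_to(src_dir).as_posix())
    # Re-open the archive to report what was actually written.
    with tarfile.open(archive_path, "r:gz") as tar:
        return tar.getnames()


def read_member(archive_path: Path, member: str) -> bytes:
    """Read one file out of the tarball without unpacking the rest to disk."""
    with tarfile.open(archive_path, "r:gz") as tar:
        extracted = tar.extractfile(member)
        if extracted is None:  # member exists but is not a regular file
            raise FileNotFoundError(member)
        return extracted.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Create a few small files standing in for rarely used data.
        src = Path(tmp) / "cold-data"
        src.mkdir()
        for i in range(3):
            (src / f"record-{i}.txt").write_text(f"payload {i}\n")

        # Keep the archive outside src so it does not pack itself.
        archive = Path(tmp) / "cold-data.tar.gz"
        print(pack_directory(src, archive))
        print(read_member(archive, "record-1.txt"))
```

One trade-off to keep in mind: unlike a HAR, a gzip-compressed tarball has no index, so reading one member means scanning the stream from the start. For data that is rarely read this is usually acceptable.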

To help you further, I am sharing some useful threads to dig deeper:

BZKN
  • Thanks for your answer. How do you archive your files on Ceph then? What about small files? – pri Feb 24 '22 at 18:27
  • I use tarballs. Please check my updated answer. I have shared some useful threads that may be helpful for you. – BZKN Feb 25 '22 at 09:25
  • @pri: Has my answer helped or given some helpful pointers? Please let me know as it is encouraging for newcomers like me on SO to answer questions :) – BZKN Mar 23 '22 at 16:28