
In the hope of achieving functionality in GCP similar to what Cloudera Backup and Disaster Recovery (BDR) provides for AWS, I am searching for alternatives.

Will the below approach work?

  1. adding the GCS connector to an on-prem Cloudera cluster
  2. then copying with hadoop distcp
  3. then syncing the HDFS source directory to a GCS directory with gsutil rsync [OPTION]... src_url dst_url
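For reference, steps 2 and 3 might look like the sketch below. The bucket name and paths are placeholders, and note two details: the command is hadoop distcp (not dist-cp), and gsutil rsync cannot read hdfs:// URLs directly, so it would have to sync a locally mounted or exported copy of the HDFS directory.

```shell
# Placeholders: substitute your own paths and bucket name.
SRC="hdfs:///user/warehouse/backup"
DST="gs://my-backup-bucket/backup"

# Step 2: bulk copy via DistCp; -update makes reruns incremental.
DISTCP_CMD="hadoop distcp -update $SRC $DST"

# Step 3: gsutil rsync only understands local paths and gs:// URLs,
# so it syncs an exported local copy, not hdfs:// itself.
RSYNC_CMD="gsutil -m rsync -r /mnt/hdfs-export $DST"

if command -v hadoop >/dev/null 2>&1; then
    $DISTCP_CMD
else
    echo "run on the cluster: $DISTCP_CMD"
fi
```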

If the above approach is not possible then is there any other alternative to achieve Cloudera BDR in Google Cloud Storage (GCS)?

nomadSK25

1 Answer


As of the moment, Cloudera Manager’s Backup and Disaster Recovery does not support Google Cloud Storage; this is listed among its limitations. For the full details, please check the documentation on Configuring Google Cloud Storage Connectivity.

The above approach will work; we just need to add a few steps first:

  1. First, establish a private link between the on-prem network and Google's network using Cloud Interconnect or Cloud VPN.
  2. A Dataproc cluster is needed for the data transfer.
  3. Use the gcloud CLI to connect to the cluster's master instance.
  4. Finally, run DistCp commands to move your data.
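Steps 3 and 4 might look like the sketch below. The cluster name, zone, and paths are placeholders; Dataproc names the master instance with an -m suffix on the cluster name.

```shell
# Placeholders: cluster name, zone, and paths are illustrative.
CLUSTER="transfer-cluster"
ZONE="us-central1-a"

# Step 4: the DistCp command to run on the Dataproc master.
DISTCP_CMD="hadoop distcp hdfs:///user/warehouse/backup gs://my-backup-bucket/backup"

# Step 3: connect to the master instance with the gcloud CLI.
if command -v gcloud >/dev/null 2>&1; then
    gcloud compute ssh "${CLUSTER}-m" --zone="$ZONE" --command="$DISTCP_CMD"
else
    echo "with the Cloud SDK installed: gcloud compute ssh ${CLUSTER}-m --zone=$ZONE --command=\"$DISTCP_CMD\""
fi
```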

For more detailed information, you may check this full documentation on Using DistCp to copy your data to Cloud Storage.

Google also has its own BDR and you can check this Data Recovery planning guide.

Please be advised that Google Cloud Storage cannot be the default file system for the cluster.
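For the connector setup on the on-prem cluster, the wiring goes into core-site.xml roughly as follows. The property names are those of the Cloud Storage connector; the project ID and keyfile path are placeholders, and fs.defaultFS is left pointing at HDFS since GCS cannot be the default filesystem.

```xml
<!-- core-site.xml additions for the Cloud Storage connector.
     Placeholders: project ID and keyfile path. -->
<property>
  <name>fs.gs.impl</name>
  <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem</value>
</property>
<property>
  <name>fs.AbstractFileSystem.gs.impl</name>
  <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS</value>
</property>
<property>
  <name>fs.gs.project.id</name>
  <value>my-project-id</value>
</property>
<property>
  <name>google.cloud.auth.service.account.enable</name>
  <value>true</value>
</property>
<property>
  <name>google.cloud.auth.service.account.json.keyfile</name>
  <value>/path/to/keyfile.json</value>
</property>
```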

You can also check this link: Working with Google Cloud partners

You can use the connector in any of the following ways:

  • In a Spark (or PySpark) or Hadoop application, using the gs:// prefix.
  • With the Hadoop shell: hadoop fs -ls gs://bucket/dir/file.
  • In the Cloud Console Cloud Storage browser.
  • With the gsutil cp or gsutil rsync commands.
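For example, once the connector is configured, the same bucket is visible to both the Hadoop shell and gsutil (the bucket name below is a placeholder):

```shell
# Placeholder bucket; both tools address the same objects.
BUCKET="gs://my-backup-bucket"

if command -v hadoop >/dev/null 2>&1; then
    hadoop fs -ls "$BUCKET/backup/"   # Hadoop shell view
else
    echo "hadoop shell view: hadoop fs -ls $BUCKET/backup/"
fi

if command -v gsutil >/dev/null 2>&1; then
    gsutil ls "$BUCKET/backup/"       # gsutil view of the same path
else
    echo "gsutil view: gsutil ls $BUCKET/backup/"
fi
```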

You can check this full documentation on using connectors.

Let me know if you have questions.

Robert G