
I have two Google projects: dev and prod. I also import data from separate storage buckets located in these projects: dev-bucket and prod-bucket.

After I have made and tested changes in the dev environment, how can I smoothly apply (deploy/copy) the changes to prod as well?

What I do now is export the flow from dev and then re-import it into prod. However, each time I need to manually do the following in the prod flows:

  • Change the datasets that serve as inputs to the flow
  • Replace the manual and scheduled destinations with the right BigQuery dataset (dev-dataset-bigquery vs. prod-dataset-bigquery)

How can this be done more smoothly?

WJA
  • Not sure if this is really possible, since Dataprep doesn't have exposed APIs, which means that what you can do via the UI can't be done via scripting. Also, IMHO, I wouldn't want to sync my dev to prod unless I have tested that it's working well in dev. – Christopher Aug 02 '19 at 07:24
  • Yes of course, that is my point. Having tested it in dev, how can I simply deploy the changes to prod? Perhaps syncing is not the right word. – WJA Aug 02 '19 at 07:27
  • Maybe Spinnaker can help you out: https://www.spinnaker.io/. Here at our company we also use CI/CD to keep our QA environment (not dev) synced with prod. – Willian Fuks Aug 02 '19 at 13:12
  • Linking: https://stackoverflow.com/q/50620872/320399 – blong Aug 06 '19 at 17:52

2 Answers


If you want to copy data between the Google Cloud Storage (GCS) buckets dev-bucket and prod-bucket, Google provides a Storage Transfer Service with this functionality: https://cloud.google.com/storage-transfer/docs/create-manage-transfer-console. You can either manually trigger data to be copied from one bucket to another or have it run on a schedule.
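
For a quick one-off copy (as an alternative to a managed transfer job), a minimal sketch with the google-cloud-storage Python client could look like the following. The bucket names come from your question; it assumes credentials that can read dev-bucket and write prod-bucket.

    # Sketch: copy every object from dev-bucket to prod-bucket.
    # Assumes google-cloud-storage is installed and credentials are configured.
    from google.cloud import storage

    client = storage.Client()
    src = client.bucket("dev-bucket")
    dst = client.bucket("prod-bucket")

    for blob in client.list_blobs(src):
        # copy_blob performs a server-side copy; the data never
        # passes through this machine
        src.copy_blob(blob, dst, new_name=blob.name)
        print(f"copied {blob.name} to prod-bucket")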

For the second part, it sounds like both dev-dataset-bigquery and prod-dataset-bigquery are loaded from files in GCS? If this is the case, the BigQuery Transfer Service may be of use: https://cloud.google.com/bigquery/docs/cloud-storage-transfer. You can trigger a transfer job manually, or have it run on a schedule.
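
If you would rather script the load than configure a transfer, here is a rough sketch with the standard BigQuery Python client (not the Transfer Service itself). The file format, GCS path, and table id are assumptions; note that BigQuery dataset ids cannot contain hyphens, so an underscored stand-in is used for prod-dataset-bigquery.

    # Sketch: load GCS files into a prod BigQuery table.
    from google.cloud import bigquery

    client = bigquery.Client(project="prod")  # placeholder project id
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,  # assumption: CSV files
        skip_leading_rows=1,
        autodetect=True,
    )

    load_job = client.load_table_from_uri(
        "gs://prod-bucket/exports/*.csv",       # hypothetical path
        "prod.prod_dataset_bigquery.my_table",  # stands in for prod-dataset-bigquery
        job_config=job_config,
    )
    load_job.result()  # wait for the load job to finish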

As others have said in the comments, if you need to verify data before initiating transfers from dev to prod, a CI system such as Spinnaker may help. If the verification can be automated, a system such as Apache Airflow (running on Cloud Composer, if you want a hosted version) provides more flexibility than the transfer services.
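
As a hedged illustration of the Airflow option, a tiny DAG that gates the prod load behind a verification task might look like this; the operator comes from the apache-airflow-providers-google package, and every name, path, and check here is a placeholder:

    # Sketch: verify dev output, then load prod data into BigQuery.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator
    from airflow.providers.google.cloud.transfers.gcs_to_bigquery import (
        GCSToBigQueryOperator,
    )

    def verify_dev_output():
        # placeholder: run whatever checks prove the dev flow output is good
        pass

    with DAG("promote_dev_to_prod", start_date=datetime(2019, 8, 1),
             schedule_interval=None, catchup=False) as dag:
        verify = PythonOperator(task_id="verify", python_callable=verify_dev_output)
        load = GCSToBigQueryOperator(
            task_id="load_prod",
            bucket="prod-bucket",
            source_objects=["exports/*.csv"],  # hypothetical path
            destination_project_dataset_table="prod.prod_dataset_bigquery.my_table",
            write_disposition="WRITE_TRUNCATE",
        )
        verify >> load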

Tim Swast
  • This doesn't have much to do with copying files between different projects. You can use gcloud utils for that, as you point out in some of the links (not just transfers). Here we are talking about Flows in Dataprep. How did they intend to manage different projects (dev/staging/prod)? – WJA Aug 10 '19 at 10:28

Follow the procedure below to move a plan from one environment to another using the API, and to update the input dataset and the output for the new environment.

1) Export a plan

GET https://api.clouddataprep.com/v4/plans/<plan_id>/package
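
For example, with Python's requests library; the access token and plan id are placeholders, and the response is assumed to be a ZIP package that you save to disk:

    # Sketch: export a Dataprep plan package.
    import requests

    TOKEN = "<dataprep-access-token>"  # placeholder
    plan_id = 12345                    # placeholder

    resp = requests.get(
        f"https://api.clouddataprep.com/v4/plans/{plan_id}/package",
        headers={"Authorization": f"Bearer {TOKEN}"},
    )
    resp.raise_for_status()
    with open("plan_package.zip", "wb") as f:
        f.write(resp.content)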

2) Import the plan

POST https://api.clouddataprep.com/v4/plans/package
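
A sketch of the import step, assuming the endpoint accepts the exported ZIP as multipart form data (check the Dataprep API docs for the exact content type it expects):

    # Sketch: import the exported plan package into the prod project.
    import requests

    TOKEN = "<dataprep-access-token>"  # placeholder

    with open("plan_package.zip", "rb") as f:
        resp = requests.post(
            "https://api.clouddataprep.com/v4/plans/package",
            headers={"Authorization": f"Bearer {TOKEN}"},
            files={"file": ("plan_package.zip", f, "application/zip")},
        )
    resp.raise_for_status()
    print(resp.json())  # the response should identify the imported plan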

3) Update the input dataset

PUT https://api.clouddataprep.com/v4/importedDatasets/<dataset_id>

    {
        "name": "<new_dataset_name>",
        "bucket": "<bucket_name>",
        "path": "<bucket_file_name>"
    }
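
The same call sketched in Python, reusing the payload above; the dataset id, name, bucket, and path are placeholders:

    # Sketch: point the imported dataset at the prod bucket.
    import requests

    TOKEN = "<dataprep-access-token>"  # placeholder
    dataset_id = 67890                 # placeholder

    resp = requests.put(
        f"https://api.clouddataprep.com/v4/importedDatasets/{dataset_id}",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={
            "name": "prod input",        # placeholder
            "bucket": "prod-bucket",
            "path": "exports/data.csv",  # hypothetical file path
        },
    )
    resp.raise_for_status()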

4) Update the output

PATCH https://api.clouddataprep.com/v4/outputObjects/<output_id>

    {
        "publications": [
            {
                "path": [
                    "<project_name>",
                    "<dataset_name>"
                ],
                "tableName": "<table_name>",
                "targetType": "bigquery",
                "action": "create"
            }
        ]
    }
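
And a sketch of the final call, repointing the output at the prod BigQuery dataset with the payload above; the output id and names are placeholders:

    # Sketch: update the output object to publish to the prod dataset.
    import requests

    TOKEN = "<dataprep-access-token>"  # placeholder
    output_id = 13579                  # placeholder

    resp = requests.patch(
        f"https://api.clouddataprep.com/v4/outputObjects/{output_id}",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={
            "publications": [
                {
                    "path": ["prod", "prod-dataset-bigquery"],
                    "tableName": "my_table",  # placeholder
                    "targetType": "bigquery",
                    "action": "create",
                }
            ]
        },
    )
    resp.raise_for_status()
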
Suraj Rao