I am trying to create a Dataflow job that indexes a BigQuery table into Elasticsearch, using the Node.js package @google-cloud/dataflow (v1beta3 client).

The job works fine when it is created and launched from the Google Cloud console, but I get the following error when I try to create it from Node.js: Error: 3 INVALID_ARGUMENT: (b69ddc3a5ef1c40b): Cannot set worker pool zone. Please check whether the worker_region experiments flag is valid. Causes: (b69ddc3a5ef1cd76): An internal service error occurred.

I tried specifying the experiments parameter in various ways, but I always end up with the same error.

Has anyone managed to get a similar Dataflow job working? Or does anyone have information about Dataflow experiments?

Here is the code:

const { JobsV1Beta3Client } = require('@google-cloud/dataflow').v1beta3

const dataflowClient = new JobsV1Beta3Client()
const response = await dataflowClient.createJob({
  projectId: 'myGoogleCloudProjectId',
  location: 'europe-west1',
  job: {
    launch_parameter: {
      jobName: 'indexation-job',
      containerSpecGcsPath: 'gs://dataflow-templates-europe-west1/latest/flex/BigQuery_to_Elasticsearch',
      parameters: {
        inputTableSpec: 'bigQuery-table-gs-adress',
        connectionUrl: 'elastic-endpoint-url',
        index: 'elastic-index',
        elasticsearchUsername: 'username',
        elasticsearchPassword: 'password'
      }
    },
    environment: {
      experiments: ['worker_region']
    }
  }
})

Thank you very much for your help.

PierreM
  • Can you try to follow the [example here](https://cloud.google.com/nodejs/docs/reference/dataflow/latest/dataflow/v1beta3.jobsv1beta3client#_google_cloud_dataflow_v1beta3_JobsV1Beta3Client_createJob_member_1_)? It's not too clear to me how this client works, but apparently the `location` needs to be set in order to use the regional endpoints? I'll look more into it, just wanted to point out the resource above. – Bruno Volpato Sep 06 '22 at 03:02
  • Hi. Yes, I followed this example before, but it does not give enough information on the format of the parameters. Thank you. – PierreM Sep 06 '22 at 07:58

2 Answers

After many attempts, I managed yesterday to find out how to specify the worker region. It looks like this:

await dataflowClient.createJob({
  projectId,
  location,
  job: {
    name: 'jobName',
    type: 'Batch',
    containerSpecGcsPath: 'gs://dataflow-templates-europe-west1/latest/flex/BigQuery_to_Elasticsearch',
    pipelineDescription: {
      inputTableSpec: 'bigquery-table',
      connectionUrl: 'elastic-url',
      index: 'elastic-index',
      elasticsearchUsername: 'username',
      elasticsearchPassword: 'password',
      project: projectId,
      appName: 'BigQueryToElasticsearch'
    },
    environment: {
      workerPools: [
        { region: 'europe-west1' }
      ]
    }
  }  
})

This does not fully work yet, because I still need to find the correct way to provide the other parameters, but the Dataflow job is now created and shows up in the Google Cloud console.

PierreM

For anyone struggling with this issue: I finally found out how to launch a Dataflow job from a template.

The client has a launchFlexTemplate function that works the same way as creating the job in the Google Cloud console.

Here is the final call, which works correctly:

const { FlexTemplatesServiceClient } = require('@google-cloud/dataflow').v1beta3

// launchFlexTemplate lives on the flex templates client, not on JobsV1Beta3Client
const dataflowClient = new FlexTemplatesServiceClient()

const response = await dataflowClient.launchFlexTemplate({
  projectId: 'google-project-id',
  location: 'europe-west1',
  launchParameter: {
    jobName: 'job-name',
    containerSpecGcsPath: 'gs://dataflow-templates-europe-west1/latest/flex/BigQuery_to_Elasticsearch',
    parameters: {
      apiKey: 'elastic-api-key', // mandatory but not used if you provide username and password
      connectionUrl: 'elasticsearch endpoint',
      index: 'elasticsearch index',
      elasticsearchUsername: 'username',
      elasticsearchPassword: 'password',
      inputTableSpec: 'bigquery source table', // projectid:datasetId.table

      // parameters to upsert the elasticsearch index
      propertyAsId: 'table index use for elastic _id',
      usePartialUpdate: true,
      bulkInsertMethod: 'INDEX'
    }
  }
})

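
Building on the call above, here is a minimal sketch of how the request could be assembled in a reusable way. The helper names buildLaunchRequest and launchJob are hypothetical (they are not part of @google-cloud/dataflow), and the placeholder values are the same ones used above. The sketch assumes that the template parameters are sent as a string-to-string map, so non-string values such as `true` are converted to strings explicitly. The client is passed in as an argument, so the request-building part can be checked without calling Google Cloud.

```javascript
// Hypothetical helper: builds the object passed to launchFlexTemplate.
// Assumption: template parameters are a map of string to string, so every
// value is converted with String() and unset values are dropped.
function buildLaunchRequest({ projectId, location, jobName, templatePath, parameters }) {
  const stringParameters = {}
  for (const [key, value] of Object.entries(parameters)) {
    if (value === undefined || value === null) continue
    stringParameters[key] = String(value)
  }
  return {
    projectId,
    location,
    launchParameter: {
      jobName,
      containerSpecGcsPath: templatePath,
      parameters: stringParameters
    }
  }
}

// Hypothetical helper: launches the template with the given client and
// returns the job description from the first element of the response array.
async function launchJob(client, request) {
  const [response] = await client.launchFlexTemplate(request)
  return response.job
}

const request = buildLaunchRequest({
  projectId: 'google-project-id',
  location: 'europe-west1',
  jobName: 'job-name',
  templatePath: 'gs://dataflow-templates-europe-west1/latest/flex/BigQuery_to_Elasticsearch',
  parameters: {
    apiKey: 'elastic-api-key',
    connectionUrl: 'elasticsearch endpoint',
    index: 'elasticsearch index',
    inputTableSpec: 'bigquery source table',
    usePartialUpdate: true,
    bulkInsertMethod: 'INDEX'
  }
})

// Usage with the real client (needs credentials and the package installed):
// const { FlexTemplatesServiceClient } = require('@google-cloud/dataflow').v1beta3
// const job = await launchJob(new FlexTemplatesServiceClient(), request)
```

Keeping the region in a single `location` field of the request, as in the working call above, avoids having to set any worker_region experiment by hand.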
PierreM