I am currently evaluating a proof of concept that uses a Google Cloud Storage bucket, a Java microservice, and Dataflow.
The communication flow is as follows:
- User sends CSV file to third party service
- Service uploads CSV file to Google bucket with ID and filename
- A create event is triggered and sent as an HTTP request to the Java microservice
- Java service triggers a Google Dataflow job
I am starting to think that the Java service is not necessary. Can I call Dataflow directly after the CSV is uploaded to the bucket?
This is the service. As you can see, it's just a basic controller that validates the request parameters from the "Create" trigger and then delegates to the Dataflow service:
    @PostMapping(value = "/dataflow", produces = {MediaType.APPLICATION_JSON_VALUE})
    public ResponseEntity<Object> triggerDataFlowJob(@RequestBody Map<String, Object> body) {
        // Extract the bucket and object name from the "Create" event payload.
        Map<String, String> requestParams = getRequestParams(body);
        log.atInfo().log("Body %s", requestParams);

        String bucket = requestParams.get("bucket");
        String fileName = requestParams.get("name");

        // Reject events that carry no bucket or file name; audit and acknowledge them.
        if (Objects.isNull(bucket) || Objects.isNull(fileName)) {
            AuditLogger.log(AuditCode.INVALID_CLOUD_STORAGE_REQUEST.getCode(), AuditCode.INVALID_CLOUD_STORAGE_REQUEST.getAuditText());
            return ResponseEntity.accepted().build();
        }

        log.atInfo().log("Triggering a Dataflow job, using Cloud Storage bucket: %s --> and file %s", bucket, fileName);
        try {
            // Launch the Flex Template job and return the launch response to the caller.
            return ResponseEntity.ok(DataflowTransport
                    .newDataflowClient(options)
                    .build()
                    .projects()
                    .locations()
                    .flexTemplates()
                    .launch(gcpProjectIdProvider.getProjectId(),
                            dataflowProperties.getRegion(),
                            launchFlexTemplateRequest)
                    .execute());
        } catch (Exception ex) {
            // A 409 means a job for this file was already launched; treat it as a no-op.
            if (ex instanceof GoogleJsonResponseException && ((GoogleJsonResponseException) ex).getStatusCode() == 409) {
                log.atInfo().log("Dataflow job already triggered using Cloud Storage bucket: %s --> and file %s", bucket, fileName);
            } else {
                log.atSevere().withCause(ex).log("Error while launching dataflow jobs");
                AuditLogger.log(AuditCode.LAUNCH_DATAFLOW_JOB.getCode(), AuditCode.LAUNCH_DATAFLOW_JOB.getAuditText());
            }
        }
        return ResponseEntity.accepted().build();
    }
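For context, the `getRequestParams` helper is not shown above. A minimal sketch of what such a helper might look like follows, assuming the request body carries the object metadata with top-level `bucket` and `name` keys. The class name `RequestParamsSketch` and the sample values are hypothetical, and a different payload shape (for example a wrapped message envelope) would need different handling.

```java
import java.util.HashMap;
import java.util.Map;

public class RequestParamsSketch {

    // Hypothetical stand-in for the getRequestParams helper used by the controller.
    // Assumption: "bucket" and "name" are top-level string fields of the request body.
    public static Map<String, String> getRequestParams(Map<String, Object> body) {
        Map<String, String> params = new HashMap<>();
        if (body == null) {
            return params;
        }
        for (String key : new String[] {"bucket", "name"}) {
            Object value = body.get(key);
            // Keep only non-empty strings, so a missing or malformed field
            // shows up as null in the controller's validation step.
            if (value instanceof String && !((String) value).trim().isEmpty()) {
                params.put(key, (String) value);
            }
        }
        return params;
    }

    public static void main(String[] args) {
        Map<String, Object> body = new HashMap<>();
        body.put("bucket", "my-bucket");   // sample value, not from a real event
        body.put("name", "upload.csv");    // sample value, not from a real event
        body.put("size", "1024");          // ignored by the helper
        System.out.println(getRequestParams(body));
    }
}
```

With a helper like this, the controller does nothing beyond reading two fields and launching the template, which is why the service looks redundant to me.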
Is there a way to integrate Cloud Storage bucket triggers with Dataflow directly, without the intermediate service?