I am implementing a Cloud Function to trigger a Dataprep Dataflow job. With a fixed table name it works fine. But when I build the table name dynamically inside the Cloud Function (the name changes over time), the job keeps producing the same result it produced on the first run.

Following is the code:

// Requires the 'googleapis' and 'node-datetime' npm packages:
// const { google } = require('googleapis');
// const datetime = require('node-datetime');

const dataflow = google.dataflow({ version: 'v1b3', auth: authClient });

// Build the date-suffixed table name. At hour 00 the previous
// day's table is used instead of the current day's.
var currentDate = datetime.create(new Date());
var previousDate = new Date();
previousDate.setDate(previousDate.getDate() - 1);
var previousDateTime = datetime.create(previousDate);
var currentHour = currentDate.format('H');
var tableName = 'Table_Name_' + currentDate.format('Ymd');
if (currentHour == '00') {
    tableName = 'Table_Name_' + previousDateTime.format('Ymd');
}
var finalTableName = 'db:dataset.' + tableName;

dataflow.projects.templates.launch({
    projectId: 'project-name',   // placeholder values
    gcsPath: 'gcspath',
    resource: {
        parameters: {
            inputLocations: '{"location1":"' + finalTableName + '","location2":"table-2"}',
            outputLocations: '{"location1":"gcspath/table.csv/file","location2":"output-table","location3":"gs://dataprep-staging-a7c655be-0fa4-4e9d-8f0f-9d126b4a381d/webmaster@gizmeon.com/jobrun/recipe___4_820401/.profiler/profilerTypeCheckHistograms.json/file","location4":"gs://dataprep-staging-a7c655be-0fa4-4e9d-8f0f-9d126b4a381d/webmaster@gizmeon.com/jobrun/recipe___4_820401/.profiler/profilerValidValueHistograms.json/file"}',
            customGcsTempLocation: 'custom-gcs-path'
        },
        environment: {
            tempLocation: 'temp-location',
            zone: 'us-central1-f'
        },
        jobName: 'bucket-hourly-schedule-' + new Date().toISOString().replace(/T/, '-').replace(/\..+/, '').replace(/:/g, '-')
    }
}, function(err, response) {
    if (err) {
        console.error('problem running dataflow template, error was: ', err);
        return callback(err);   // do not log a response on error
    }
    console.log('Dataflow template response: ', response);
    callback();
});

});
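The date-suffix logic above can be isolated into a small helper. This is a sketch using only the built-in Date API (the original uses the node-datetime package); the name buildTableName is hypothetical:

```javascript
// Reproduces the suffix logic above with the built-in Date API.
// At hour 00 (local time) the previous day's table is chosen,
// matching the original code's midnight fallback.
function buildTableName(base, now) {
  const d = new Date(now);
  if (d.getHours() === 0) {
    d.setDate(d.getDate() - 1); // roll back to the previous day at hour 00
  }
  const pad = (n) => String(n).padStart(2, '0');
  const suffix = `${d.getFullYear()}${pad(d.getMonth() + 1)}${pad(d.getDate())}`;
  return `${base}_${suffix}`;
}

// At 00:15 on Oct 3 the previous day's table is chosen:
console.log(buildTableName('Table_Name', new Date(2018, 9, 3, 0, 15)));
// → Table_Name_20181002
console.log(buildTableName('Table_Name', new Date(2018, 9, 3, 14, 0)));
// → Table_Name_20181003
```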
edited by Mikhail Berlyant, asked by hamedazhar
  • I don't quite understand what the problem is. _"When I try to give the table name inside the cloud function that changes over the time, I am getting the same result when the dataflow job runs at the initial point."_ - can you explain this in more detail? – Graham Polley Oct 03 '18 at 11:56
  • @GrahamPolley You can see from the code that my table is not fixed, meaning a table is created daily with the date appended to the common name. My aim is to schedule this function so that the Dataprep job runs every time with the corresponding table as the input. – hamedazhar Oct 03 '18 at 12:08
  • I still don't get exactly what the problem is that you're trying to explain. – Graham Polley Oct 03 '18 at 12:17
  • @GrahamPolley my issue is that my initial job gets run every time, meaning Dataprep does not pick up the latest input; the initial input table is always used. – hamedazhar Oct 03 '18 at 12:28

0 Answers