
I have been running daily Dataprep jobs, and since the update last week approximately half of them hang and are never published. They are listed as jobs in progress, although when I open the actual job page the job appears to be complete. No publishing action is triggered and the publishing target is not updated. Some jobs have now been stuck for over 72 hours, since Friday.

I've seen reports of other users hitting the same issue online, but no response or acknowledgement from either Google or Trifacta.

I have tried restarting the jobs, without success, and there appears to be no way to cancel the hanging jobs, because from Google's perspective the jobs themselves were successful, just not published. The problem affects jobs that publish to BigQuery as well as jobs that publish to Google Cloud Storage, and both manual and scheduled jobs.
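
For what it's worth, here is a minimal sketch of how the state of the underlying Dataflow jobs can be checked through the Dataflow API (using google-api-python-client; the project ID and region below are placeholders for your own values):

```python
# Minimal sketch: list Dataflow jobs and print their states, assuming
# google-api-python-client is installed and default credentials are set up.
from googleapiclient.discovery import build

dataflow = build("dataflow", "v1b3")
response = dataflow.projects().locations().jobs().list(
    projectId="my-project",   # placeholder: your GCP project ID
    location="us-central1",   # placeholder: your Dataflow region
).execute()

for job in response.get("jobs", []):
    # A job in JOB_STATE_DONE here, while Dataprep still shows it as
    # "in progress", matches the behaviour described above.
    print(job["id"], job["name"], job["currentState"])
```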

Trung Pham

4 Answers


This may only affect jobs that were pushed during the upgrade, and it should be mostly cosmetic in nature. Please note that you won't be charged.

Did the exact same job work before, with no changes? If so, please contact support and give them the IDs of a previously successful run and a now-failing run as a reference, so it can be investigated further.

Cheers, Sebastian

  • I can't be certain because of how billing is broken out, but it looks to me like we're still seeing Dataflow charges accrue for these jobs even though they don't complete. – justbeez Jun 26 '19 at 14:19

I have come across the same problem! The output of the jobs is placed in a temp folder in Cloud Storage, with the output mostly consisting of multiple files without headers.

B Delfos

It is also creating huge issues here. Instead of the normal output file, it places multiple parts of it in a temp folder, without headers. This makes new scheduled jobs that rely on these outputs useless, because they do not load the new output.

If you manually merge the files in the temp folder, add a header (in the case of CSV), and place the result in the correct folder, the expected output can be recreated manually.
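
In case it helps, this is roughly what that manual merge looks like in Python (a sketch using the google-cloud-storage client; the bucket name, temp prefix, header line, and destination path are placeholders you would replace with your job's actual values):

```python
# Sketch: stitch the header-less CSV parts from the temp folder back into
# a single file and upload it where the downstream jobs expect it.
# Assumes google-cloud-storage is installed and each part ends with a newline.
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("my-dataprep-bucket")        # placeholder bucket name

header = "col_a,col_b,col_c\n"                      # placeholder CSV header
parts = sorted(bucket.list_blobs(prefix="temp/output-part-"),
               key=lambda blob: blob.name)          # placeholder temp prefix

with open("merged.csv", "w", encoding="utf-8") as out:
    out.write(header)
    for blob in parts:
        out.write(blob.download_as_text())

# Place the merged file in the folder the scheduled jobs read from.
bucket.blob("expected/output.csv").upload_from_filename("merged.csv")
```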

Also no response from Google yet.

Mark

We're seeing the exact same thing, right down to the destinations and job types. It's almost as if Dataprep is losing track of the underlying Dataflow job and not finishing up when it completes. That's why you see the temp files: they are the raw output, and Dataprep normally handles formatting the final output file as a separate step.

Someone was kind enough to already post this on the issue tracker, so please go star it and add any additional details that may be helpful to the Dataprep team: https://issuetracker.google.com/issues/135865374

justbeez