
After an Apache Beam (Google Cloud Dataflow 2.0) job finishes, we get a ready-made command at the end of the logs:

    bq show -j --format=prettyjson --project_id=<My_Project_Id> 00005d2469488547749b5129ce3_0ca7fde2f9d59ad7182953e94de8aa83_00001-0

which can be run from the Google Cloud SDK command prompt.

Basically, it shows all the information such as job start time, end time, number of bad records, number of records inserted, etc.

I can see this information on the Cloud SDK console, but where is it stored? I checked the Stackdriver logs: they only have data up to the previous day, and even then not the complete information that is shown on the Cloud SDK console.

If I want to export this information and load it into BigQuery, where can I get it?
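
If nothing exists out of the box, one workaround I can think of (untested sketch; the job id, dataset and table names below are placeholders) is to capture the bq show output as single-line JSON and load it back into BigQuery myself:

    # Save the job statistics as one-line JSON (a hypothetical job id is used here).
    bq show -j --format=json --project_id=<My_Project_Id> <job_id> > job_stats.json

    # Load the JSON file into a stats table, letting BigQuery detect the schema.
    bq load --source_format=NEWLINE_DELIMITED_JSON --autodetect my_dataset.dataflow_job_stats job_stats.json

Ideally, though, I would like to get this information from the logs automatically.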

Update: This is possible, and I found the information after adding the filter resource.type="bigquery_resource" in the Stackdriver logs viewer, but it shows timestamp fields such as CreateTime, StartTime and EndTime as 1970-01-01T00:00:00Z.
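
For reference, the same entries can presumably also be pulled from the command line with that filter (sketch only; gcloud logging read is assumed to be available in the installed Cloud SDK):

    # Read the most recent BigQuery log entries as JSON using the same filter.
    gcloud logging read 'resource.type="bigquery_resource"' --project=<My_Project_Id> --format=json --limit=10

That at least makes the raw entries inspectable outside the logs viewer.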

Shawn
  • Can you clarify your question? Where does this command come from? Who publishes it to the logs? Are you talking about Stackdriver logs, or about regular debug logs emitted by your main program, or about Dataflow logs visible on the dataflow job page? Which information do you want to export to bigquery? – jkff Jul 14 '17 at 18:23
  • Where does this command come from? --> I run the Apache Beam job through Eclipse, and the mentioned command appears in the Log Console of Eclipse. – Shawn Jul 17 '17 at 08:57
  • OK, I see which log message you mean. However, I still don't understand the rest of your question. Which information are you referring to as "these information"? – jkff Jul 17 '17 at 23:13
  • I see which log message you mean --> I want to export these logs to BigQuery. Though I can export them from Stackdriver, timestamps like Create Time, Start Time and End Time show "1970-01-01T00:00:00Z" and not the actual date and time. – Shawn Jul 18 '17 at 11:44
  • Sorry, I still don't understand what you're asking. The pipeline logs a message containing a command of the form "bq show -j ..." and you want to have a BigQuery table containing these commands in string form? Or do you want a BigQuery table containing the results of these commands - i.e. a BigQuery table containing statistics of all BigQuery load jobs launched by all Dataflow pipelines in your project? Or of all BigQuery load jobs in your project, period? (unfortunately I think neither of these are possible, but I want to understand your use case anyway) – jkff Jul 18 '17 at 19:34
  • _do you want a BigQuery table containing the results of these commands - i.e. a BigQuery table containing statistics of all BigQuery load jobs launched by all Dataflow pipelines in your project?_ --> Yes! I am looking for this. – Shawn Jul 19 '17 at 06:06
  • I see. I don't think this is possible currently out of the box, but I added the bigquery tag so maybe someone else can help. – jkff Jul 19 '17 at 06:53

1 Answer


You can export these logs to a Google Cloud Storage bucket. In Stackdriver, click on Create Export and then create a sink, providing a sink name and a sink destination, which is the bucket path. The next time a job starts, all the logs are exported and you can use them further.
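
Roughly the same thing from the command line (sketch only; the sink name, bucket and filter are placeholders to adapt):

    # Create a sink that exports matching log entries to a Cloud Storage bucket.
    gcloud logging sinks create my-dataflow-sink \
        storage.googleapis.com/<my_bucket> \
        --log-filter='resource.type="bigquery_resource"'

A BigQuery dataset (bigquery.googleapis.com/projects/<My_Project_Id>/datasets/<my_dataset>) can also be used as the sink destination if you want to query the exported entries directly; either way, the sink's writer identity needs write access to the destination.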

Manoj Kumar
  • This is partially correct, but see the update for the complete requirement. Sorry for the edit. – Shawn Jul 19 '17 at 07:12