
I'm using a BigQuery view to fetch yesterday's data from a BigQuery table and then trying to write it into a date-partitioned table using Dataprep.

My first issue was that Dataprep would not correctly pick up DATE-type columns, but converting them to TIMESTAMP works (thanks Elliott).

However, when setting a BigQuery table as the output in Dataprep, you only have three options: Append, Truncate, or Drop the existing table. If the table is date-partitioned and you use Truncate, it removes all existing data, not just the data in that partition.

Is there another way to do this that I should be using? My alternative is using Dataprep to overwrite a table and then using Cloud Composer to run some SQL pushing this data into a date-partitioned table. Ideally, I'd want to do this just with Dataprep, but that doesn't seem possible right now.
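For reference, this is roughly the kind of SQL I'd have Composer run, assuming Dataprep overwrites a staging table on each run (the project, dataset, and table names below are placeholders, not real ones):

bq query --use_legacy_sql=false 'DELETE FROM `my_project.my_dataset.partitioned_table` WHERE DATE(date) = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)'

bq query --use_legacy_sql=false 'INSERT INTO `my_project.my_dataset.partitioned_table` (date, name) SELECT date, name FROM `my_project.my_dataset.dataprep_staging`'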

BigQuery table schema:

[screenshot of the table schema]

Partition details:

[screenshot of the partition details]

The data I'm ingesting is simple. In one flow:

+------------+--------+
|    date    |  name  |
+------------+--------+
| 2018-08-08 | Josh1  |
| 2018-08-08 | Josh2  |
+------------+--------+

In the other flow:

+------------+--------+
|    date    |  name  |
+------------+--------+
| 2018-08-09 | Josh1  |
| 2018-08-09 | Josh2  |
+------------+--------+

It overwrites the data in both cases.

  • From [BigQuery Data Type Conversions](https://cloud.google.com/dataprep/docs/html/BigQuery-Data-Type-Conversions_102563896#publishwrite), you can use Dataprep's "Datetime (timestamp)" type to write to a `TIMESTAMP` in the BigQuery table, right? I don't think there is a way to use `DATE`, though. – Elliott Brossard Aug 23 '18 at 16:00
  • Thanks for the info @ElliottBrossard. I've managed to get it working using `TIMESTAMP` instead, but another problem has arisen. Edited the question. Is there a feature request thread somewhere? – Josh Laird Aug 24 '18 at 09:25
  • You could submit a feature request to the [Cloud Dataprep issue tracker](https://issuetracker.google.com/issues/new?component=190822&template=0). – Elliott Brossard Aug 24 '18 at 13:04
  • I have the same question! Did you manage to do it in Dataprep? – MT467 Aug 27 '18 at 19:46
  • If you're writing to a single partition, can you use the partition as the destination table, e.g., "table$20180101"? Then Truncate will just remove that partition. – Hua Zhang Aug 27 '18 at 20:45
  • Hi HuaZhang: I'm unable to select a partition. The destination table textbox is uneditable; you have to select it from the GUI. Ideally, I would want to schedule this Dataprep job so that it imports the latest date of data rather than hardcode a partition. I believe this is likely the wrong use case for Dataprep. @MT467, I ended up just writing the query in SQL and scheduling with Composer. – Josh Laird Aug 28 '18 at 08:23

1 Answer


You can create a partitioned table based on a DATE column.

Data written to a partitioned table is automatically delivered to the appropriate partition based on the date value (expressed in UTC) in the partitioning column.

Use Append so that the new data is added to the appropriate partitions.

You can create the table using the bq command:

bq mk --table --expiration [INTEGER1] --schema [SCHEMA] --time_partitioning_field date [DATASET].[TABLE]

--time_partitioning_field defines which column the table is partitioned on.
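For example, a table matching the sample data in the question could be created like this (the dataset and table names are placeholders, not taken from the question):

bq mk --table --schema 'date:TIMESTAMP,name:STRING' --time_partitioning_field date my_dataset.my_partitioned_table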

  • How do you write to that partition with Dataprep? – Josh Laird May 15 '19 at 08:22
  • Data written to a partitioned table is automatically delivered to the appropriate partition – Nathan Nasser May 20 '19 at 20:25
  • You just need to send the data to the partitioned table without having to "specify" a partition. – Nathan Nasser May 20 '19 at 20:30
  • Is there any way to truncate that partition with Dataprep? As far as I'm aware, you can only truncate the entire table and not just that particular partition – Josh Laird May 21 '19 at 21:04
  • Now I understand better: you want to overwrite the data in a partition several times a day, correct me if I am wrong. If that's the case, you can load the data into a temporary table and then WRITE_TRUNCATE into the target partition using a partition decorator (see the sketch below): https://cloud.google.com/bigquery/docs/managing-partitioned-table-data#using_a_load_job – Nathan Nasser May 21 '19 at 21:39
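For illustration, a minimal sketch of that partition-decorator approach, written as a query job rather than the load job in the linked docs (the project, dataset, and table names are placeholders): it overwrites only the 2018-08-08 partition instead of truncating the whole table.

bq query --use_legacy_sql=false --replace --destination_table 'my_dataset.my_partitioned_table$20180808' 'SELECT date, name FROM `my_project.my_dataset.dataprep_staging` WHERE DATE(date) = "2018-08-08"'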