Questions tagged [google-cloud-dataprep]

An intelligent cloud data service to visually explore, clean, and prepare data for analysis.

Dataprep (more accurately, Cloud Dataprep by Trifacta) is a visual data transformation tool built by Trifacta and offered as part of Google Cloud Platform.

It can read data from and write data to other Google Cloud services, such as BigQuery and Cloud Storage.

Data is transformed using recipes, which are shown alongside a visual representation of the data. This allows the user to preview changes, profile columns, and spot outliers and type mismatches.

When a Dataprep flow is run (either manually or on a schedule), a Dataflow job is created to run the task. Dataflow is Google's managed service for running Apache Beam pipelines.

205 questions
0
votes
2 answers

Schedule deleting a BQ table

I am streaming data into BQ. Every day I run a scheduled job in Dataprep that takes 24 hours of data, modifies some of it, and creates a new table in the BQ dataset with those 24 hours of data. The original table, though, stays unmodified and keeps on…
0
votes
2 answers

How can I rename several columns in Dataprep?

I have more than 100 columns in Dataprep with names like "my column name 1" and "my column name 2". I would like to rename the columns to "my_column_name_1" and "my_column_name_2". I have tried to do a rename, changing " " to "_". However,…
jonaetn
  • 9
  • 4
0
votes
1 answer

How to unnest Google Analytics custom dimension in Google Data Prep

Background story: We use Google Analytics to track user behaviour on our website. The data is exported daily into BigQuery. Our implementation is quite complex and we use a lot of custom dimensions. Requirements: 1. The data needs to be imported…
0
votes
1 answer

Flows Disappeared from Project

I've been using Dataprep for months, and have a lot of different flows built in one of my projects. I was working with it this morning, but now when I log in, the project in Dataprep is blank, like I'm a brand new user. I'm starting to panic…
0
votes
1 answer

GCP Dataflow API doesn't evaluate now/today() function called in the recipe written in Dataprep

After launching the API dataflow.projects.locations.templates.launch (I have also tested the "create" API) with a template generated earlier in Dataprep, a column generated with the today() function (I have also tested the "now()" function) seems not to be…
0
votes
1 answer

DataPrep - find first date

Using Dataprep, I am trying to identify 2 dates based on the Order_Date: (1) the first order date and (2) the latest order date. I have used the min and max functions to find the first and last order dates; however, the result is an integer. When I…
0
votes
1 answer

Google Dataprep integration with message brokers

Is it possible to read from Kafka or Google Pub/Sub in a Dataprep job? If so, any 'best practice' deployment considerations I should expect when the samples are edited on board an "oh so snappy, live and responsive" a la Visual Studio (minus the…
0
votes
1 answer

Add more workers to dataflow job on GCP

I'm creating a Dataprep flow that imports a CSV to BQ. This works fine, but it takes too long, even for very small files. Is there a way to add more workers to the job? maxNumWorkers is always 1 by default. Br Cris
0
votes
1 answer

Count by value distributed across multiple columns in Google Cloud Dataprep

I have a somewhat complex data transformation task that I cannot figure out in Google Cloud Dataprep. The source data is voter file information. The CSV has 10 columns (among many others) that contain a voter's election participation history. See…
Jake Lowen
  • 899
  • 1
  • 11
  • 21
0
votes
1 answer

Apache Beam Python syntax

So, I'm just getting started with Apache Beam. I plan to run Dataflow jobs in GCP; I was originally running them with Dataprep, but I quickly outgrew its functionality. Caveat: I have been programming in Python 2/3 for 2 years now, so I think I've…
DMan
  • 73
  • 1
  • 10
0
votes
1 answer

GCP Data Prep- forward and backward fill

I have the following table which I am trying to wrangle in GCP Data prep:

Timestamp   Event
2018-04-01  0
2018-04-02  0
2018-04-03  0
2018-04-04  0
2018-04-05  1
2018-04-06  0
2018-04-07  0
2018-04-08  0

I am trying to transform it in a way…
FlyingPickle
  • 1,047
  • 1
  • 9
  • 19
0
votes
1 answer

Google Cloud Dataprep - how to create a hash of a column

Can anyone point to an out-of-the-box or custom implementation in Cloud Dataprep to create a hash of another column, like BigQuery's FARM_FINGERPRINT?
0
votes
2 answers

Migrate Cloud Dataprep to a different account

I had to migrate my personal account to a different email provider; the new account is Organization Admin of the entire GC project. I cannot see any Dataprep flows from within my new account, but I can still access all the flows from my old account. Is there a quick…
0
votes
1 answer

Google Cloud Dataprep - How to replace a pattern in a string

How can I replace a pattern in a string in one column with a value from another column in Cloud Dataprep? To be precise, I have a column A with the same pattern in every string of the column, and I want to replace that pattern inside a string with…
zerina
  • 131
  • 1
  • 1
  • 4
0
votes
1 answer

Speed Up the Processing Time of a Job

I have a sample (100 rows) and three steps in my recipe. When I run the job to load the data into a table in BigQuery, it takes 6 minutes to create the table. That is too long for a simple process like the one I am testing. I am trying to…
BeeKay
  • 1
  • 2