Questions tagged [google-cloud-dataprep]

An intelligent cloud data service to visually explore, clean, and prepare data for analysis.

Dataprep (more precisely, Cloud Dataprep by Trifacta) is a visual data transformation tool built by Trifacta and offered as part of Google Cloud Platform.

It is capable of ingesting data from and writing data to several other Google services (BigQuery, Cloud Storage).

Data is transformed using recipes, which are shown alongside a visual representation of the data. This lets the user preview changes, profile columns, and spot outliers and type mismatches.

When a Dataprep flow is run (either manually or on a schedule), a Dataflow job is created to execute it. Dataflow is Google's managed Apache Beam service.
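Because each Dataprep run materializes as a Dataflow job, a flow exported as a Dataflow template can also be re-launched directly through the Dataflow `templates.launch` REST API. A minimal sketch of building the launch request body — the job name, parameter names, and zone below are placeholders, not values Dataprep itself dictates:

```python
def template_launch_body(job_name: str, parameters: dict,
                         zone: str = "us-central1-f") -> dict:
    """Build the request body for the Dataflow templates.launch REST call.

    The GCS path of the exported template (gcsPath) is passed separately
    as a query parameter, not inside this body.
    """
    return {
        "jobName": job_name,
        "parameters": parameters,
        "environment": {"zone": zone},
    }

body = template_launch_body("dataprep-rerun",
                            {"inputLocations": "gs://bucket/input.csv"})
```

The body is then POSTed to the `projects.locations.templates.launch` endpoint with an authorized client.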

205 questions
2 votes · 1 answer

Using Dataprep to write to just a date partition in a date partitioned table

I'm using a BigQuery view to fetch yesterday's data from a BigQuery table and then trying to write into a date partitioned table using Dataprep. My first issue was that Dataprep would not correctly pick up DATE type columns, but converting them to…
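A common workaround for writing only yesterday's slice into an ingestion-time-partitioned table is to address the partition directly with BigQuery's `$YYYYMMDD` decorator. A small sketch of building that target table ID — the dataset and table names are hypothetical:

```python
from datetime import date, timedelta

def partition_table_id(dataset: str, table: str, day: date) -> str:
    # BigQuery partition decorator syntax: dataset.table$YYYYMMDD
    return f"{dataset}.{table}${day.strftime('%Y%m%d')}"

def yesterday_partition(dataset: str, table: str, today: date) -> str:
    # Target the partition for the day before `today`.
    return partition_table_id(dataset, table, today - timedelta(days=1))
```

The resulting ID (e.g. `sales.events$20240115`) can be used as the destination of a load or query job so only that partition is overwritten.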
2 votes · 1 answer

Most efficient way to filter BigQuery rows by latest date

I am currently working on an ETL pipeline that uses BigQuery to store staging data, and then uses Dataprep to transform the data and store it in new BigQuery tables for production. We have been experiencing issues finding the most cost effective way…
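Inside BigQuery the cost-effective approach is usually to restrict the scan with a partition filter rather than scanning the whole table. The "keep only rows carrying the latest date" semantics itself can be sketched in plain Python (assuming ISO-format date strings, which sort lexicographically):

```python
def latest_rows(rows):
    """Return only the rows whose 'date' equals the maximum date present.

    rows: list of dicts, each with an ISO-8601 'date' key.
    """
    if not rows:
        return []
    latest = max(r["date"] for r in rows)
    return [r for r in rows if r["date"] == latest]
```

In SQL the equivalent would filter on a subquery selecting `MAX` of the partition column, so BigQuery prunes the other partitions.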
2 votes · 1 answer

Google Cloud Dataprep - error when importing a dataset from BigQuery

When I try to import BigQuery tables as datasets in my Dataprep flow, I get the following error: Could not create dataset: I/O error. I tried to import many BigQuery tables (all from the same BQ dataset); all of them imported successfully…
2 votes · 1 answer

Executing a Dataprep template with the Dataflow API keeps the timestamp baked into the flow recipe

I have a Cloud Function which uses the Dataflow API to create a new job from a template I created using Dataprep. The recipe basically cleans up some JSON objects, turns them into CSV format, and adds a timestamp column to fetch everything in a…
2 votes · 1 answer

Google Dataprep Import/Export flows

Does the Import/Export Flow option only work within the same project the original flow comes from? Having exported a flow from the flows page, I can't seem to import it into another account. Thanks
Aaron Harris
2 votes · 1 answer

Google Dataprep: number of instances and architecture optimisation

I have noticed that every destination in Google Dataprep (be it manual or scheduled) spins up a Compute Engine instance. The quota limit for a normal account is 8 instances max. Look at this flow: dataprep flow. Since the data wrangling is composed of…
2 votes · 1 answer

Google Cloud Dataprep: Transformation engine unavailable due to prior crash (exit code: -1)

I am trying to create a flow using Google Cloud Dataprep. The flow takes a dataset from BigQuery, which contains app event data from Firebase Analytics, to flatten event parameters for easier analysis. I keep getting the following error before even…
2 votes · 1 answer

Dataprep: job finish event

We are considering using Dataprep on an automatic schedule in order to wrangle & load a folder of GCS .gz files into BigQuery. The challenge is: how can the source .gz files be moved to cold storage once they are processed? I can't find an event…
jldupont
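Assuming no built-in Dataprep completion hook, one approach is to poll the state of the underlying Dataflow job and archive the inputs once it reports `JOB_STATE_DONE` (a real Dataflow job state). A sketch of just the decision logic, kept as pure functions — the cold-storage bucket name and path convention are hypothetical:

```python
def should_archive(job_state: str, blob_name: str) -> bool:
    # Archive only the .gz inputs of a successfully finished job.
    return job_state == "JOB_STATE_DONE" and blob_name.endswith(".gz")

def archive_destination(blob_name: str, bucket: str = "my-cold-bucket") -> str:
    # Keep the original object path under an "archived/" prefix
    # in a bucket whose default storage class is Coldline.
    return f"gs://{bucket}/archived/{blob_name}"
```

The actual move would then be a GCS rewrite of each qualifying object into the cold bucket, followed by deleting the source.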
2 votes · 2 answers

Google Cloud DataPrep fails with cross-region error when using EU BigQuery db

I hit some issues today developing some new flows - the first I've done reading from & loading into EU-region BigQuery databases. To isolate the issue, I took the following steps: Create a new BQ database in the EU region Create a table by…
2 votes · 2 answers

Google Cloud Dataprep - Functions

Are there any functions like discretization, normalization and data transformation (categorical to numeric) on Google Cloud Dataprep?
Gozde
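Whether Dataprep exposes each of these out of the box varies by version, but as a reference point, min-max normalization and fixed-edge discretization amount to only a few lines; this sketch shows the plain computations, not Dataprep's own functions:

```python
def min_max_normalize(values):
    # Rescale values linearly into [0, 1]; constant columns map to 0.0.
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def discretize(value, edges):
    # Return the index of the right-open bin that contains value,
    # e.g. edges [3, 6, 9] give bins (-inf,3), [3,6), [6,9), [9,inf).
    for i, edge in enumerate(edges):
        if value < edge:
            return i
    return len(edges)
```

Categorical-to-numeric conversion is typically a lookup table mapping each distinct category to an integer code.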
2 votes · 2 answers

How do I give access to Google Cloud Dataprep?

I have created a flow in Cloud Dataprep and the job executed. All fine. However, my colleagues, who also have the owner role in this GCP project, are not able to see the flow I created. I'm not able to find sharing options anywhere. How should it be set up so…
paulboony
2 votes · 2 answers

Dataflow Workers unable to connect to Dataflow Service

I am using Google Dataprep to start Dataflow jobs and am facing some difficulties. For background, we used Dataprep for some weeks and it worked without problems before we started to have authorization issues with the service account. When we finally…
2 votes · 1 answer

BigQuery / DataPrep: Efficient way to extract word counts; to convert HTML to plaintext

I have a table of ~4.7M documents stored in BigQuery. Some are plaintext, some HTML. They're around 2k tokens each, with wide variation. I'm mainly using Dataprep to do my processing. I want to extract those tokens and calculate TF-IDF…
Sai
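Converting HTML to plaintext and counting tokens can be done with Python's standard library alone, for example as a preprocessing step before loading results back into BigQuery; a minimal sketch with a simple lowercase-token definition (an assumption, not a TF-IDF standard):

```python
import re
from collections import Counter
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect only the text nodes of an HTML document."""
    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

def html_to_text(doc: str) -> str:
    parser = TextExtractor()
    parser.feed(doc)
    return " ".join(parser.parts)

def word_counts(doc: str) -> Counter:
    # Term frequencies; per-document counts like these are the
    # TF half of a TF-IDF computation.
    text = html_to_text(doc)
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))
```

Plaintext documents pass through `html_to_text` unchanged, so the same function covers both kinds of rows.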
1 vote · 2 answers

I am facing an issue in Google Cloud's Dataprep (Perform Foundational Data, ML, and AI Tasks in Google Cloud: Challenge Lab)

I am doing the challenge lab titled "Perform Foundational Data, ML, and AI Tasks in Google Cloud: Challenge Lab". However, in task 3 ("Task 3: Run a simple Dataprep job"), I am encountering an issue that is preventing me from completing the…
1 vote · 0 answers

Dataprep - Schema does not match the recipe on every scheduled run

I am trying to create an ETL process; I have the desired data stored in BigQuery. Every time I want to run my process in Dataprep this error pops up: The schema of the BigQuery table does not match the recipe (...) To solve it I have to manually…