Questions tagged [google-cloud-dataprep]

An intelligent cloud data service to visually explore, clean, and prepare data for analysis.

Dataprep (more precisely, Cloud Dataprep by Trifacta) is a visual data transformation tool built by Trifacta and offered as part of Google Cloud Platform.

It is capable of ingesting data from and writing data to several other Google services (BigQuery, Cloud Storage).

Data is transformed using recipes, which are shown alongside a visual representation of the data. This allows the user to preview changes, profile columns, and spot outliers and type mismatches.

When a Dataprep flow is run (either manually or on a schedule), a Dataflow job is created to run the task. Dataflow is Google's managed Apache Beam service.
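The cleanup work a recipe step performs can be sketched outside Dataprep. Below is a minimal Python illustration (column names and sample rows are hypothetical, not from any real flow) of the kind of type-mismatch profiling the tool surfaces above each column:

```python
import re

# Sample rows as they might arrive from a CSV in Cloud Storage; names are made up.
rows = [
    {"user_id": "101", "signup_date": "2020-04-14"},
    {"user_id": "abc", "signup_date": "2020-04-14"},   # user_id is not numeric
    {"user_id": "103", "signup_date": "not-a-date"},   # signup_date is malformed
]

def mismatches(rows, column, pattern):
    """Return the indices of rows whose value in `column` fails the expected
    pattern -- roughly what Dataprep's per-column quality bar highlights."""
    rx = re.compile(pattern)
    return [i for i, row in enumerate(rows) if not rx.fullmatch(row[column])]

print(mismatches(rows, "user_id", r"\d+"))                    # → [1]
print(mismatches(rows, "signup_date", r"\d{4}-\d{2}-\d{2}"))  # → [2]
```

In Dataprep itself this check is interactive; the sketch only shows the underlying idea of validating each column against an expected pattern.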

205 questions
1 vote · 1 answer

How can I parameterize a BigQuery table in Dataprep?

I usually use Dataprep to build recipes for JSON and CSV files from Cloud Storage, but today I tried to ingest a table from BigQuery and could not parameterize it. Is it possible to do that? Here are some screenshots to illustrate my question: The prefix that I…
1 vote · 1 answer

Dataprep - accents and special characters

How do I solve this problem with accents / special characters in Dataprep? I need this information to appear correctly. Thank you very much for your attention.
Theorp
1 vote · 0 answers

Dataprep recipe fails to load with "Cannot read property 'expandScriptLines' of undefined"

A recent update to Dataprep sometime between August 7–10 has broken a number of our recipes. Broken recipes fail to load with the error "Cannot read property 'expandScriptLines' of undefined". The browser console shows the following…
ty.
1 vote · 0 answers

Dataprep BigQuery running in a different region

So I get the following error when running Dataprep. java.io.IOException: Query job beam_job_9e016180fbb74637b35319c89b6ed6d7_clouddataprepleads6085795bynick-query-d23eb37a1bee4a788e7b16c1de1f92e6 failed, status: { "errorResult" : { "message" : "Not…
1 vote · 0 answers

Dataprep - missing rows after processing

I have a CSV containing 1.5 million rows. I prepared a Dataprep job that parses the data and stores it in BQ (or CSV). But after processing, nearly half of the rows are missing (around 700k). When I run this Dataprep job without any recipe steps I get the same…
y0j0
1 vote · 1 answer

In Trifacta or Google Cloud Dataprep, I'm trying to flag rows with non-alphanumeric characters (�). What formula do I use?

In Trifacta or Google Cloud Dataprep, I'm trying to flag rows with non-alphanumeric characters (�). What formula do I use? I tried this formula but it doesn't work: Replace Matches of `�` from EMPLOYEE_FIRST with NOT VALID
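The matching logic behind this question can be sketched in plain Python rather than Wrangle syntax (this is not a Dataprep formula, just an illustration of the regex idea; the sample names are hypothetical). The � character is the Unicode replacement character U+FFFD, which falls outside any alphanumeric class:

```python
import re

# Any character outside [0-9A-Za-z] -- this also catches U+FFFD (the
# replacement character that appears when bytes could not be decoded).
NON_ALNUM = re.compile(r"[^0-9A-Za-z]")

names = ["Alice", "Bo\ufffdb", "Car-ol"]  # hypothetical EMPLOYEE_FIRST values

# Flag each value, mimicking a derived "valid?" column.
flags = ["NOT VALID" if NON_ALNUM.search(n) else "VALID" for n in names]
print(flags)  # → ['VALID', 'NOT VALID', 'NOT VALID']
```

To flag only the replacement character itself (rather than every non-alphanumeric character such as `-`), the pattern would be `"\ufffd"` instead.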
1 vote · 1 answer

Conversion of DateTime to Timestamp - adding the timestamp

I want to convert the date from the former to the latter format: 2020-04-14T14:56:43 to 2020-04-14 14:56:43 UTC. Basically, how do I convert a DATETIME into a TIMESTAMP in Dataprep?
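The conversion being asked about can be expressed as a small Python sketch (not a Dataprep recipe step; it assumes the source values are already in UTC, which the question implies by appending "UTC"):

```python
from datetime import datetime, timezone

def to_timestamp(s: str) -> str:
    """Parse an ISO-style DATETIME string and render it in the
    'YYYY-MM-DD HH:MM:SS UTC' form the question asks for."""
    dt = datetime.strptime(s, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    return dt.strftime("%Y-%m-%d %H:%M:%S UTC")

print(to_timestamp("2020-04-14T14:56:43"))  # → 2020-04-14 14:56:43 UTC
```

The transformation is purely textual (replace the `T` separator with a space and append the zone), which is why it can also be done with string functions when no real timezone arithmetic is needed.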
1 vote · 1 answer

How to unpivot an unknown number of columns in Google Dataprep / Trifacta?

Trifacta / Google Dataprep allows one to unpivot data using its Unpivot transform, in which one specifies which columns to unpivot at design time. How could one unpivot, say, an unknown number of columns? Here is a data example: The…
Jan Krynauw
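The shape of an unpivot over columns discovered at runtime can be sketched in plain Python (a stand-in for the Unpivot transform, not its actual implementation; the `q1`–`q3` column names are hypothetical):

```python
def unpivot(rows, id_cols):
    """Melt wide rows into long form: every key not listed in `id_cols`
    becomes a (variable, value) pair, so the set of value columns does
    not need to be known in advance."""
    out = []
    for row in rows:
        for col, val in row.items():
            if col not in id_cols:
                long_row = {c: row[c] for c in id_cols}
                long_row.update({"variable": col, "value": val})
                out.append(long_row)
    return out

wide = [{"name": "a", "q1": 1, "q2": 2, "q3": 3}]
print(unpivot(wide, ["name"]))  # one output row per non-ID column
```

The key design point is that the value columns are computed by set difference against the ID columns, which is what makes the column count "unknown" at design time.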
1 vote · 2 answers

Problem running a job with Google Cloud Dataprep - User does not have bigquery.jobs.create - PERMISSION_DENIED

I have a problem running a job in Google Dataprep. I set up a connection to an external database on Google Cloud SQL. In BigQuery I imported the database connection. In Google Dataprep I selected the table to do some operations. I tried to create a…
1 vote · 0 answers

Can a Dataprep flow be owned by a non-personal Google Account or Service Account?

Does a flow need to be owned by a user with a Google Account? A flow broke because we disabled an employee's account. We re-enabled it, so it's working again, but we would like to change the owner of the flow to a Service Account or a non-personal account. I'm…
1 vote · 1 answer

Unable to import more than 1000 files from Google Cloud Storage to Cloud Data Prep

I have been trying to run a Cloud Data Prep flow which takes files from Google Cloud Storage. The files on Google Cloud Storage get updated daily, and there are more than 1000 files in the bucket right now. However, I am not able to fetch more than…
1 vote · 1 answer

Using Google Cloud Storage with Google Data Prep

I am using Google Cloud Storage to store CSV files. These CSV files get updated daily with new data in them. I'm hoping to use Google Data Prep to automate the process of cleaning these files and then combining them. Before I start to build this…
kstat
1 vote · 0 answers

Give Dataprep product access to a user via Terraform

I have a Terraform project and want to give a user access to the Dataprep product. I don't see any such resource in the Terraform documentation, so it looks like it's not documented/supported. Ideas?
1 vote · 1 answer

Dataprep - Dataflow fails when output is BigQuery

As part of a POC, I was trying to set up some data quality checks through Dataprep. There is a BigQuery table as a source, and it should run a job with output to another BigQuery table. Unfortunately, the job fails with the error: java.lang.RuntimeException:…
1 vote · 1 answer

How do I use Cloud Dataprep to convert my Excel file to the target format regularly?

I'd like to convert my Excel files to the proper format using Google Cloud Dataprep. How do I save my conversion flow and use it as a template? For example, if there were two Excel files named A and B and I created a flow to merge them, next time there are…
Lawrence