Questions tagged [google-cloud-dataprep]

An intelligent cloud data service to visually explore, clean, and prepare data for analysis.

DataPrep (or more accurately Cloud Dataprep by Trifacta) is a visual data transformation tool built by Trifacta and offered as part of Google Cloud Platform.

It can read data from and write data to other Google Cloud services, such as BigQuery and Cloud Storage.

Data is transformed using recipes which are shown alongside a visual representation of the data. This allows the user to preview changes, profile columns and spot outliers and type mismatches.

When a Dataprep flow is run (manually or on a schedule), a Dataflow job is created to execute it. Dataflow is Google's managed service for running Apache Beam pipelines.

205 questions
0 votes, 1 answer

GCP DataPrep - moving window

I have a CSV file of the following format that I am trying to wrangle with GCP Dataprep:
Timestamp Tag Value
2018-05-01 09:00:00 Temperature 40.1
2018-05-01 09:00:00 Humidity 80
2018-05-01…
FlyingPickle
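As a plain-Python illustration of what a moving window over this long-format data computes, here is a minimal sketch. It assumes the goal is a trailing average of Value per Tag over the last N readings, with rows already sorted by Timestamp; `moving_average_by_tag` is a hypothetical helper, not a Dataprep function.

```python
from collections import defaultdict, deque


def moving_average_by_tag(rows, window):
    """Trailing moving average of Value, computed separately for each Tag.

    rows: iterable of (timestamp, tag, value) tuples, assumed sorted by timestamp.
    window: number of readings (not a time span) to average over; must be >= 1.
    Returns (timestamp, tag, value, moving_avg) tuples in input order.
    """
    # One sliding buffer per tag; maxlen makes the deque drop the oldest reading.
    buffers = defaultdict(lambda: deque(maxlen=window))
    out = []
    for ts, tag, value in rows:
        buf = buffers[tag]
        buf.append(float(value))
        out.append((ts, tag, float(value), sum(buf) / len(buf)))
    return out
```

Because each tag has its own buffer, interleaved Temperature and Humidity readings never mix. A window defined by elapsed time rather than reading count would need the timestamps parsed and compared instead.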
0 votes, 1 answer

regex duplicate words

I need to match (NOT DELETE) all duplicate words in a text. For example, in Men's·Tee·Shirt·Vintage·T·Shirt·1990·Deep·Black·Red·Text·Deep·Black·Red·Text·X-Small, Deep·Black·Red·Text·Deep·Black·Red·Text are repeating. None of the regexes I could find…
zerina
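A sketch using Python's `re` module, assuming the "·" characters in the question stand for spaces. A backreference (`\1`) matches a run of words only when the same run follows immediately; whether this exact pattern works in Dataprep's own regex support is not confirmed here. A second, non-regex helper covers words repeated anywhere in the text. Both function names are hypothetical.

```python
import re
from collections import Counter

# A captured run of one or more words, followed immediately by the same run again.
# Fine for short strings like product titles; backtracking grows quickly on long text.
REPEATED_RUN = re.compile(r"\b(\w+(?:\s+\w+)*)\s+\1\b")


def repeated_phrases(text):
    """Return each word sequence that is immediately repeated in text."""
    return [m.group(1) for m in REPEATED_RUN.finditer(text)]


def duplicate_words(text):
    """Return words occurring more than once anywhere in text, in first-seen order."""
    counts = Counter(text.split())
    return [word for word, n in counts.items() if n > 1]
```

On the question's example, `repeated_phrases` returns `["Deep Black Red Text"]`, while `duplicate_words` also reports `Shirt`, which repeats but not adjacently.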
0 votes, 1 answer

Dataprep Job Failed

In Dataprep jobs, I have a transform that failed with the only information being: Job Failed : java.lang.NullPointerException: jobId. It never even reaches Dataflow jobs, and I have no logs or anything else to go on. Any ideas why, or how to get more info to…
0 votes, 0 answers

Google Cloud Dataprep Union

I have two datasets in my Dataprep flow that I am trying to union. I get an error message saying that one of the datasets is broken and must be fixed before attempting a union. Could someone advise on the best way to union two datasets? Any…
BeeKay
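As a reference for what a union should produce once both inputs load, here is a plain-Python sketch of a union in the SQL `UNION ALL` sense that aligns columns by name and fills missing values with `None`. This is not Dataprep's implementation, and `union_by_name` is a hypothetical helper.

```python
def union_by_name(left, right):
    """Stack two lists of row dicts, aligning columns by name.

    Column order: left's columns first, then new columns from right, in first-seen
    order. Columns missing from a row are filled with None.
    Returns (columns, rows).
    """
    columns = []
    for row in left + right:
        for col in row:
            if col not in columns:
                columns.append(col)
    rows = [{col: row.get(col) for col in columns} for row in left + right]
    return columns, rows
```

Aligning by name rather than by position avoids silently mixing unrelated columns when the two datasets list their columns in different orders.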
0 votes, 2 answers

python api to launch template unknown name cannot find field

I've created and run a DataPrep job, and am trying to use the template from Python on App Engine. I can successfully start a job using gcloud dataflow jobs run --parameters…
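A hedged sketch of launching a staged template with google-api-python-client's discovery client (`build("dataflow", "v1b3")` and `projects().templates().launch`). The job name, parameter names, and bucket paths are hypothetical placeholders; a Dataprep-generated template defines its own parameters, which are not reproduced here. The assumption behind the comment about the error is that "Unknown name ... Cannot find field" comes from a request-body field the API does not recognize, such as template parameters placed at the top level instead of under `parameters`. That cause is not confirmed for this question.

```python
def build_launch_body(job_name, parameters, temp_location=None):
    """Build the JSON body for Dataflow's projects.templates.launch call.

    Template parameters go under "parameters" as a string-to-string map.
    Fields the API does not recognize at the top level are rejected, which is
    one likely source of an "Unknown name ... Cannot find field" error.
    """
    body = {
        "jobName": job_name,
        "parameters": {key: str(value) for key, value in parameters.items()},
    }
    if temp_location:
        body["environment"] = {"tempLocation": temp_location}
    return body


def launch_template(project_id, gcs_path, body):
    """Launch a staged Dataflow template (needs google-api-python-client and credentials)."""
    # Imported lazily so build_launch_body can be used without the client library.
    from googleapiclient.discovery import build

    dataflow = build("dataflow", "v1b3")
    request = dataflow.projects().templates().launch(
        projectId=project_id, gcsPath=gcs_path, body=body
    )
    return request.execute()
```

Comparing the body produced here with the `--parameters` that already work in `gcloud dataflow jobs run` is a quick way to spot a misplaced field.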
0 votes, 1 answer

Dataprep sort in reverse order does not work, any solutions?

I've come across a very annoying bug in Google Dataprep. According to this page: https://cloud.google.com/dataprep/docs/html/Window-Transform_57344658, it should be possible to reverse the order of sorting by adding a dash in front of the column…
B Delfos
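To check what output the dash prefix should produce, here is a plain-Python sketch of multi-column ordering where a leading `-` marks a descending column, following the convention the linked page describes. `sort_rows` is a hypothetical helper, not Dataprep's window implementation.

```python
from operator import itemgetter


def sort_rows(rows, order):
    """Sort row dicts by column names; a leading '-' means descending.

    Relies on Python's sort being stable (also with reverse=True): sort by the
    least significant key first, then by each more significant key in turn.
    """
    result = list(rows)
    for spec in reversed(order):
        descending = spec.startswith("-")
        column = spec[1:] if descending else spec
        result.sort(key=itemgetter(column), reverse=descending)
    return result
```

For example, `sort_rows(rows, ["g", "-v"])` orders by `g` ascending and, within each `g`, by `v` descending.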
0 votes, 1 answer

Prepping a sparse dataset (empty row every other row) in Google Dataprep results in an empty output

Here is another one of my bug findings in Google Dataprep: when using a sparse dataset as input (one empty row every other row), Google Dataprep is not able to process any recipes on it. The transformer page shows all the data in the initial sample…
B Delfos
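One workaround worth trying, assuming the empty rows carry no meaning, is to remove them before the file is uploaded. A minimal stdlib pre-processing sketch; `drop_empty_rows` is a hypothetical helper, and whether this avoids the Dataprep behaviour described above is untested.

```python
import csv
import io


def drop_empty_rows(csv_text):
    """Return CSV text with every row whose fields are all blank removed."""
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        # A blank line parses as [], and a line like ",," parses as ["", "", ""].
        if any(field.strip() for field in row):
            writer.writerow(row)
    return out.getvalue()
```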
0 votes, 1 answer

Reuse recipe in google dataprep

I am trying to reuse an existing recipe from one dataset on another. Unfortunately, I am unable to find a step-by-step process in the Google Cloud documentation. Could someone assist with the steps? Thank you!
BeeKay
0 votes, 1 answer

How to insert values in google cloud dataprep

I have mismatched ZIP codes in Dataprep. I need to add two digits to the values that were entered improperly. In Dataprep I get a suggestion to replace '{start}{digit}{3}{end}' with ' '. In the replace dialog I can only put a string, not…
user40551
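The suggested pattern `{start}{digit}{3}{end}` matches a value that is exactly three digits. In Python's `re`, the equivalent is anchored with `\A` and `\Z`, and a capture group lets the replacement reuse the matched digits. This sketch assumes the two missing digits are leading zeros lost when ZIP codes were stored as numbers; that assumption and the name `pad_zip` are hypothetical, and it does not show Dataprep's replace dialog syntax.

```python
import re

# Same shape as '{start}{digit}{3}{end}': the whole value is exactly three digits.
THREE_DIGIT_ZIP = re.compile(r"\A([0-9]{3})\Z")


def pad_zip(value):
    """Prefix '00' to three-digit ZIP codes; return every other value unchanged."""
    return THREE_DIGIT_ZIP.sub(r"00\1", value)
```

If the values are numeric rather than text, converting them to strings first keeps the leading zeros from being dropped again.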
0 votes, 1 answer

Dataprep pivot transform

I'm new to Dataprep and am trying to create a pivot table using the "Pivot Transform" (https://cloud.google.com/dataprep/docs/html/Pivot-Transform_57344645#example---basic-pivot). I searched the documentation, and the syntax looks simple enough, except…
Amit Sadeh
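Conceptually, a pivot groups rows by one column, spreads the values of a second column into new columns, and aggregates a third column into the resulting cells. A plain-Python sketch of that idea; the column names and `pivot` helper are hypothetical and do not reproduce the Pivot Transform's parameters.

```python
from collections import defaultdict


def pivot(rows, row_key, column_key, value_key, agg=sum):
    """Pivot long-format row dicts into {row value: {column value: agg(values)}}."""
    groups = defaultdict(lambda: defaultdict(list))
    for r in rows:
        groups[r[row_key]][r[column_key]].append(r[value_key])
    return {rk: {ck: agg(vals) for ck, vals in cols.items()} for rk, cols in groups.items()}
```

Passing a different `agg` (for example `max` or `len`) changes how multiple values that land in the same cell are combined.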
0 votes, 2 answers

Preprocessing on data stored in BigQuery

I've just started to use GCP, and I have some questions about the right use of some of its tools. In particular, I'm trying to ingest data from Google Analytics into BigQuery. Would it be possible to use Dataprep on data stored in BigQuery? Almost…
0 votes, 1 answer

Google Cloud DataPrep DATEDIF function inconsistent

I have four DateTime columns, all in long format, e.g. 2016-08-01T21:13:02Z. They are called EnqDateTime, QuoteCreatedDateTime, BookingCreatedDateTime and RejAt. I want to add columns for the duration (in days) between EnquiryDateTime and the other…
Adam Hopkinson
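One possible source of day counts that look inconsistent (not confirmed as the cause here) is the difference between counting elapsed 24-hour periods and counting calendar-date boundaries. A plain-Python comparison using the question's sample format; the second timestamp in the usage note is hypothetical, and neither function claims to mirror DATEDIF's exact semantics.

```python
from datetime import datetime, timezone


def parse_utc(ts):
    """Parse timestamps like 2016-08-01T21:13:02Z as timezone-aware UTC datetimes."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def elapsed_whole_days(start, end):
    """Complete 24-hour periods between two timestamps (floors toward negative infinity)."""
    return (parse_utc(end) - parse_utc(start)).days


def calendar_day_difference(start, end):
    """Difference between the two calendar dates, ignoring time of day."""
    return (parse_utc(end).date() - parse_utc(start).date()).days
```

For 2016-08-01T21:13:02Z and 2016-08-02T09:00:00Z, the first method gives 0 days and the second gives 1, although less than 12 hours separate the two timestamps.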
0 votes, 1 answer

Is it possible to sequentially chain Google DataPrep flows?

I have quite a long set of transforms, which I'd like to break into modules (each in its own flow). I can't see a way of chaining these, other than scheduling them in consecutive time slots. Has anyone managed this, or do I need to build one massive flow?
Adam Hopkinson
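The alternative to consecutive time slots is dependency-based sequencing: start each module only after the previous one has finished successfully. A generic sketch of that control flow, where each `run` callable stands in for a hypothetical wrapper that triggers one flow and waits for its job. No Dataprep API is assumed or shown here.

```python
from typing import Callable, List, Sequence, Tuple

# (name, run): run() is expected to block until its step finishes and return True on success.
Step = Tuple[str, Callable[[], bool]]


def run_chain(steps: Sequence[Step]) -> List[str]:
    """Run steps strictly in order; stop at the first failure.

    Returns the names of the steps that completed successfully.
    """
    completed: List[str] = []
    for name, run in steps:
        if not run():
            break  # later modules never start after a failure
        completed.append(name)
    return completed
```

Unlike fixed time slots, this cannot start a module while its predecessor is still running, and a failure stops the rest of the chain instead of feeding stale data downstream.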
0 votes, 2 answers

Big data in datalab

I'm trying to load my CSV file into Datalab, but the file is too large to load. Even if I managed that, the preprocessing would take too long. I'm thinking of using Keras to do ML on this dataset. My questions are: How do I use a…
Elona Mishmika
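A stdlib-only sketch of streaming a large CSV in fixed-size batches instead of loading it all at once. Only one batch is held in memory at a time. The batch size and the `iter_batches` name are illustrative assumptions, and feeding the batches to Keras (for example through a generator) is left out.

```python
import csv
from itertools import islice
from typing import Iterable, Iterator, List


def iter_batches(lines: Iterable[str], batch_size: int) -> Iterator[List[List[str]]]:
    """Yield the data rows of a CSV in lists of at most batch_size rows.

    lines can be an open file object; the header row is skipped.
    """
    reader = csv.reader(lines)
    next(reader, None)  # skip the header row
    while True:
        batch = list(islice(reader, batch_size))
        if not batch:
            return
        yield batch
```

Usage: `with open(path, newline="") as f: for batch in iter_batches(f, 10000): ...`, where `path` is a hypothetical file path and each batch can be preprocessed before moving on to the next.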
0 votes, 2 answers

Google Dataprep Date Serial Number

In Google Dataprep, when I apply min() to a date, it gives a long serial number, e.g. 1304985600000. I'm trying to get the first order date of a customer, but I can't seem to do anything with this number. Thanks
Aaron Harris
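The number 1304985600000 is consistent with milliseconds since the Unix epoch: it lands exactly on midnight UTC, which fits a date-only value. Assuming that interpretation (not confirmed by the question), a plain-Python conversion looks like this; `from_epoch_millis` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def from_epoch_millis(ms):
    """Convert integer milliseconds since the Unix epoch to a UTC datetime."""
    # timedelta arithmetic keeps the result exact, with no float rounding.
    return EPOCH + timedelta(milliseconds=ms)
```

`from_epoch_millis(1304985600000).date()` gives 2011-05-10.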