Questions tagged [google-cloud-dataprep]

An intelligent cloud data service to visually explore, clean, and prepare data for analysis.

Dataprep (more precisely, Cloud Dataprep by Trifacta) is a visual data transformation tool built by Trifacta and offered as part of Google Cloud Platform.

It is capable of ingesting data from and writing data to several other Google services (BigQuery, Cloud Storage).

Data is transformed using recipes which are shown alongside a visual representation of the data. This allows the user to preview changes, profile columns and spot outliers and type mismatches.

When a Dataprep flow is run (either manually or on a schedule), a Dataflow job is created to execute the work. Dataflow is Google's managed Apache Beam service.

205 questions
0 votes · 0 answers

Cloud Dataprep string type does not match Bigquery string type

When I attempt to rerun a recipe with a different data set (CSV), it gives me a column mismatch error even though both columns are strings. I tried adding a step in the recipe to explicitly make both columns strings, but I am still getting this error…
0 votes · 1 answer

How to manually control data schema interpretation

When I export public weather data from https://www1.ncdc.noaa.gov/pub/data/uscrn/products/subhourly01/2017/CRNS0101-05-2017-TX_Austin_33_NW.txt, as soon as solar radiation > 9, all of my data for the remaining columns gets lumped into a single…
atltbone (285)
0 votes · 1 answer

Header Row Insert

I am trying to insert data into BigQuery using Google Cloud Dataprep. I created a recipe and set the first row as the header row, but when I run it on multiple files it inserts the header row into my BigQuery table as well. Is anybody facing this problem…
Andy Lai (1)
0 votes · 1 answer

Using Dataset with Parameters for BigQuery in Cloud Data Prep?

I have several BigQuery datasets with daily-created tables, such as apples_201904010, apples_201904009, etc. I'd like to set up a scheduled Cloud Dataprep job to process these tables each night, so using the dataset-with-parameters option is really…
Casey Grimes (117)
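Dataprep's dataset-with-parameters feature lets a single imported dataset match a family of daily tables via a datetime parameter. As a rough illustration of how such a pattern resolves (the `{yyyyMMdd}` placeholder and the helper below are hypothetical, not Dataprep's actual syntax):

```python
from datetime import date, timedelta

def resolve_table_name(pattern: str, run_date: date) -> str:
    """Resolve a Dataprep-style datetime parameter of the form
    {yyyyMMdd} into a concrete daily table name (illustrative only)."""
    return pattern.replace("{yyyyMMdd}", run_date.strftime("%Y%m%d"))

# A nightly job processing yesterday's table:
yesterday = date(2019, 4, 10) - timedelta(days=1)
print(resolve_table_name("apples_{yyyyMMdd}", yesterday))  # apples_20190409
```

In Dataprep itself the equivalent is configured in the UI when importing the dataset, by adding a datetime parameter to the table name and choosing the date offset for scheduled runs.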
0 votes · 1 answer

Mismatched numeric values from a csv file in Dataprep

I am struggling to understand why Dataprep is assigning mismatched values to numbers that I am trying to import from a .csv file. In Excel, everything looks normal, but in Dataprep this is the value I am getting. It seems for most numbers…
WJA (6,676)
0 votes · 1 answer

Dataset rows disappear when a recipe is built

I upload the dataset into Google Cloud storage. Next, I open the flow in Dataprep and add the dataset there. When I create the first recipe (without any steps yet), the dataset has approximately half of its original rows, that is, 36 234…
Ana (31)
0 votes · 1 answer

Dataprep Dashboard slow

I am having problems performing actions on the Dataprep dashboard. In particular, when I try to merge two datasets, it just loads for >20 minutes without a result. I also tried to add a new recipe or dataset, and I only get that it…
WJA (6,676)
0 votes · 1 answer

Convert Mo-Sa and Mo-Fr to: Mo, Tue, Wed, ... -columns

I want to transform the following column: Col_openingHours: 1: Mo-Sa: 03:00 - 11:00 | 2: Mo-Sa: 02:00 - 10:00 into: Col_monday: 1: 03:00 - 11:00 | 2: 02:00 - 10:00, Col_tuesday: 1: 03:00 - 11:00 | 2: 02:00 - 10:00, ... How can I get this in…
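Outside of a Dataprep recipe, the expansion of a day range into per-day columns can be sketched in plain Python (the two-letter day codes and the `expand_opening_hours` helper are assumptions for illustration; in Dataprep itself this would be a sequence of extract/conditional steps):

```python
DAYS = ["Mo", "Tu", "We", "Th", "Fr", "Sa", "Su"]

def expand_opening_hours(value: str) -> dict:
    """Expand an entry like 'Mo-Sa: 03:00 - 11:00' into one
    key (column) per weekday covered by the range."""
    day_range, hours = value.split(":", 1)
    start, end = day_range.strip().split("-")
    i, j = DAYS.index(start), DAYS.index(end)
    return {day: hours.strip() for day in DAYS[i:j + 1]}

print(expand_opening_hours("Mo-Sa: 03:00 - 11:00"))
```

The same idea maps onto a recipe: extract the day range and the hours into separate columns, then derive one column per weekday with a conditional that checks whether the day falls inside the range.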
0 votes · 0 answers

Google cloud dataflow limit of steps/transformations

I am working with a complex Google Cloud Dataprep process. In order to make it work properly, I designed different modules and then linked them (with referenced datasets). If I execute those modules separately they work properly, but when executing…
0 votes · 0 answers

In a scheduled Dataprep Job is it possible to export an output csv in Data Storage with suffix-based name?

I would like to schedule a daily Google Dataprep job exporting csv files into a Storage bucket. Instead of the incremental number proposed by the console (e.g. output.csv, output_1.csv, output_2.csv...), these csv files should have a date-based…
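One workaround, if the console only offers incremental suffixes, is to compute a date-based name in a post-run step (e.g. a Cloud Function that renames the objects after the job finishes). A minimal sketch of the naming part — `dated_output_name` is a hypothetical helper, not a Dataprep feature:

```python
from datetime import date

def dated_output_name(prefix: str, run_date: date, shard: int = 0) -> str:
    """Compute a date-based object name such as output_2019-04-09.csv,
    with an optional shard suffix for multi-file outputs."""
    suffix = f"_{shard}" if shard else ""
    return f"{prefix}_{run_date.isoformat()}{suffix}.csv"

print(dated_output_name("output", date(2019, 4, 9)))  # output_2019-04-09.csv
```

The rename itself would then be a copy-and-delete of the incrementally numbered objects in the bucket to their dated names.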
0 votes · 1 answer

Exported Dataflow Template Parameters Unknown

I've exported a Cloud Dataflow template from Dataprep as outlined here: https://cloud.google.com/dataprep/docs/html/Export-Basics_57344556 In Dataprep, the flow pulls in text files via wildcard from Google Cloud Storage, transforms the data, and…
scb (1)
0 votes · 1 answer

Google Data Prep - cannot import table from BigQuery (created from Google Sheets) "Nothing found"

I created a table in BigQuery from Google Sheets. When I tried importing it into Cloud Dataprep, it said there were no tables in the dataset. I'm not sure whether it's an issue with the Google Sheets integration, because when I check the details…
0 votes · 2 answers

The schema of the BigQuery table does not match the recipe

I'm currently working on a BI stack that flows from BigQuery to Tableau. I'm trying to use Dataprep to remove unnecessary columns and join the tables in BigQuery to create a "master" table to then feed into Tableau. The tables in BigQuery update…
0 votes · 1 answer

Can't share dataprep flow

I'm trying to share a Cloud Dataprep flow with some additional users in my business, but it won't allow me to share or collaborate with any users in our cloud account: https://i.stack.imgur.com/rf3vY.jpg I thought I'd set permissions up correctly for…
0 votes · 2 answers

Is there a cloud dataprep api?

Ideally, I would like to write a function to start a Dataprep job on one of the following events: a Kafka message, or a file added or changed in GCS. I'm thinking I could write the triggers in Python if there is a support library, but I can't find one. Happy…
Stefan Thorpe (540)
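Cloud Dataprep does expose a REST API (inherited from Trifacta) that can start a job for a wrangled dataset via a `jobGroups` endpoint, authenticated with an access token. A stdlib-only sketch that builds (but does not send) such a request — the dataset id and token are placeholders, and the endpoint and payload shape should be verified against the API docs for the Dataprep version enabled on your project:

```python
import json
import urllib.request

def build_run_job_request(dataset_id: int, token: str) -> urllib.request.Request:
    """Build a POST to the Dataprep jobGroups endpoint, which starts
    a job for the given wrangled dataset (request is not sent here)."""
    body = json.dumps({"wrangledDataset": {"id": dataset_id}}).encode()
    return urllib.request.Request(
        "https://api.clouddataprep.com/v4/jobGroups",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_run_job_request(123, "my-access-token")
print(req.method, req.full_url)
```

A GCS-triggered Cloud Function could send a request like this with `urllib.request.urlopen(req)` to kick off the job when a file lands in the bucket; a Kafka consumer could do the same.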