Questions tagged [google-cloud-dataprep]

An intelligent cloud data service to visually explore, clean, and prepare data for analysis.

Dataprep (more accurately, Cloud Dataprep by Trifacta) is a visual data transformation tool built by Trifacta and offered as part of Google Cloud Platform.

It can read data from and write data to other Google Cloud services, such as BigQuery and Cloud Storage.

Data is transformed using recipes, which are shown alongside a visual representation of the data. This lets the user preview changes, profile columns, and spot outliers and type mismatches.

When a Dataprep flow is run (either manually or on a schedule), a Dataflow job is created to run the task. Dataflow is Google's managed service for running Apache Beam pipelines.

205 questions
1
vote
1 answer

Does Google Cloud Dataprep support importing Google Drive Sheets as data sources?

I'm importing datasets in Google Cloud Dataprep (by Trifacta) to perform transformations on my data sources. But I can't see Google Drive Sheets in the list after connecting them to the BigQuery console. I'm about to use them as rules for my…
1
vote
2 answers

How to cancel a running job (something went wrong)?

I have a scheduled job that runs every morning. Apparently, since yesterday something has not been going as planned in the job, and it is still running. The job from yesterday is still running (normally it takes about 14 minutes). And the scheduled job from…
1
vote
1 answer

Dataprep flow from csv in Cloud Storage to Big Query table incomplete (not all records loaded)

I set up a scheduled Dataprep flow that copies and processes, daily, some CSV and JSON files stored in a Cloud Storage bucket into BigQuery tables. It was working fine, but for the past few days the job has been copying fewer rows into BigQuery than those…
1
vote
2 answers

What are the differences between Cloud Dataflow and Dataprep

Both Dataprep and Dataflow can be used for ETL tasks. In fact, Dataprep seems to use Dataflow jobs. Is the only difference that Dataprep provides a user interface for writing Dataflow jobs?
Adelin
1
vote
0 answers

Job incomplete when run using client library but not throwing any errors

I'm trying to automate some data cleaning tasks by uploading the files to Cloud Storage, running them through a pipeline, and downloading the results. I have created the template for my pipeline to execute using the GUI in Dataprep, and am…
1
vote
0 answers

Dataflow Template - No template found at the specified path / The metadata file is malformed

Creating a Dataflow job from a Dataflow template, which was created by a Dataprep job run. Two warnings/errors show when using the template path: No template found at the specified path. The metadata file is malformed. You can also see this in the…
1
vote
0 answers

Export result profile from Dataprep

I'm using Dataprep to run a quick data quality profile on my source data. My goal is to output this profile into a sheet. I've followed the steps provided in the Dataprep docs and can now view the profile of my data on the Job Results page. Is there…
CLPatterson
1
vote
0 answers

Incremental Load in Google Cloud Platform

I am trying to implement a BI solution using GCP where I have data in flat files in cloud datastore and I have to push this data into my data warehouse on BigQuery. The data will be incremental after the first load. There doesn't seem to be any ETL…
1
vote
2 answers

Cloud Dataprep - Replace code or id with value with middle dataset

I'm really new to GCP Dataprep and am now trying to create a recipe, but I can't figure out how to do it. In summary, I have 2 files, the first one with these columns: NAME, CONTRY_CODE, ... And the second one with: COUNTRY_CODE,…
1
vote
1 answer

Remove Duplicates + first occurrences

Sorry, but does anybody know how I can remove duplicate rows AND the first occurrence in Google Dataprep, so that both rows (the duplicate row + the first occurrence) will be deleted? col1,col2 john,simpson will,farrell john,simpson elon,musk will…
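
As an illustration of the behaviour asked for above, here is a minimal Python sketch (not a Dataprep recipe) that drops every copy of a repeated row, including its first occurrence. It uses only the first four sample rows visible in the excerpt; the function name `drop_all_duplicated` is hypothetical. The same idea, counting rows per group and keeping only groups with a count of one, is what a recipe would need to express.

```python
from collections import Counter


def drop_all_duplicated(rows):
    """Keep only rows that occur exactly once; all copies of a repeated row are dropped."""
    counts = Counter(rows)
    return [row for row in rows if counts[row] == 1]


# First four sample rows from the question excerpt (the rest is truncated there).
rows = [
    ("john", "simpson"),
    ("will", "farrell"),
    ("john", "simpson"),
    ("elon", "musk"),
]

unique_rows = drop_all_duplicated(rows)
print(unique_rows)
```

Rows must be hashable (tuples here) for `Counter` to count them, and the original row order is preserved.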
1
vote
1 answer

Match all spaces between blocks of characters

I need a regular expression that matches all whitespace between blocks of characters. Block example: 500 dfdsfsd fdsfdsfdsf 9876dfsdfs df7687 I only know about /\s+/, but it matches only the first whitespace block. I want to get whitespace,…
Eduardo Humberto
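
A pattern such as `\s+` already describes one run of whitespace. Seeing only the first run usually means the engine was asked for a single match rather than all matches (in JavaScript-style syntax, a missing global flag is a likely cause, though the excerpt does not say which engine is in use). The Python sketch below, using the sample string from the question, contrasts a single search with iterating over every match.

```python
import re

text = "500 dfdsfsd fdsfdsfdsf 9876dfsdfs df7687"
WHITESPACE_RUN = re.compile(r"\s+")

# search() stops at the first run of whitespace.
first = WHITESPACE_RUN.search(text)

# finditer() yields every non-overlapping run, not just the first one.
spans = [match.span() for match in WHITESPACE_RUN.finditer(text)]

# split() uses the same runs as separators and returns the character blocks.
blocks = WHITESPACE_RUN.split(text)

print(first.span(), spans, blocks)
```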
1
vote
3 answers

Google Dataprep is having issues with UI during lookup feature

Since this week, I have been facing an issue with the lookup feature in Google Dataprep. Steps to reproduce: create a Dataprep flow; import two datasets (source dataset, lookup dataset); create and edit a recipe; choose a column and do a lookup. The window will…
1
vote
0 answers

Dataprep job keeps failing when unioning

I have two datasets that I'm trying to Union using dataprep. Both have 300+ columns and require a combination of matching by name and manually adding columns to match since the names aren't always the same. After matching the columns and saving, the…
Elliot Nam
1
vote
1 answer

Custom join in Dataprep

Can I join two tables using a custom condition rather than equality? E.g., I have two tables in BigQuery. Table A has 3 columns: start_range, end_range and grade. Table B has data coming from Storage using Cloud Functions and has a particular column…
hamedazhar
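
To make the kind of join described above concrete, here is a small Python sketch of a range (non-equi) left join. The column names `start_range`, `end_range` and `grade` come from the question; the `score` column on table B, the sample values, and the inclusive bounds are assumptions, since the excerpt is truncated before naming that column. In SQL terms this corresponds to joining on a `BETWEEN` condition instead of `=`.

```python
def range_join(left, right, key="score"):
    """Left-join each row of `left` to the first range in `right` containing row[key]."""
    joined = []
    for row in left:
        match = next(
            (r for r in right if r["start_range"] <= row[key] <= r["end_range"]),
            None,
        )
        # Rows that fall outside every range keep a grade of None.
        joined.append({**row, "grade": match["grade"] if match else None})
    return joined


# Table A: grade ranges (bounds assumed inclusive).
table_a = [
    {"start_range": 0, "end_range": 49, "grade": "C"},
    {"start_range": 50, "end_range": 79, "grade": "B"},
    {"start_range": 80, "end_range": 100, "grade": "A"},
]

# Table B: hypothetical rows with an assumed `score` column.
table_b = [{"id": 1, "score": 85}, {"id": 2, "score": 50}, {"id": 3, "score": 120}]

result = range_join(table_b, table_a)
print(result)
```

This nested scan is fine for small lookup tables; for large inputs the join is better left to the warehouse.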
1
vote
0 answers

Google Cloud Dataprep : Dimensional Modeling

I'm trying to populate my dimension and fact tables using Cloud Dataprep. Since, in a dimensional model, dimensions need to be populated before facts, I have had no success chaining the flows together using Reference Datasets as the…