Questions tagged [google-cloud-dataprep]

An intelligent cloud data service to visually explore, clean, and prepare data for analysis.

Dataprep (or, more precisely, Cloud Dataprep by Trifacta) is a visual data transformation tool built by Trifacta and offered as part of Google Cloud Platform.

It is capable of ingesting data from and writing data to several other Google services (BigQuery, Cloud Storage).

Data is transformed using recipes, which are shown alongside a visual representation of the data. This allows the user to preview changes, profile columns, and spot outliers and type mismatches.

When a Dataprep flow is run (either manually or on a schedule), a Dataflow job is created to run the task. Dataflow is Google's managed Apache Beam service.
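For illustration, here is a minimal sketch of the kind of Beam pipeline such a job amounts to: read a CSV from Cloud Storage, apply a transform step, and write to BigQuery. The project, bucket, and table names are placeholders, and a real Dataprep-generated pipeline is considerably more elaborate.

    # Minimal sketch of a Dataflow-style pipeline: CSV in, transform, BigQuery out.
    # All names (project, bucket, dataset, table) are placeholders.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(
        runner="DataflowRunner",          # the managed service Dataprep targets
        project="my-project",
        region="us-central1",
        temp_location="gs://my-bucket/tmp",
    )

    with beam.Pipeline(options=options) as p:
        (
            p
            | "Read CSV" >> beam.io.ReadFromText("gs://my-bucket/input.csv",
                                                 skip_header_lines=1)
            | "Uppercase" >> beam.Map(str.upper)   # stands in for a recipe step
            | "To row" >> beam.Map(lambda line: {"value": line})
            | "Write BQ" >> beam.io.WriteToBigQuery(
                "my-project:my_dataset.my_table",
                schema="value:STRING",
                create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
                write_disposition=beam.io.BigQueryDisposition.WRITE_TRUNCATE,
            )
        )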

205 questions
2 votes • 0 answers

GCP DataPrep - Unable to rename output files Error

I have created a simple Dataprep workflow (source: a CSV file from GCS, transformation: a simple uppercase conversion, target: load into BigQuery). When I run this workflow job in the Dataprep UI, I get the error: Unable to rename output files…
2 votes • 2 answers

Dataprep is leaving Datasets/Tables behind in BigQuery

I am using Google Cloud Dataprep for processing data stored in BigQuery. I am having an issue where Dataprep/Dataflow creates a new dataset with a name starting with "temp_dataset_beam_job_". It seems to create the temporary dataset for both failed and…
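A possible cleanup sketch, assuming the leftovers follow the "temp_dataset_beam_job_" naming shown above; the project ID is a placeholder:

    # Delete leftover temporary datasets that Dataprep/Dataflow left behind.
    from google.cloud import bigquery

    client = bigquery.Client(project="my-project")  # placeholder project ID

    for dataset in client.list_datasets():
        if dataset.dataset_id.startswith("temp_dataset_beam_job_"):
            # delete_contents=True also removes any tables still inside the dataset
            client.delete_dataset(dataset.reference,
                                  delete_contents=True, not_found_ok=True)
            print("Deleted", dataset.dataset_id)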
2 votes • 1 answer

Combine multiple rows into single row in Google Data Prep

I have a table which has multiple payload values in separate rows. I want to combine those rows into a single row so all the data is together. The table looks something like this. +------------+--------------+------+----+----+----+----+ | Date |…
VSR • 87 • 2 • 18
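One way to sketch the same consolidation outside Dataprep is with pandas; the column names below are invented stand-ins for the table in the question:

    # Collapse multiple payload rows per Date into one row by taking the
    # first non-null value in each column. Column names are invented.
    import pandas as pd

    df = pd.DataFrame({
        "Date":     ["2020-01-01", "2020-01-01", "2020-01-02"],
        "Payload1": [1, None, 3],
        "Payload2": [None, 2, 4],
    })

    combined = df.groupby("Date", as_index=False).first()
    print(combined)   # one row per Date; first() skips nulls per column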
2 votes • 1 answer

Cloud Dataprep BigQuery Upsert

Is there a way to update rows in Google BigQuery when publishing from Cloud Dataprep? I can't find anything in the documentation. I have a dataset I'm preprocessing with Dataprep that contains new rows and updated rows on every (daily) run. I would…
Andii • 484 • 1 • 4 • 19
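As far as the documentation goes, Dataprep's BigQuery publishing actions do not include an upsert, so one common workaround is to publish to a staging table and MERGE from there. A hedged sketch; the table, column, and key names are placeholders:

    # Upsert after a Dataprep run: MERGE the staging table into the main table.
    from google.cloud import bigquery

    client = bigquery.Client()

    merge_sql = """
    MERGE `my_dataset.main_table` T
    USING `my_dataset.staging_table` S
    ON T.id = S.id
    WHEN MATCHED THEN
      UPDATE SET value = S.value, updated_at = S.updated_at
    WHEN NOT MATCHED THEN
      INSERT (id, value, updated_at) VALUES (S.id, S.value, S.updated_at)
    """

    client.query(merge_sql).result()  # block until the MERGE finishes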
2 votes • 0 answers

Unexpected row in output file

Hi all, I am a newbie on Google Cloud Platform and have just learned about BigQuery & Dataprep. I have an input CSV file with headers (refer to the input file). After wrangling (renaming columns) and running the job in Dataprep, below is…
Ong K.S • 229 • 1 • 4 • 15
2 votes • 2 answers

Google Dataprep copy flows from one project to another

I have two Google projects: dev and prod. I also import data from different storage buckets located in these projects: dev-bucket and prod-bucket. After I have made and tested changes in the dev environment, how can I smoothly apply (deploy/copy)…
WJA • 6,676 • 16 • 85 • 152
2 votes • 4 answers

Dataprep jobs running for over 72 hours since 6/20 update. Job status reads complete but not published

I have been running daily Dataprep jobs, and since the update last week approximately half of my jobs are hanging and not being published. They appear as jobs in progress, although when I go to the actual job page the job appears to be complete…
Trung Pham • 21 • 1
2 votes • 1 answer

Dataprep importing files with different number of columns into a dataset

I am trying to create a parameterized dataset that imports files from GCS and puts them under each other. This all works fine (Import Data > Parameterize). To give a bit of context, I store a .csv file each day under a different name referring to…
WJA • 6,676 • 16 • 85 • 152
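For intuition, the union-with-mismatched-columns behavior can be sketched with pandas, which aligns columns by name and fills gaps with NaN. The file paths are placeholders, and reading gs:// paths directly assumes gcsfs is installed:

    # Stack daily CSVs that do not all share the same columns.
    import pandas as pd

    day1 = pd.read_csv("gs://my-bucket/2020-01-01.csv")  # e.g. columns a, b
    day2 = pd.read_csv("gs://my-bucket/2020-01-02.csv")  # e.g. columns a, b, c

    stacked = pd.concat([day1, day2], ignore_index=True, sort=False)
    print(stacked.columns)  # union of all columns; missing cells become NaN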
2 votes • 1 answer

How can I move data from BigQuery or DataPrep to Firestore?

I just cleaned up my Firestore collection data using Dataprep and verified the data via BigQuery. I now want to move the data back to Firestore. Is there a way to do this? I have used the manual method of exporting to JSON and then uploading using a…
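There is no direct export path from BigQuery or Dataprep to Firestore, so a small script is the usual route. A hedged sketch, with placeholder table, collection, and key names:

    # Read cleaned rows from BigQuery and batch-write them to Firestore.
    from google.cloud import bigquery, firestore

    bq = bigquery.Client()
    db = firestore.Client()

    rows = bq.query("SELECT * FROM `my_dataset.cleaned_table`").result()

    batch, count = db.batch(), 0
    for row in rows:
        doc = dict(row)  # a BigQuery Row converts cleanly to a dict
        batch.set(db.collection("my_collection").document(str(doc["id"])), doc)
        count += 1
        if count % 500 == 0:      # Firestore batches cap out at 500 writes
            batch.commit()
            batch = db.batch()
    batch.commit()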
2 votes • 0 answers

GCP Data Prep AVRO file doesn't reflect schema in Data Prep. Datetime field changed to string field

I am using Google Cloud Platform's (GCP) Dataprep (DP) to move data into BigQuery (BQ) via AVRO files. I am taking the data straight from a CSV file to an AVRO file using one DP recipe with NO transformations. In DP the type of my column CreatedDate…
2 votes • 1 answer

How to resolve 'Access Denied: BigQuery BigQuery: Location unknown is not yet publicly available' error processing Cloud DataPrep job

I'm running a Cloud Dataprep job, which has successfully run many times until today. It now fails with an error when creating the 'temp_dataset_beam_job_...' dataset. The error is Access Denied: BigQuery BigQuery: Location unknown is not yet…
2 votes • 2 answers

Manipulate large number of files to reformat in google cloud

I have a large number of JSON files in Google Cloud Storage that I would like to load into BigQuery. The average file size is 5 MB uncompressed. The problem is that they are not newline delimited, so I can't load them into BigQuery as-is. What's my best…
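One approach is a small rewrite pass before loading. A sketch assuming each file holds a single JSON array; the bucket and prefix names are placeholders:

    # Rewrite JSON-array files in GCS as newline-delimited JSON for BigQuery.
    import json
    from google.cloud import storage

    client = storage.Client()
    bucket = client.bucket("my-bucket")          # placeholder bucket

    for blob in client.list_blobs("my-bucket", prefix="raw/"):
        records = json.loads(blob.download_as_text())   # file = one JSON array
        ndjson = "\n".join(json.dumps(r) for r in records)
        bucket.blob(blob.name.replace("raw/", "ndjson/")).upload_from_string(ndjson)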
2 votes • 1 answer

java.lang.Long cannot be cast to java.lang.Double ERROR when using MAX()

Since the Cloud Dataprep update yesterday, 19/11/2018, I get an error every time I use the MAX() function, either alone or in a pivot. Some notes: I used the MAX function on another dataset and it worked (so MAX() works). I didn't have…
Fontain • 21 • 2
2 votes • 1 answer

ZONE_RESOURCE_POOL_EXHAUSTED for DataFlow & DataPrep

Alright team... Dataprep running into BigQuery. I cannot for the life of me figure out why I have been hitting the ZONE_RESOURCE_POOL_EXHAUSTED issue for the past 5 hours. The night before everything was going great, but today I am having some serious issues…
2 votes • 2 answers

Dataprep doesn't work - Cloud Dataflow Service Agent

I made a mistake and deleted the user service-[project number]@dataflow-service-producer-prod.iam.gserviceaccount.com in Service accounts; I should have deleted a different user. After that, Dataprep stopped running jobs. I've checked all the guidelines…