Questions tagged [data-pipeline]

168 questions
1
vote
1 answer

Snowflake Stitch connection throws 403 error

I’m trying to connect Snowflake with Stitch so I can load Google Sheets data into Snowflake. I’ve followed the documentation closely, but the connection fails with a 403 error. Please help me resolve this issue.
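A 403 from the Stitch–Snowflake connection usually points to the Stitch user or role lacking the required grants, or to a wrong account identifier or network policy, rather than to the Google Sheets side. A minimal sketch, assuming hypothetical names (STITCH_USER, STITCH_ROLE, the account identifier), that checks whether the same credentials work outside Stitch with the official Python connector:

```python
# Hypothetical credentials/identifiers; verify the Stitch user can reach
# Snowflake at all before debugging the Stitch side.
# Requires `pip install snowflake-connector-python`.
import snowflake.connector

conn = snowflake.connector.connect(
    account="xy12345.us-east-1",   # assumption: your account identifier/region
    user="STITCH_USER",            # assumption: the user configured in Stitch
    password="********",
    role="STITCH_ROLE",
    warehouse="STITCH_WH",
    database="STITCH_DB",
)
cur = conn.cursor()
cur.execute("SELECT CURRENT_ROLE(), CURRENT_WAREHOUSE(), CURRENT_DATABASE()")
print(cur.fetchone())
conn.close()
```

If this connects but Stitch still returns 403, re-check the grants on the target database/schema and any Snowflake network policy that might block Stitch's IP ranges.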
1
vote
0 answers

Container exits after script is run in cloud run

I have a Dockerfile that runs a Python script to process a dataset and store it in GCS. I've uploaded the image to GCR and I can run it with Cloud Run. It runs fine, except that when the script finishes executing, the container exits, see…
akilesh raj
  • 656
  • 1
  • 8
  • 19
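A managed Cloud Run service keeps a container alive only while it is handling HTTP requests, so a container whose main process is a finite script exits as soon as the script returns. A minimal sketch, assuming a hypothetical process_dataset() stand-in for the existing script, that wraps the work in a small HTTP handler so it runs per request instead of at container start:

```python
# Minimal sketch: wrap the batch script in a tiny web handler.
# `process_dataset` is a placeholder for the existing "process and upload
# to GCS" logic. Requires `pip install flask`.
import os
from flask import Flask

app = Flask(__name__)

def process_dataset():
    # placeholder for the existing processing/upload code
    pass

@app.route("/", methods=["POST"])
def run_job():
    process_dataset()
    return "done", 200

if __name__ == "__main__":
    # Cloud Run injects the port to listen on via the PORT env var.
    app.run(host="0.0.0.0", port=int(os.environ.get("PORT", 8080)))
```

Cloud Scheduler (or any HTTP caller) can then trigger the endpoint on a schedule; for pure run-to-completion jobs a plain batch environment such as Compute Engine may also be a better fit than a request-driven service.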
1
vote
2 answers

Load custom data into a TensorFlow pipeline

I am trying to adapt this code, which loads data from an official TensorFlow dataset, so that it loads my own data stored on my Google Drive: dataset, metadata = tfds.load('cycle_gan/horse2zebra', with_info=True,…
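tfds.load only knows datasets registered in the TFDS catalog, so images of your own on Google Drive need to be read with tf.data directly (or wrapped in a custom TFDS builder). A minimal sketch, assuming a hypothetical horse2zebra-style folder layout under a mounted Drive path:

```python
# Minimal sketch, assuming the images live in folders on a mounted Google
# Drive (paths are hypothetical).
# In Colab first run: from google.colab import drive; drive.mount('/content/drive')
import tensorflow as tf

IMG_SIZE = 256

def load_image(path):
    img = tf.io.read_file(path)
    img = tf.image.decode_jpeg(img, channels=3)
    img = tf.image.resize(img, [IMG_SIZE, IMG_SIZE])
    return tf.cast(img, tf.float32) / 127.5 - 1.0   # scale to [-1, 1] like the CycleGAN example

train_horses = (tf.data.Dataset
                .list_files("/content/drive/MyDrive/horse2zebra/trainA/*.jpg")
                .map(load_image, num_parallel_calls=tf.data.AUTOTUNE)
                .shuffle(1000)
                .batch(1))
```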
1
vote
1 answer

Connection error when connecting to AWS Redshift from a local computer

I tried to connect to Amazon Redshift from my local computer using psycopg2. However, I got an error message: psycopg2.OperationalError: could not connect to server: Operation timed out. Is the server running on host xxx and accepting TCP/IP connections…
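An "Operation timed out" from psycopg2 almost always means the cluster is not reachable from the local machine (security group inbound rule on port 5439, VPC routing, or the "publicly accessible" setting) rather than bad credentials. A minimal sketch of the connection call with a placeholder endpoint and credentials:

```python
# Minimal sketch of the psycopg2 connection; endpoint, user, and database
# are placeholders. Requires `pip install psycopg2-binary`.
import psycopg2

conn = psycopg2.connect(
    host="my-cluster.xxxxxx.us-east-1.redshift.amazonaws.com",  # placeholder endpoint
    port=5439,              # Redshift default port
    dbname="dev",
    user="awsuser",
    password="********",
    connect_timeout=10,     # fail fast while debugging connectivity
)
print(conn.get_dsn_parameters())
conn.close()
```

If this still times out, check that the cluster's security group allows inbound traffic on port 5439 from your IP and that the cluster is marked publicly accessible (or that you are connecting through a VPN/bastion into its VPC).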
1
vote
3 answers

NameError: name 'datetime' is not defined [while running 'ChangeDataType DistrictAllocationAndListStore-ptransform-570']

I wrote code to ingest data from a CSV file into Google BigQuery, using Apache Beam for the pipeline. This is the pipeline code: list_of_data = open_file() DistrictAllocationAndListStore_data = (p | 'CreateDictData…
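On distributed runners this NameError usually means the datetime import from the main module never reached the workers. A minimal sketch, assuming a hypothetical created_at field, showing the two usual fixes: setting save_main_session, or importing inside the function that uses it:

```python
# Minimal sketch of the two usual fixes for a NameError inside a Beam
# transform: ship the main-module imports to the workers, or import where
# the name is used. The "created_at" field is a placeholder.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions, SetupOptions

options = PipelineOptions()
options.view_as(SetupOptions).save_main_session = True  # fix 1: pickle main-session imports

def change_data_type(row):
    import datetime  # fix 2: import inside the function, so workers always have it
    row["created_at"] = datetime.datetime.strptime(row["created_at"], "%Y-%m-%d")
    return row

with beam.Pipeline(options=options) as p:
    _ = (p
         | beam.Create([{"created_at": "2020-01-01"}])
         | "ChangeDataType" >> beam.Map(change_data_type)
         | beam.Map(print))
```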
1
vote
1 answer

What is the use of the ValueProvider class in Apache Beam?

I am trying to understand the purpose of the ValueProvider class in Apache Beam. I have seen in some examples that the pipeline option values are wrapped in ValueProvider, but I couldn't find any relevant documentation explaining this class.
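ValueProvider exists so that a pipeline option can remain unknown while the pipeline graph is built and only be supplied at run time, which is what classic Dataflow templates need. A minimal sketch, assuming a hypothetical --input_path option declared with add_value_provider_argument and read with .get() inside a transform:

```python
# Minimal sketch: ValueProvider defers an option's value to run time.
# Declare it with add_value_provider_argument and read it with .get()
# inside a transform, not at pipeline-construction time.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

class MyOptions(PipelineOptions):
    @classmethod
    def _add_argparse_args(cls, parser):
        parser.add_value_provider_argument("--input_path", type=str)

options = MyOptions()

with beam.Pipeline(options=options) as p:
    _ = (p
         | beam.Create([None])
         # run with:  python pipeline.py --input_path=gs://bucket/file.csv
         | beam.FlatMap(lambda _: [options.input_path.get()])  # .get() only at run time
         | beam.Map(print))
```

When the flag is passed at construction time it becomes a StaticValueProvider; when it is omitted (e.g. while staging a template) it becomes a RuntimeValueProvider whose .get() only works once the job actually runs.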
1
vote
2 answers

Estimate duration of DynamoDB data export via Data Pipeline

My DynamoDB table has around 100 million items (30 GB) and is provisioned with 10k RCUs. I'm using a Data Pipeline job to export the data, with the read throughput ratio set to 0.9. How do I calculate the time for the export to be…
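A back-of-envelope sketch using the numbers from the question (30 GB, 10k RCUs, ratio 0.9) and the assumption that the export scan uses eventually consistent reads, so one RCU covers up to two 4 KB reads per second:

```python
# Rough estimate of pure scan time for the Data Pipeline export.
table_size_bytes = 30 * 1024**3          # ~30 GB
provisioned_rcu = 10_000
read_ratio = 0.9                          # Data Pipeline read throughput ratio

bytes_per_rcu_per_sec = 2 * 4 * 1024      # eventually consistent: 8 KB/s per RCU
throughput = provisioned_rcu * read_ratio * bytes_per_rcu_per_sec   # ~74 MB/s

seconds = table_size_bytes / throughput
print(f"~{seconds / 60:.1f} minutes of pure scan time")             # roughly 7 minutes
```

The real job adds EMR cluster startup and S3 write time on top of the scan, so treat this as a lower bound.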
1
vote
1 answer

How would a data pipeline using S3 as raw data work?

I am currently using AWS S3 as a data lake to store raw data, with about 100 new items arriving in the designated bucket every minute. I know the very basics of data pipelines and ETL, but I am still unfamiliar with the fundamentals, such…
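A common shape for this kind of pipeline is: raw objects land in the S3 bucket, a trigger (S3 event notification or a scheduler) starts a transform step, and the cleaned output is written to a curated prefix or a warehouse. A minimal sketch of one such batch hop, with placeholder bucket names, prefixes, and transform logic:

```python
# Minimal sketch of one batch "hop": list the raw objects for a partition,
# transform them, and write the result to a curated prefix. All names are
# placeholders; in practice this step is triggered by S3 events or a scheduler.
import json
import boto3

s3 = boto3.client("s3")
RAW_BUCKET, CURATED_BUCKET = "my-raw-lake", "my-curated-lake"   # placeholders

def transform(record: dict) -> dict:
    # placeholder transformation (cleaning, typing, enrichment, ...)
    return {k.lower(): v for k, v in record.items()}

def process_partition(prefix: str) -> None:
    pages = s3.get_paginator("list_objects_v2").paginate(Bucket=RAW_BUCKET, Prefix=prefix)
    for page in pages:
        for obj in page.get("Contents", []):
            body = s3.get_object(Bucket=RAW_BUCKET, Key=obj["Key"])["Body"].read()
            cleaned = [transform(r) for r in json.loads(body)]
            s3.put_object(Bucket=CURATED_BUCKET, Key=obj["Key"],
                          Body=json.dumps(cleaned).encode("utf-8"))

process_partition("events/2020/01/01/")
```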
1
vote
1 answer

Why did the amount of data from BigQuery decrease noticeably without any change in GA/Firebase options?

I use BigQuery to get raw data from GA and Firebase. I used to get about 100,000–200,000 rows of log data from BigQuery, but since last week I have been getting only about 1,000 rows. I didn't change any options for GA,…
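A quick way to pin down when the drop started, assuming the standard Firebase/GA BigQuery export layout (the project and dataset names below are placeholders), is to count exported rows per daily table:

```python
# Diagnostic sketch: count rows per exported daily table to see exactly
# which day the volume dropped. Requires `pip install google-cloud-bigquery`.
from google.cloud import bigquery

client = bigquery.Client()
query = """
    SELECT _TABLE_SUFFIX AS day, COUNT(*) AS row_count
    FROM `my-project.analytics_123456789.events_*`   -- placeholder project/dataset
    WHERE _TABLE_SUFFIX BETWEEN '20200101' AND '20200131'
    GROUP BY day
    ORDER BY day
"""
for row in client.query(query).result():
    print(row.day, row.row_count)
```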
1
vote
1 answer

How should I keep track of total loss while training a network with a batched dataset?

I am attempting to train a discriminator network by applying gradients to its optimizer. However, when I use a tf.GradientTape to find the gradients of the loss w.r.t. the training variables, None is returned. Here is the training loop: def train_step(): …
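tape.gradient returns None when the forward pass and loss are computed outside the GradientTape context, or when the loss does not depend on the variables being differentiated. A minimal sketch, assuming a hypothetical small discriminator and synthetic batched data, that also keeps a running total of the loss across batches with tf.keras.metrics.Mean:

```python
import tensorflow as tf

# Hypothetical small discriminator and synthetic data, just to make the loop runnable.
discriminator = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(32,)),
    tf.keras.layers.Dense(1),
])
optimizer = tf.keras.optimizers.Adam(1e-4)
loss_fn = tf.keras.losses.BinaryCrossentropy(from_logits=True)
epoch_loss = tf.keras.metrics.Mean(name="train_loss")   # running average across batches

@tf.function
def train_step(x, y):
    with tf.GradientTape() as tape:
        logits = discriminator(x, training=True)   # forward pass must happen inside the tape
        loss = loss_fn(y, logits)
    grads = tape.gradient(loss, discriminator.trainable_variables)
    optimizer.apply_gradients(zip(grads, discriminator.trainable_variables))
    epoch_loss.update_state(loss)

features = tf.random.normal([256, 32])
labels = tf.cast(tf.random.uniform([256, 1], maxval=2, dtype=tf.int32), tf.float32)
dataset = tf.data.Dataset.from_tensor_slices((features, labels)).batch(32)

for x, y in dataset:
    train_step(x, y)
print("mean loss over epoch:", float(epoch_loss.result()))
```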
1
vote
1 answer

Replication pipeline from MySQL RDS to Redshift

My problem is to create a replication pipeline that replicates tables and data from MySQL RDS to Redshift, and I cannot use any managed service. Also, any new updates in RDS should be replicated to the Redshift tables as well. After looking at…
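Without a managed service, the two usual approaches are binlog-based CDC or an incremental batch extract/load through S3 and COPY. A minimal sketch of one incremental sync cycle, assuming each table has an updated_at high-watermark column (all table, column, bucket, and connection names are placeholders):

```python
# Minimal sketch of one incremental sync: pull rows changed since the last
# watermark from MySQL, stage them in S3, COPY into Redshift, and merge.
import csv, io
import boto3, pymysql, psycopg2

def sync_table(last_watermark: str) -> None:
    # 1) extract changed rows from MySQL RDS
    mysql = pymysql.connect(host="rds-endpoint", user="u", password="p", database="app")
    with mysql.cursor() as cur:
        cur.execute("SELECT id, name, updated_at FROM customers WHERE updated_at > %s",
                    (last_watermark,))
        rows = cur.fetchall()

    # 2) stage them as CSV in S3
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    boto3.client("s3").put_object(Bucket="my-staging-bucket",
                                  Key="customers/delta.csv",
                                  Body=buf.getvalue().encode("utf-8"))

    # 3) COPY into a staging table and merge into the target in Redshift
    rs = psycopg2.connect(host="redshift-endpoint", port=5439, dbname="dw",
                          user="u", password="p")
    with rs, rs.cursor() as cur:
        cur.execute("COPY staging_customers FROM 's3://my-staging-bucket/customers/delta.csv' "
                    "IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy' CSV")
        cur.execute("DELETE FROM customers USING staging_customers "
                    "WHERE customers.id = staging_customers.id")
        cur.execute("INSERT INTO customers SELECT * FROM staging_customers")
        cur.execute("TRUNCATE staging_customers")
```

For near-real-time replication the watermark poll would be replaced by reading the MySQL binlog (e.g. with the python-mysql-replication package) and applying the change events in the same stage-and-merge fashion.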
1
vote
1 answer

How to import Pascal VOC 2012 segmentation dataset to Google Colab?

I am new to building data pipelines. I want to import the Pascal VOC dataset into Google Colab. Can someone please point me to a good Google Colab/Jupyter notebook file?
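A minimal sketch for Colab, assuming the commonly used VOC 2012 mirror URL (adjust it if the host has moved): download and extract the trainval archive, then pair each segmentation mask with its image by filename:

```python
# Minimal sketch: download and extract VOC 2012, then build (image, mask)
# path pairs for the segmentation subset.
import os
import tarfile
import urllib.request

URL = "http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar"
urllib.request.urlretrieve(URL, "voc2012.tar")
with tarfile.open("voc2012.tar") as tar:
    tar.extractall(".")

voc_root = "VOCdevkit/VOC2012"
mask_dir = os.path.join(voc_root, "SegmentationClass")
image_dir = os.path.join(voc_root, "JPEGImages")

pairs = [(os.path.join(image_dir, f.replace(".png", ".jpg")),
          os.path.join(mask_dir, f))
         for f in sorted(os.listdir(mask_dir))]
print(len(pairs), "segmentation samples, e.g.", pairs[0])
```

From here a tf.data.Dataset can be built with from_tensor_slices over the path pairs plus a decoding map.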
1
vote
1 answer

How do I create image sequence samples using tf.data?

I want to create image sequence samples using the tf.data API, but as of now there seems to be no easy way to concatenate multiple images to form a single sample. I have tried the dataset.window function, which groups my images correctly.…
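The step that is usually missing after dataset.window is a flat_map that batches each window back into a single tensor, turning every window into one (seq_len, H, W, C) sample. A minimal sketch, with a placeholder file pattern and image size:

```python
# Minimal sketch: window() groups elements, flat_map(batch(SEQ_LEN)) collapses
# each window into one sequence sample. File pattern and sizes are placeholders.
import tensorflow as tf

SEQ_LEN, SHIFT = 5, 1

def load_image(path):
    img = tf.io.decode_jpeg(tf.io.read_file(path), channels=3)
    return tf.image.resize(img, [128, 128]) / 255.0

dataset = (tf.data.Dataset
           .list_files("frames/*.jpg", shuffle=False)   # frames in temporal order
           .map(load_image, num_parallel_calls=tf.data.AUTOTUNE)
           .window(SEQ_LEN, shift=SHIFT, drop_remainder=True)
           .flat_map(lambda w: w.batch(SEQ_LEN))        # each window -> one (SEQ_LEN, H, W, C) sample
           .batch(8))                                   # batch of sequences
```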
1
vote
1 answer

How can I solve InvalidArgumentError: cycle_length must be > 0 when loading a TFRecord file?

I am starting out building an efficient data pipeline for audio files using tf.TFRecord and tf.Example, but I get a tensorflow.python.framework.errors_impl.InvalidArgumentError when I try to load data from the saved TFRecord file. I have been…
levanpon
  • 33
  • 1
  • 3
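The "cycle_length must be > 0" error is typically raised when interleave's cycle_length (or TFRecordDataset's num_parallel_reads) works out to 0, e.g. from a CPU-count calculation. A minimal sketch of reading the records back with AUTOTUNE instead, assuming a hypothetical feature spec in which the waveform was stored with tf.io.serialize_tensor:

```python
# Minimal sketch of reading the saved TFRecords; use AUTOTUNE or a positive
# constant for the parallelism knobs, never 0. The feature spec is a
# placeholder for the actual audio schema.
import tensorflow as tf

feature_spec = {
    "audio": tf.io.FixedLenFeature([], tf.string),   # serialized waveform tensor (assumption)
    "label": tf.io.FixedLenFeature([], tf.int64),
}

def parse(example_proto):
    parsed = tf.io.parse_single_example(example_proto, feature_spec)
    audio = tf.io.parse_tensor(parsed["audio"], out_type=tf.float32)
    return audio, parsed["label"]

dataset = (tf.data.TFRecordDataset(
               tf.io.gfile.glob("data/*.tfrecord"),
               num_parallel_reads=tf.data.AUTOTUNE)   # never 0
           .map(parse, num_parallel_calls=tf.data.AUTOTUNE)
           .batch(32))
```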
1
vote
2 answers

Differences between Matillion and Apache Airflow

I want to use an ETL service, but I am stuck choosing between Apache Airflow and Matillion. Are they the same? What are the main differences?
eflorespalma
  • 325
  • 1
  • 2
  • 8