Questions tagged [data-pipeline]

168 questions
0
votes
2 answers

Jupyter notebook as a Kedro node

How can I use a Jupyter Notebook as a node in a Kedro pipeline? This is different from converting functions from Jupyter Notebooks into Kedro nodes. What I want to do is use the full notebook as the node.
MCK
  • 11
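One possible approach, sketched below, is to wrap the whole notebook in a regular Kedro node whose function executes it with papermill. The notebook paths, parameter set, and dataset names here are hypothetical placeholders, not anything from the question.

```python
# Sketch: run an entire notebook as one Kedro node by executing it with papermill.
# "notebooks/analysis.ipynb" and the dataset names are hypothetical placeholders.
import papermill as pm
from kedro.pipeline import Pipeline, node


def run_notebook(params: dict) -> str:
    """Execute the full notebook as a single pipeline step."""
    output_path = "notebooks/analysis_output.ipynb"
    pm.execute_notebook(
        "notebooks/analysis.ipynb",   # input notebook
        output_path,                  # executed copy with cell outputs
        parameters=params,            # injected into a "parameters"-tagged cell
    )
    return output_path


notebook_pipeline = Pipeline(
    [node(run_notebook, inputs="params:notebook_params", outputs="executed_notebook")]
)
```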
0
votes
1 answer

Streamsets Data Collector: Replace a Field With Its Child Value

I have a data structure like this { "id": 926267, "updated_sequence": 2304899, "published_at": { "unix": 1589574240, "text": "2020-05-15 21:24:00 +0100", "iso_8601": "2020-05-15T20:24:00Z" }, "updated_at": { "unix":…
asrulsibaoel
  • 500
  • 1
  • 7
  • 14
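For clarity, here is the intended transformation expressed in plain Python rather than StreamSets configuration (in Data Collector this would typically be done with an Expression Evaluator or Field Replacer); the record and field names come from the question's sample, and the choice of `iso_8601` as the child to promote is an assumption.

```python
# Sketch of replacing a map field with one of its child values.
record = {
    "id": 926267,
    "updated_sequence": 2304899,
    "published_at": {
        "unix": 1589574240,
        "text": "2020-05-15 21:24:00 +0100",
        "iso_8601": "2020-05-15T20:24:00Z",
    },
}

# Replace each struct-valued field with its child value, e.g. the ISO 8601 string.
for field in ("published_at", "updated_at"):
    child = record.get(field)
    if isinstance(child, dict) and "iso_8601" in child:
        record[field] = child["iso_8601"]

print(record)  # published_at is now the plain string "2020-05-15T20:24:00Z"
```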
0
votes
1 answer

Logical decoding - postgres - multiple output formats

I have been trying to build a pipeline using logical decoding in Postgres. However, I am a little confused. Please find below the questions I have. I have established a pub/sub and I can see the data flowing between the two servers. However, I haven't…
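A minimal sketch of inspecting logical-decoding output with psycopg2, assuming a replication slot named "demo_slot" already exists (for example created with the `test_decoding` output plugin); the connection string is a placeholder.

```python
# Sketch: read pending logical-decoding changes from an existing slot.
import psycopg2

conn = psycopg2.connect("dbname=mydb user=replicator host=localhost")
conn.autocommit = True

with conn.cursor() as cur:
    # Peek at pending changes without consuming them; use pg_logical_slot_get_changes to consume.
    cur.execute(
        "SELECT lsn, xid, data FROM pg_logical_slot_peek_changes('demo_slot', NULL, NULL);"
    )
    for lsn, xid, data in cur.fetchall():
        print(lsn, xid, data)
```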
0
votes
1 answer

Access denied error while executing TensorFlow example - https://www.tensorflow.org/tutorials/load_data/images

The link shows an example of a data pipeline for images. It works fine when I run it directly on Colab, but when I use it on my laptop it gives this error. I've been using Keras for quite a while but this is the first time trying data pipelining and I…
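One possibility, assuming the "access denied" comes from downloading or extracting the tutorial's flowers dataset into a directory the local user cannot write to, is to redirect the Keras cache to a known-writable folder; the cache path below is a hypothetical example.

```python
# Sketch: point the dataset download at a writable cache directory.
import pathlib
import tensorflow as tf

data_dir = tf.keras.utils.get_file(
    "flower_photos",
    origin="https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz",
    untar=True,
    cache_dir="C:/temp/keras_cache",  # hypothetical writable path on the laptop
)
data_dir = pathlib.Path(data_dir)

train_ds = tf.keras.utils.image_dataset_from_directory(
    data_dir, image_size=(180, 180), batch_size=32
)
```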
0
votes
0 answers

How can I ingest data from a Microsoft SQL Server into Google Cloud Platform?

I've been reading the GCP documentation trying to find a way to ingest data from a Microsoft SQL Server database passively (like using Cloud SQL). The problem is that Cloud SQL sits idle most of the time (data is updated once a week) and I could…
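Since the data only changes weekly, a simple scheduled pull may be cheaper than a permanently provisioned instance. A sketch of that pull, assuming pyodbc access to the SQL Server and BigQuery as the destination; the connection string, table, and dataset names are placeholders.

```python
# Sketch: pull from SQL Server and load into BigQuery on a schedule.
import pandas as pd
import pyodbc
from google.cloud import bigquery

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=my-sql-host;"
    "DATABASE=sales;UID=reader;PWD=secret"
)
df = pd.read_sql("SELECT * FROM dbo.weekly_snapshot", conn)

client = bigquery.Client()
job = client.load_table_from_dataframe(df, "my_project.raw.weekly_snapshot")
job.result()  # waits for the load job to finish
```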
0
votes
1 answer

Cloud Composer/Airflow Task Runner Storage

I'm used to running pipelines via AWS Data Pipeline but am getting familiar with Airflow (Cloud Composer). In Data Pipeline we would: spawn a task runner, bootstrap it, do the work, kill the task runner. I just realized that my Airflow runners are not…
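Because Composer workers are long-lived and shared between tasks rather than spawned and killed per job, one common pattern is to keep working data in GCS instead of on the worker's local disk. A minimal sketch, assuming Airflow 2-style imports; the bucket, DAG, and task names are placeholders.

```python
# Sketch: an Airflow task that writes its output to GCS rather than the worker's disk.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from google.cloud import storage


def do_work(**context):
    client = storage.Client()
    bucket = client.bucket("my-composer-scratch-bucket")
    blob = bucket.blob(f"runs/{context['ds']}/result.txt")
    blob.upload_from_string("work output")  # instead of writing to local disk


with DAG(
    "scratch_to_gcs",
    start_date=datetime(2020, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    work = PythonOperator(task_id="do_work", python_callable=do_work)
```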
0
votes
1 answer

How to dynamically add an HTTP endpoint to load data into Azure Data Lake using Azure Data Factory when the REST API is cookie-authenticated

I am trying to dynamically add/update a REST linked service based on certain triggers/events in order to consume a REST API that is authenticated using a cookie and provides telemetry data. This telemetry data will be stored in Data Lake Gen2 and then will use…
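For orientation, the same flow can be sketched outside ADF: authenticate once to obtain the session cookie, pull the telemetry, and land the raw payload in Data Lake Gen2. The endpoints, credentials, and file-system names below are placeholders, not anything from the question.

```python
# Sketch: cookie-authenticated REST pull landed in ADLS Gen2.
import requests
from azure.storage.filedatalake import DataLakeServiceClient

session = requests.Session()
session.post("https://telemetry.example.com/login", data={"user": "svc", "password": "secret"})
payload = session.get("https://telemetry.example.com/api/telemetry").text  # cookie sent automatically

service = DataLakeServiceClient(
    account_url="https://mydatalake.dfs.core.windows.net", credential="ACCOUNT_KEY"
)
file_client = service.get_file_system_client("raw").get_file_client("telemetry/latest.json")
file_client.upload_data(payload, overwrite=True)
```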
0
votes
1 answer

Schema not merged properly with an AWS Glue crawler

I am currently building a data lake where I run AWS Glue jobs daily to copy data from our database and make it queryable via AWS Athena. Because the schema of the data I fetch changes often, I crawl it regularly with a Glue Crawler. Unfortunately,…
Robin Nicole
  • 646
  • 4
  • 17
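If the crawler is splitting schema variants into separate tables, one setting worth trying is the crawler's grouping policy, which asks Glue to combine compatible schemas into a single table. A sketch via boto3; the crawler name and the specific schema-change policy values are assumptions.

```python
# Sketch: ask the Glue crawler to merge compatible schemas into one table.
import json

import boto3

glue = boto3.client("glue")
glue.update_crawler(
    Name="datalake-crawler",  # placeholder crawler name
    Configuration=json.dumps(
        {"Version": 1.0, "Grouping": {"TableGroupingPolicy": "CombineCompatibleSchemas"}}
    ),
    SchemaChangePolicy={"UpdateBehavior": "UPDATE_IN_DATABASE", "DeleteBehavior": "LOG"},
)
```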
0
votes
1 answer

Can I use Prometheus to list the files being processed or already processed?

I need to know the time per service of an application which is processing some files. What I mean is: the same file passes through each service and I need to know the time for each pipeline stage. Is that possible with Prometheus and, for example, Grafana? Or there…
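Per-stage timing is usually captured by instrumenting each service with a labelled metric that Prometheus scrapes and Grafana graphs. A minimal sketch with the official Python client; the metric, label, and stage names are placeholders.

```python
# Sketch: record per-stage processing time with a labelled Histogram.
from prometheus_client import Histogram, start_http_server

PROCESSING_TIME = Histogram(
    "file_processing_seconds", "Time spent processing a file", ["stage"]
)


def process(stage: str, path: str) -> None:
    with PROCESSING_TIME.labels(stage=stage).time():
        ...  # the actual work for this service/stage


if __name__ == "__main__":
    start_http_server(8000)  # exposes /metrics for Prometheus to scrape
    process("parse", "input/file-001.csv")
```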
0
votes
2 answers

Cannot get AWS Data Pipeline connected to Redshift

I have a query I'd like to run regularly in Redshift. I've set up an AWS Data Pipeline for it. My problem is that I cannot figure out how to access Redshift. I keep getting "Unable to establish connection" errors. I have an Ec2Resource and I've…
ScottieB
  • 3,958
  • 6
  • 42
  • 60
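When "Unable to establish connection" appears, it helps to separate a networking problem (security group / VPC reachability from the Ec2Resource) from a pipeline-definition problem. A quick connectivity check one could run from that EC2 instance; the endpoint, database, and credentials are placeholders.

```python
# Sketch: verify the Redshift cluster is reachable from the Ec2Resource.
import psycopg2

try:
    conn = psycopg2.connect(
        host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
        port=5439,
        dbname="analytics",
        user="pipeline_user",
        password="secret",
        connect_timeout=10,
    )
    print("connected:", conn.get_dsn_parameters()["host"])
except psycopg2.OperationalError as exc:
    # Often means the cluster's security group does not allow inbound traffic from the instance.
    print("connection failed:", exc)
```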
0
votes
1 answer

Splitting a file into small chunks and processing

I have three files and each contains close to 300k records. I have written a Python script to process those files with some business logic and am able to create the output file successfully. This process completes in 5 minutes. I am using the same script to…
Hari
  • 51
  • 1
  • 5
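One way to split each ~300k-line file into chunks and process them in parallel, sketched below; the chunk size and the stand-in business logic are placeholders.

```python
# Sketch: chunked file processing with a worker pool.
from itertools import islice
from multiprocessing import Pool


def process_chunk(lines: list) -> list:
    return [line.upper() for line in lines]  # stand-in for the real business logic


def read_chunks(path: str, size: int = 10_000):
    with open(path) as fh:
        while True:
            chunk = list(islice(fh, size))
            if not chunk:
                break
            yield chunk


if __name__ == "__main__":
    with Pool(processes=4) as pool, open("output.txt", "w") as out:
        for result in pool.imap(process_chunk, read_chunks("input_file_1.txt")):
            out.writelines(result)
```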
0
votes
1 answer

How do I use tf.Dataset to load data into multiple GPUs?

Currently, I'm passing the data to multiple GPUs using get_next(). Is there a better way to feed data into multiple GPUs?
Illuminati0x5B
  • 602
  • 7
  • 24
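A common alternative to manual get_next() calls per device is tf.distribute.MirroredStrategy, which shards a tf.data.Dataset across the visible GPUs for you. A minimal sketch; the model and data are toy placeholders.

```python
# Sketch: let MirroredStrategy distribute a tf.data.Dataset across GPUs.
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()

dataset = tf.data.Dataset.from_tensor_slices(
    (tf.random.normal([1024, 32]), tf.random.uniform([1024], maxval=10, dtype=tf.int32))
).batch(64)

with strategy.scope():
    model = tf.keras.Sequential(
        [tf.keras.layers.Dense(64, activation="relu"), tf.keras.layers.Dense(10)]
    )
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    )

model.fit(dataset, epochs=2)  # Keras shards the dataset across the GPUs
```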
0
votes
1 answer

Data pipeline - dumping large files from API responses into AWS with the final destination being an on-premises SQL Server

I'm new to building data pipelines where dumping files in the cloud is one or more steps in the data flow. Our goal is to store large, raw sets of data from various APIs in the cloud then only pull what we need (summaries of this raw data) and store…
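A sketch of the "dump raw to the cloud, pull back only summaries" idea: land the raw API response in S3 untouched, then read it back later, summarise, and push only the summary to the on-premises database. The bucket, API URL, and keys are placeholders.

```python
# Sketch: raw API dump to S3, summary pulled back for the on-prem SQL Server.
import json

import boto3
import requests

s3 = boto3.client("s3")

# 1) Land the raw API response in S3 untouched.
raw = requests.get("https://api.example.com/v1/events").json()
s3.put_object(Bucket="raw-landing-bucket", Key="events/2020-06-01.json", Body=json.dumps(raw))

# 2) Later, read the raw file back, summarise, and push only the summary on-prem.
body = s3.get_object(Bucket="raw-landing-bucket", Key="events/2020-06-01.json")["Body"].read()
events = json.loads(body)
summary = {"day": "2020-06-01", "event_count": len(events)}
print(summary)  # insert into the on-prem SQL Server table instead of printing
```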
0
votes
1 answer

Configure a Data Pipeline to receive parameter values from a Lambda

I have a Lambda function that activates a Data Pipeline: client.activate_pipeline( pipelineId='df-0680373LNPNFF73UDDD', parameterValues=[{'id':'myVariable','stringValue':'ok'}]) How do I configure the data pipeline to receive the…
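On the pipeline side, the definition generally has to declare a parameter with the same id, and pipeline objects then reference it as `#{myVariable}`. A sketch with put_pipeline_definition; the activity object is a deliberately minimal placeholder (a real definition would also need a Default object, runsOn reference, and so on).

```python
# Sketch: declare the parameter the Lambda passes in and reference it in an object.
import boto3

dp = boto3.client("datapipeline")
dp.put_pipeline_definition(
    pipelineId="df-0680373LNPNFF73UDDD",
    pipelineObjects=[
        {
            "id": "ShellActivity",
            "name": "ShellActivity",
            "fields": [
                {"key": "type", "stringValue": "ShellCommandActivity"},
                # The value supplied by activate_pipeline is substituted here.
                {"key": "command", "stringValue": "echo #{myVariable}"},
            ],
        },
    ],
    parameterObjects=[
        {
            "id": "myVariable",
            "attributes": [{"key": "type", "stringValue": "String"}],
        }
    ],
)
```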
0
votes
1 answer

Make a generic/parameterized trigger in Azure Data Factory

I want to load data from on-premises servers to Azure Blob Storage. I have data on three on-premises servers. The problem is that the data copy should run at a different time for each source. Please suggest a way to do that.
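One pattern is a single parameterised copy pipeline with one schedule trigger per source, each trigger firing at its own time and passing its own source name. The sketch below shows only the parameterised invocation via the Azure SDK (standing in for what each trigger would pass); the subscription, resource group, factory, pipeline, and parameter names are all placeholders.

```python
# Sketch: invoke the same parameterised pipeline once per on-prem source.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

client = DataFactoryManagementClient(DefaultAzureCredential(), "my-subscription-id")

for source in ("onprem-server-1", "onprem-server-2", "onprem-server-3"):
    run = client.pipelines.create_run(
        resource_group_name="my-rg",
        factory_name="my-adf",
        pipeline_name="CopyToBlob",
        parameters={"sourceServer": source},  # what each schedule trigger would pass
    )
    print(source, run.run_id)
```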