Questions tagged [data-lake]

161 questions
0
votes
2 answers

Column names are incorrectly mapped

I was trying to load data from an on-prem data lake into Azure Data Lake using Azure Data Factory, with a query that pulls all the columns. My sink is Azure Data Lake Gen2, but the column names come out wrong in both source and sink. My columns…
0
votes
1 answer

Guidance needed to set up a data lake

I need some guidance on setting up a data lake. We are pulling data from a source (REST API) that returns a JSON file. Sample structure given below. { "version": "3.0", "name": "application_name", ... "request": { …
Tyash
0
votes
1 answer

SaaS app data ingestion to DL/DWH - what to include in the NFRs?

We are in the process of buying a SaaS solution for busy sales operations. We want to ensure that we are able to access our data and ingest it into our analytics data lake (some of it in real time). I am looking for advice on what requirements we should…
Dovile K.
0
votes
1 answer

Greenplum Database

What are the major differences between the Greenplum (GPDB) Community and Enterprise editions? I want more details about the features that are available in the Enterprise edition compared with the Community edition of Greenplum Database.
0
votes
2 answers

Set up a data pipeline flow in AWS

Problem statement: We have a Postgres RDS instance (managed by AWS), and there is a requirement to set up a data lake (in S3) for all the data in RDS. The data should be pushed to S3 on a near-real-time basis. The solution should also take…
0
votes
1 answer

How can Apache Hudi merge deltas asynchronously?

I'm new to Apache Hudi. In Apache Hudi, the merge-on-read table type merges delta data asynchronously. It is merged when data is queried or when the merge config (interval or unmerged commit count) is met. But Hudi has no background process of its own, otherwise…
SHRIN
0
votes
0 answers

Using multiple integration tools on HDFS

I am working on a small project. The aim of the project is to use ingestion frameworks to ingest data into a data lake. - I will be ingesting data in batches. - The data formats will be RDBMS, CSV files and flat files. I've done my research on…
0
votes
1 answer

Secure File Transfer to Google Cloud Storage

I'm trying to design an architecture for a data lake. I have already generated my CSV, TXT, and Avro files, which are on an on-premises machine, and I want to upload them to Google Cloud Storage, but I see that I have to go through the public internet and I…
0
votes
1 answer

Delta Lake: do we no longer need time partitions for fully reprocessed tables?

Objective: Suppose you're building a data lake and star schema with the help of ETL. The storage format is Delta Lake. One of the ETL responsibilities is to build Slowly Changing Dimension (SCD) tables (cumulative state). This means that every day for every…
VB_
0
votes
2 answers

How to deal with historicization of data in a data lake vs. a data warehouse?

It is possible (and even a core functionality) to keep data historicized within a classic data warehouse. Data is added to the data warehouse over time, and it is possible to move through time over the data. If I just want to use the data lake and to…
STORM
0
votes
1 answer

Get ADLS directory and sub-directory paths, down to the file format, in a table using Databricks

I have an ADLS with several folders, which in turn have sub-folders and so on, down to the point where we have either CSV or Parquet data. How can I get the folder names and sub-folders, along with the file format, in Databricks? Also there are some…
user14058264
0
votes
0 answers

Unable to see tables in the AWS data lake/Glue UI

Image showing tables created (crawler snapshot). I am unable to see tables under the Databases tab in the AWS data lake/Glue UI even though the crawler log states that 2 tables have been created. 2020-09-05T15:16:45.020+05:30 …
0
votes
1 answer

How do you delete a file from an Azure Data Lake using the Python SDK?

I'm using the azure-storage-file-datalake package for Python 3.8. The SDK is described in great depth here…
Dave
0
votes
1 answer

AWS Glue: sync data from RDS (need to sync 4 tables from all schemas) to S3 (Apache Parquet format)

We are using a Postgres RDS instance (db.t3.2xlarge with around 2TB of data). We have a multi-tenant application, so for each organization that signs up for our product we create a separate schema that replicates our data model. Now a couple of…
0
votes
0 answers

For Spring-based microservices, how can we push data to and pull data from a data lake? How can you interact with data in a data lake using microservices?

I want to create a Spring-based microservice architecture where I can receive reviews about a product and store them in a data lake. Whenever I need them, I want to retrieve them from that data lake. So what additional programming…