Questions tagged [data-lake]
161 questions
0 votes, 1 answer
How to create a data lake from Kafka to HDFS with Spark, storing in custom directories?
I have an RDD transformed into a DataFrame with the following structure:
+-------------+--------------------+
| key| …

El Shotodore
0 votes, 1 answer
Azure lake-to-lake transfer of files
My company has two Azure environments. The first one was a temporary environment and is being re-purposed / decommissioned / I'm not sure. All I know is that I need to get files from a Data Lake in one environment to a Data Lake in the other. I've looked…

Beth
0 votes, 0 answers
Advice on data lake / data warehouse BigQuery setup
First of all, apologies in advance for the long story and for sometimes using the wrong terminology.
Hopefully someone can advise us on how to optimally implement BigQuery in our organization.
Current setup
At the moment, we have a data warehouse in…

Oebie
0 votes, 1 answer
Accessing Azure Data Lake folders from Windows Explorer
Is it possible to access Azure Data Lake folders from Windows Explorer through SMB or a file share, as we can with Azure File Storage?

Geekn
0 votes, 2 answers
How to create a data lake using Apache Kafka, AWS Glue and Amazon S3?
I want to store all the data from a Kafka topic in Amazon S3. I have a Kafka cluster whose topic receives 200,000 messages per second, and each message value has 50 fields (strings, timestamps, integers, and floats).
My main idea is to use…

Eric Bellet
0 votes, 1 answer
How to execute a U-SQL job with code-behind from the .NET SDK
I have a U-SQL job that uses custom extractors in code-behind, and I need to run it on demand from C# code.
I found a way to submit a job by passing the script as a string. Can I somehow execute the script with a custom extractor?

Oksana Serdiuk
0 votes, 1 answer
Getting some extra files without any extension on Azure Data Lake Store
I am using Azure Data Lake Store for file storage. I am using operations like:
Creating a main file
Creating part files
Appending these part files to the main file (for concurrent append)
Example:
There is a main log file (eventually will contain…

UmairAhmad
0 votes, 1 answer
What is the Access IP parameter in the AWS Data Lake Solution CloudFormation template?
I'm new to AWS and I'm trying to deploy the model data lake solution on AWS by following this guide: https://docs.aws.amazon.com/solutions/latest/data-lake-solution/deployment.html
To deploy the CloudFormation template, it asks for an Access IP…

Dileepa Jayakody
0 votes, 1 answer
U-SQL: compare rowset data in a scalar expression
I have read some articles saying that this conversion is not possible; however, I have run into an issue where a value is fetched in a rowset and needs to be used in a scalar expression.
ColumnA is a string value and ColumnB is an…

Amir Parkar
0 votes, 1 answer
Slow-running U-SQL job due to SqlFilterTransformer
I have a U-SQL job that extracts data from two .tsv and two .csv files, selects some features and performs some simple transformations before outputting to CSV/TSV files in ADL.
However, when I attempt to add further transformations within SELECT…

Matt Lakin
0 votes, 1 answer
Backup of Data Lake Store
I am working on a backup strategy for Data Lake Store (DLS). My plan is to create two DLS accounts and copy data between them. I have evaluated several approaches to achieve this but none of them satisfies the requirement to preserve the POSIX ACLs…

MrG
0 votes, 1 answer
Dealing with multiple readers/writers in Azure Data Lake
I am new to Azure Data Lake and am currently using Data Factory v2 to move data from my transactional database to Azure Data Lake Storage.
Consider a scenario:
A company has multiple data sources
Team A is responsible for Source A
Team B is responsible…

frictionlesspulley
0 votes, 0 answers
AWS Glue vs Zaloni metadata management
What is the value add of a solution such as Zaloni over AWS Glue in terms of metadata harvesting/management?
Are use cases for Zaloni specific to Hadoop? What if the data lake were based on S3 + RDS?

Si Downes
0 votes, 2 answers
Port exhaustion on Azure Data Lake Store
I am doing performance testing of my Azure Web API that receives file attachments from the client and then uploads them to the Data Lake Store. My performance test is currently running for 6 minutes with a load of 250 users making 40 requests/sec.…

Sarmad
0 votes, 1 answer
Data catalog and metadata management in AWS for a data lake architecture
We are setting up a data platform loosely based on the data lake architecture. We are evaluating candidates that provide a centralized data catalog, metadata management and tagging. Glue seems very promising, but it's still not out for public…

uncaught_exceptions