Questions tagged [data-lake]

161 questions
0 votes, 1 answer

How to create a data lake from Kafka to HDFS with Spark, storing data in custom directories?

I have an RDD transformed into a DataFrame with the following structure: `| key | …`
0 votes, 1 answer

Azure lake-to-lake transfer of files

My company has two Azure environments. The first was a temporary environment and is being repurposed or decommissioned; I'm not sure which. All I know is that I need to move files from a Data Lake in one environment to a Data Lake in the other. I've looked…
Beth
0 votes, 0 answers

Advice on a data lake / data warehouse setup with BigQuery

First of all, apologies in advance for the long story and for sometimes using the wrong terminology. Hopefully someone can advise us on how best to implement BigQuery in our organization. Current setup: at the moment, we have a data warehouse in…
0 votes, 1 answer

Accessing Azure Data Lake folders from Windows Explorer

Is it possible to access Azure Data Lake folders from Windows Explorer through SMB or a file share, as we can with Azure File Storage?
Geekn
0 votes, 2 answers

How to create a data lake using Apache Kafka, AWS Glue and Amazon S3?

I want to store all the data from a Kafka topic in Amazon S3. I have a Kafka cluster that receives 200,000 messages per second on one topic, and each message value has 50 fields (strings, timestamps, integers, and floats). My main idea is to use…
0 votes, 1 answer

How to execute a U-SQL job with code-behind from the .NET SDK

I have a U-SQL job that uses custom extractors in code-behind, and I need to run it on demand from C# code. I found a way to submit a job by passing the script as a string. Is there a way to execute the script with a custom extractor?
Oksana Serdiuk
0 votes, 1 answer

Getting extra files without any extension on Azure Data Lake Store

I am using Azure Data Lake Store for file storage. I perform operations such as creating a main file, creating part files, and appending these part files to the main file (for concurrent append). Example: there is a main log file (eventually will contain…
UmairAhmad
0 votes, 1 answer

What is the Access IP parameter in the AWS Data Lake Solution CloudFormation template?

I'm new to AWS and I'm trying to deploy the model data lake solution on AWS by following this guide: https://docs.aws.amazon.com/solutions/latest/data-lake-solution/deployment.html To deploy the CloudFormation template, it asks for an Access IP…
0 votes, 1 answer

U-SQL: comparing rowset data in a scalar expression

I have read some articles saying that this conversion is not possible; however, I have come across an issue where a value is being fetched in the rowset and needs to be used in a scalar expression. ColumnA is a string value and ColumnB is an…
0 votes, 1 answer

Slow-running U-SQL job due to SqlFilterTransformer

I have a U-SQL job that extracts data from two .tsv and two .csv files, selects some features and performs some simple transformations before outputting to CSV/TSV files in ADL. However, when I attempt to add further transformations within SELECT…
Matt Lakin
0 votes, 1 answer

Backup of Data Lake Store

I am working on a backup strategy for Data Lake Store (DLS). My plan is to create two DLS accounts and copy data between them. I have evaluated several approaches to achieve this but none of them satisfies the requirement to preserve the POSIX ACLs…
MrG
0 votes, 1 answer

Dealing with multiple readers/writers in Azure Data Lake

I am new to Azure Data Lake and am currently using Data Factory v2 to move data from my transactional database to Azure Data Lake Storage. Consider a scenario: the company has multiple data sources; Team A is responsible for Source A, Team B is responsible…
frictionlesspulley
0 votes, 0 answers

AWS Glue vs. Zaloni metadata management

What is the value-add of a solution such as Zaloni over AWS Glue in terms of metadata harvesting/management? Are use cases for Zaloni specific to Hadoop? What if the data lake were based on S3 + RDS?
Si Downes
0 votes, 2 answers

Port exhaustion on Azure Data Lake Store

I am doing performance testing of my Azure Web API that receives file attachments from the client and then uploads them to the Data Lake Store. My performance test is currently running for 6 minutes with a load of 250 users making 40 requests/sec.…
0 votes, 1 answer

Data catalog and metadata management in AWS for a data lake architecture

We are setting up a data platform loosely based on the data lake architecture. We are evaluating candidates that provide a centralized data catalog, metadata management and tagging. Glue seems very promising, but it's still not out for public…
uncaught_exceptions