Questions tagged [data-lake]
161 questions
1
vote
0 answers
Which file format is suitable for unstructured data?
I am creating a data repository, something like a data lake, for a NoSQL database. Some fields do not have a fixed schema: their values are mixed-type objects such as {a:2} or {b:2,c:4, a: {1,2}}.
I can use CSV format so I can save…

Manish Trivedi
- 3,481
- 5
- 23
- 29
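
For the file-format question above, one option worth considering is JSON Lines (one JSON object per line), since each record can carry a differently shaped field. The sketch below is only an illustration under stated assumptions: the record layout, the `attrs` field name, and the helper names are hypothetical, and the `{1,2}` value from the excerpt is represented as a list because it is not valid JSON as written.

```python
import io
import json

# Hypothetical records: the "attrs" field has a different shape in each one.
records = [
    {"id": 1, "attrs": {"a": 2}},
    {"id": 2, "attrs": {"b": 2, "c": 4, "a": [1, 2]}},
]


def write_jsonl(rows, stream):
    """Write one JSON document per line; no shared schema is required."""
    for row in rows:
        stream.write(json.dumps(row) + "\n")


def read_jsonl(stream):
    """Parse every non-empty line back into a Python object."""
    return [json.loads(line) for line in stream if line.strip()]


# An in-memory buffer stands in for a file in the lake.
buffer = io.StringIO()
write_jsonl(records, buffer)
buffer.seek(0)
restored = read_jsonl(buffer)
```

Unlike CSV, nested and mixed-type values survive the round trip without custom escaping, at the cost of repeating key names in every record.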
1
vote
1 answer
Exceptions from Data Lake immutability rule
Data Lake should be immutable:
It is important that all data put in the lake should have a clear
provenance in place and time. Every data item should have a clear
trace to what system it came from and when the data was produced. The
data lake…

VB_
- 45,112
- 42
- 145
- 293
1
vote
0 answers
PowerShell Set-AzDataLakeStoreItemAclEntry Error occurred while sending the request
Trying to execute the following command in PowerShell ISE:
Set-AzDataLakeStoreItemAclEntry -Account "********" -Path "/raw2" -AceType Group -Id "******************" -Permissions All
I only have the Az module installed, not AzureRM.
But I get the following…

dstiles74
- 11
- 3
1
vote
0 answers
AWS Lake Formation MySQL blueprint error with incompatible sql_mode only_full_group_by
I am using an AWS Lake Formation blueprint to import a MySQL DB to S3. I used the stock blueprint to import the data, but the job failed with the error and stack trace below.
Expression #4 of SELECT list is not in GROUP BY clause and contains nonaggregated…

Jimson James
- 2,937
- 6
- 43
- 78
1
vote
2 answers
Splunk migration to S3 DataLake
We're considering moving away from Splunk as our datastore to an AWS data lake backed by S3.
What would be the process of migrating data from Splunk to S3? I've read lots of documents talking about archiving data from Splunk to S3 but not…

Garreth
- 1,057
- 2
- 9
- 24
1
vote
3 answers
ETL from AWS DataLake to RDS
I'm relatively new to data lakes and I'm doing some research for a project on AWS.
I have created a data lake with tables generated by Glue crawlers; I can see the data in S3 and query it using Athena. So far so good.
There is a…

Garreth
- 1,057
- 2
- 9
- 24
1
vote
1 answer
I need to get file last modified dates of Data Lake files in SSIS
I have an SSIS task that reads JSON files from Azure Data Lake, deserializes them in a Script Task, and copies them into tables in a local SQL Server.
This works well, but it is very slow. It takes me 6 hours to import one…

EnisAkin
- 43
- 5
1
vote
1 answer
What is the difference between a data lake with HDFS or S3 in AWS?
I need to build a data lake on AWS, but I don't know exactly how S3 differs from HDFS. I found some answers on the Internet, but I still don't understand the real difference.
I also need to know if someone has the data lake architecture of HDFS…

Aziza Sbai El Idrissi
- 81
- 1
- 8
1
vote
2 answers
Can you use HDFS as your principal storage?
Is it reliable to save your data in Hadoop and consume it using Spark, Hive, etc.?
What are the advantages of using HDFS as your main storage?

marz
- 831
- 1
- 7
- 12
1
vote
1 answer
Unable to parse list of Json blocks in U-SQL
I have a file with a list of JSON blocks and am stuck on reading and processing them in U-SQL and writing them to a text file.
{
"id": "0001",
"type": "donut",
"name": "Cake",
"ppu": 0.55,
"batters":
{
"batter":
…

Creator
- 31
- 2
1
vote
2 answers
Can a Data Warehouse include a Data Lake?
I want to understand data warehouses and data lakes in more detail.
It seems to me that there is conflicting information on the topic. Inmon defines a data warehouse as
a subject-oriented, integrated, time-variant and non-volatile collection of data in…

A.Dumas
- 2,619
- 3
- 28
- 51
1
vote
1 answer
What is a Data Warehouse and can it be applied to complex data?
I want to define the term data warehouse with the necessary literature references.
I found on Wikipedia that
DWs are central repositories of integrated data from one or more disparate sources. They store current and historical data in one
single place…

A.Dumas
- 2,619
- 3
- 28
- 51
1
vote
1 answer
Multiple Tableau users connected to Hive LLAP
I’m hoping to allow interactive queries for many Tableau users with data accessed via Hive LLAP. So far the results have been disappointing. Should I expect this setup to work for me, or should I use a different backend?

1ijk
- 1,417
- 2
- 19
- 31
1
vote
1 answer
How to use JSON file formats in the context of Azure Data Lake Analytics and U-SQL
I have a JSON input that looks like
{
"sessionId": 1234,
"deviceId": "MAC:1234",
"IoTHub": {
"MessageId": "1234-1234-1234-1234"
}
}
How can I extract the values of sessionId, deviceId and MessageId in an Azure Datalake…

quervernetzt
- 10,311
- 6
- 32
- 51
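
To make the target of the JSON extraction question above concrete, here is a minimal sketch of pulling the three values out of that input. It is written in Python rather than U-SQL, so it shows only the shape of the extraction (two top-level keys plus one nested key), not how Azure Data Lake Analytics does it; the `extract_fields` helper name is hypothetical.

```python
import json

# The sample document from the question, reproduced verbatim.
raw = """
{
  "sessionId": 1234,
  "deviceId": "MAC:1234",
  "IoTHub": {
    "MessageId": "1234-1234-1234-1234"
  }
}
"""


def extract_fields(text):
    """Return sessionId, deviceId and the nested MessageId as a flat dict."""
    doc = json.loads(text)
    return {
        "sessionId": doc["sessionId"],
        "deviceId": doc["deviceId"],
        # MessageId lives one level down, under the "IoTHub" object.
        "MessageId": doc["IoTHub"]["MessageId"],
    }


fields = extract_fields(raw)
```

Flattening the nested key into a single row like this is the step any extractor for this input has to perform, whatever the engine.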
1
vote
1 answer
Azure Data Lake: How to get Processed files
I've just started working with Data Lake and I'm currently trying to figure out the real workflow steps and how to automate the whole process.
Say I have some files as an input and I would like to process them and download output files in order…

Vladimir Semashkin
- 1,270
- 1
- 10
- 21