Questions tagged [data-lake]

161 questions
1
vote
0 answers

Which file format is suitable for unstructured data?

I am creating a data repository, more like a data lake for a NoSQL DB. Some of my fields don't have a proper schema; they hold mixed-type objects, e.g. a field value may be {a:2} or {b:2,c:4, a: {1,2}}, etc. I can use CSV format so I can save…
Manish Trivedi
  • 3,481
  • 5
  • 23
  • 29
1
vote
1 answer

Exceptions from Data Lake immutability rule

Data Lake should be immutable: It is important that all data put in the lake should have a clear provenance in place and time. Every data item should have a clear trace to what system it came from and when the data was produced. The data lake…
VB_
  • 45,112
  • 42
  • 145
  • 293
1
vote
0 answers

PowerShell Set-AzDataLakeStoreItemAclEntry Error occurred while sending the request

I am trying to execute the following command in PowerShell ISE: Set-AzDataLakeStoreItemAclEntry -Account "********" -Path "/raw2" -AceType Group -Id "******************" -Permissions All. I only have the Az module installed, not Rm, but I get the following…
dstiles74
  • 11
  • 3
1
vote
0 answers

AWS Datalake Formation MySQL Blueprint error with incompatible sql_mode only_full_group_by

I am using an AWS DataLake Formation blueprint to import a MySQL DB to S3. I used the stock blueprint to import the data, but the job failed with the error and stack trace below. Expression #4 of SELECT list is not in GROUP BY clause and contains nonaggregated…
Jimson James
  • 2,937
  • 6
  • 43
  • 78
1
vote
2 answers

Splunk migration to S3 DataLake

We're looking at moving away from Splunk as our datastore and looking at AWS Data Lake backed by S3. What would be the process of migrating data from Splunk to S3? I've read lots of documents talking about archiving data from Splunk to S3 but not…
Garreth
  • 1,057
  • 2
  • 9
  • 24
1
vote
3 answers

ETL from AWS DataLake to RDS

I'm relatively new to DataLakes and I'm doing some research for a project on AWS. I have created a DataLake and have tables generated from Glue Crawlers; I can see the data in S3 and query it using Athena. So far so good. There is a…
Garreth
  • 1,057
  • 2
  • 9
  • 24
1
vote
1 answer

I need to get file last modified dates of Data Lake files in SSIS

I have an SSIS task that reads JSON files from Azure Data Lake, parses them with a deserialize command in a Script Task, and creates copies of them as tables in a local SQL Server. This works well, but it is very slow. It takes me 6 hours to import one…
EnisAkin
  • 43
  • 5
1
vote
1 answer

What is the difference between a data lake with HDFS or S3 in AWS?

I need to build a data lake on AWS, but I don't know exactly how S3 differs from HDFS. I found some answers on the Internet, but I still don't understand the real difference. I also need to know if someone has the data lake architecture of HDFS…
1
vote
2 answers

Can you use HDFS as your principal storage?

Is it reliable to save your data in Hadoop and consume it using Spark/Hive, etc.? What are the advantages of using HDFS as your main storage?
marz
  • 831
  • 1
  • 7
  • 12
1
vote
1 answer

Unable to parse list of JSON blocks in U-SQL

I have a file with a list of JSON blocks and am stuck processing/reading them in U-SQL and writing them to a text file. { "id": "0001", "type": "donut", "name": "Cake", "ppu": 0.55, "batters": { "batter": …
Creator
  • 31
  • 2
1
vote
2 answers

Can a Data Warehouse include a Data Lake?

I want to understand data warehouses and data lakes in more detail. It seems to me there is conflicting information on the topic. Inmon defines a data warehouse as a subject-oriented, integrated, time-variant and non-volatile collection of data in…
A.Dumas
  • 2,619
  • 3
  • 28
  • 51
1
vote
1 answer

What is a Data Warehouse and can it be applied to complex data?

I want to define a data warehouse with the necessary literature references. I found on Wikipedia that DWs are central repositories of integrated data from one or more disparate sources. They store current and historical data in one single place…
A.Dumas
  • 2,619
  • 3
  • 28
  • 51
1
vote
1 answer

Multiple Tableau users connected to Hive LLAP

I’m hoping to allow interactive queries for many Tableau users with data accessed via Hive LLAP. So far the results have been disappointing. Should I expect this setup to work for me, or should I use a different backend?
1ijk
  • 1,417
  • 2
  • 19
  • 31
1
vote
1 answer

How to use JSON file formats in the context of Azure Data Lake Analytics and U-SQL

I have a JSON input that looks like { "sessionId": 1234, "deviceId": "MAC:1234", "IoTHub": { "MessageId": "1234-1234-1234-1234" } } How can I extract the values of sessionId, deviceId and MessageId in an Azure Datalake…
quervernetzt
  • 10,311
  • 6
  • 32
  • 51
1
vote
1 answer

Azure Data Lake: How to get Processed files

I've just started working with Data Lake and I'm currently trying to figure out the real workflow steps and how to automate the whole process. Say I have some files as an input and I would like to process them and download output files in order…
Vladimir Semashkin
  • 1,270
  • 1
  • 10
  • 21