Questions tagged [data-lake]
161 questions
0
votes
0 answers
Best file structure pattern for fast access (download) from a data lake with the .NET SDK
Currently I'm storing my data in 4 different files (which hold data for a specified day, month, week, or year). The daily file holds 1440 entries (one reading is sent every minute, and the date of each reading is also stored per line). In this way if I…

dawcza94
- 327
- 2
- 10
0
votes
0 answers
Indexing and navigating S3 metadata
I'm receiving a large number of S3 CSV files; for each fo.csv S3 file there is a fo.metadata.txt file with interesting metadata describing the CSV columns and giving additional info.
I'm looking for the best way to navigate the metadata of all…

user3313834
- 7,327
- 12
- 56
- 99
-1
votes
0 answers
For data lake storage in AWS S3, what are the advantages of Apache Iceberg over raw Parquet tables?
We are building a data lake and storing the data in S3 in Parquet format. We extract and transform with Glue. It was proposed that we use Apache Iceberg as the table format instead of regular partitioned Parquet files.
I understand…

Cristobal Sarome
- 178
- 11
-1
votes
1 answer
Data Storage and Analytics on AWS
I have a data analytics requirement on AWS. I have limited knowledge of big data processing, but based on my analysis, I have figured out some options.
The requirement is to collect data by calling a Provider API every 30 minutes (data…

Sudheer Kumar
- 311
- 4
- 16
-1
votes
1 answer
Tool for storing information about tables, their sources and ETL for DWH
I'm searching for a tool for storing documentation about tables, data sources, ETL processes, etc. for my DWH.
I've seen some presentations on YouTube, but I've found that most companies use their own custom system or something like a wiki…

Arhimag
- 9
- 6
-1
votes
1 answer
What is the best way to re-create a relational database from a change log (data lake) in AWS S3?
I have stored changelogs (data with information about data) from non-relational, schemaless data tables in S3. Now I want a structured relational database so I can query all the data, so I need to create a database from S3. Now I am confused about…
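To make the idea of rebuilding table state from a change log concrete, here is a minimal Python sketch. The excerpt does not show the changelog format, so the event shape used here (`id`, `ts`, `op`, `data`) and the function name `replay_changes` are purely hypothetical assumptions; reading the events from S3 and loading the result into a relational database are left out.

```python
from typing import Any, Dict, Iterable


def replay_changes(changes: Iterable[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Fold change events into the latest state per record id.

    Assumed (hypothetical) event shape:
      {"id": str, "ts": int, "op": "upsert" | "delete", "data": dict}
    """
    state: Dict[str, Dict[str, Any]] = {}
    # Apply events in timestamp order so later changes win.
    for change in sorted(changes, key=lambda c: c["ts"]):
        key = change["id"]
        if change["op"] == "delete":
            state.pop(key, None)
        else:
            # Merge the changed fields into the existing record, if any.
            state.setdefault(key, {}).update(change["data"])
    return state
```

Each entry of the returned dictionary corresponds to one row that could then be written to a relational table; how schemaless fields map to columns is a separate design decision.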

isambitd
- 829
- 8
- 14
-1
votes
1 answer
Schedule an Azure Data Lake Store pipeline to run every Monday at 8 am UTC
Output Data Set:
"availability": {"frequency": "Day","interval": 1,"offset": "03:00:00","style": "StartOfInterval"}
Pipeline:
"scheduler": {"frequency": "Day","interval": 1,"offset": "03:00:00","style": "StartOfInterval"}
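The snippets above describe a daily schedule with a 03:00 offset, while the title asks for every Monday at 8 am UTC. As a way to pin down the intended run times (this is plain Python, not Data Factory configuration, and the function name `next_monday_8am_utc` is hypothetical), a minimal sketch could look like this:

```python
from datetime import datetime, timedelta, timezone


def next_monday_8am_utc(now: datetime) -> datetime:
    """Return the first Monday 08:00 UTC strictly after `now`.

    `now` is assumed to be a timezone-aware datetime.
    """
    now = now.astimezone(timezone.utc)
    candidate = now.replace(hour=8, minute=0, second=0, microsecond=0)
    # weekday() is 0 for Monday, so this steps back to Monday 08:00 of the current week.
    candidate -= timedelta(days=candidate.weekday())
    if candidate <= now:
        # That slot has already passed (or is exactly now): use next week's Monday.
        candidate += timedelta(days=7)
    return candidate
```

Calling it repeatedly with the previous result yields consecutive weekly run times, which can be compared against the slices a weekly schedule actually produces.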

shane
- 127
- 1
- 1
- 6
-2
votes
1 answer
Cost breakdown for a cloud data lake implementation
We have a client in need of a data lake in the cloud.
We need to give the client a way to break down costs between their business areas within a single AWS account.
This includes query and data transfer costs as well.
-2
votes
3 answers
Comparison between big data and data lakes: differences and similarities
Can someone tell me the similarities and differences between big data and data lakes?
I can't find a satisfactory answer anywhere.

acekuber
- 101
- 1
- 2
- 7
-3
votes
2 answers
Data warehouse/database/data lake for idiots
Hello geniuses (dare I say, Minkus'?)
A bit of background. I work for a small, non-tech company that currently does not have a data warehouse. All data is manually pulled from a bunch of sources (let's say different platforms like Facebook and…

Kat
- 13
- 3