Questions tagged [data-lake]

161 questions
0 votes, 0 answers

Best file structure pattern for fast access (downloading) from Data Lake with the .NET SDK

Currently I'm storing my data in 4 different files (holding data for a specified day, month, week, and year). The day file has 1440 entries (reads are sent every minute, and the date of each read is also stored on its line). In this way if I…
dawcza94 (reputation 327)

0 votes, 0 answers

Indexing and navigating S3 metadata

I'm receiving a large number of S3 CSV files. For each fo.csv S3 file there is a fo.metadata.txt file with useful metadata describing the CSV columns and giving additional info. I'm looking for the best way to navigate the metadata of all…
user3313834 (reputation 7,327)

-1 votes, 0 answers

For data lake storage in AWS S3, what are the advantages of Apache Iceberg over raw Parquet tables?

We are building a data lake and storing the data in S3 in Parquet format. We extract and transform it with Glue. It was proposed that we use Apache Iceberg as the table format instead of plain Parquet files in partitions. I understand…

-1 votes, 1 answer

Data Storage and Analytics on AWS

I have a data analytics requirement on AWS. I have limited knowledge of Big Data processing, but based on my analysis I have identified some options. The requirement is to collect data by calling a Provider API every 30 minutes. (data…
Sudheer Kumar (reputation 311)

-1 votes, 1 answer

Tool for storing information about tables, their sources, and ETL for a DWH

I'm searching for a tool to store documentation about tables, data sources, ETL processes, etc. for my DWH. I've seen some presentations on YouTube, but I've found that most companies use their own custom system or something like a wiki…
Arhimag (reputation 9)

-1 votes, 1 answer

What is the best way to re-create a relational database from a change log (data lake) in AWS S3?

I have stored change logs (data with information about data) from non-relational, schemaless data tables in S3. Now I want a structured relational database so I can query all the data, which means I need to create a database from S3. Now I am confused about…
isambitd (reputation 829)

-1 votes, 1 answer

Schedule an Azure Data Lake Store pipeline that runs every Monday at 8 AM UTC

Output Data Set: "availability": {"frequency": "Day","interval": 1,"offset": "03:00:00","style": "StartOfInterval"} Pipeline: "scheduler": {"frequency": "Day","interval": 1,"offset": "03:00:00","style": "StartOfInterval"}
shane (reputation 127)

-2 votes, 1 answer

Cost breakdown for a Cloud Data Lake Implementation

We have a client who needs a data lake in the cloud. We need to give the client the ability to break down costs between their business areas within a single AWS account, including query and data transfer costs.

-2 votes, 3 answers

Comparison between Big Data and Data Lakes: differences and similarities

Can someone tell me the similarities and differences between Big Data and Data Lakes? I can't find a satisfactory answer anywhere.
acekuber (reputation 101)

-3 votes, 2 answers

Data warehouse/database/data lake for idiots

Hello geniuses (dare I say, Minkus'?) A bit of background. I work for a small, non-tech company that currently does not have a data warehouse. All data is manually pulled from a bunch of sources (let's say different platforms like Facebook and…
Kat (reputation 13)