Questions tagged [data-lakehouse]

28 questions
0
votes
3 answers

Create database for fabric_lakehouse is not permitted using Apache Spark in Microsoft Fabric

I followed the instructions in "Use delta tables in Apache Spark", but when I try to save the tables into the lakehouse, I get the message below. I got a similar error message when following the "Lakehouse tutorial introduction" while trying to read the fact_sale table.…
henjiFire
  • 58
  • 7
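A minimal sketch of the save step the tutorial walks through, assuming an ambient `spark` session in a Fabric notebook with a default lakehouse attached (the attachment is an assumption; without it, `saveAsTable` has no database to create the table in). The path and table name are placeholders:

```python
# Sketch, assuming a Fabric Spark notebook where `spark` is the ambient session
# and a default lakehouse is attached; the path and table name are placeholders.
df = spark.read.format("parquet").load("Files/wwi-raw-data/fact_sale_1y_full")

# saveAsTable writes a managed delta table into the attached lakehouse's Tables area.
df.write.mode("overwrite").format("delta").saveAsTable("fact_sale")
```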
0
votes
1 answer

Primary Key in Synapse Serverless SQL Table

How can I add a primary key to an Azure Synapse Serverless SQL database table? I tried this: CREATE EXTERNAL TABLE [silver].[table] ( [MATNR] char(100) NOT NULL ) WITH ( LOCATION = 'file/tables/bronze_MARA', DATA_SOURCE = dsrc, …
marritza
  • 22
  • 5
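For context, a sketch of issuing that DDL from Python with pyodbc; the endpoint, database, and credentials are placeholders. As far as I know, serverless SQL pools do not support PRIMARY KEY or other enforced constraints on external tables, so this illustrates the attempt rather than a fix:

```python
import pyodbc

# Placeholders: substitute your own serverless endpoint and credentials.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=myworkspace-ondemand.sql.azuresynapse.net;"
    "DATABASE=mydb;UID=user;PWD=secret",
    autocommit=True,  # let the DDL take effect immediately
)
conn.execute("""
    CREATE EXTERNAL TABLE [silver].[table] (
        [MATNR] char(100) NOT NULL
    ) WITH (
        LOCATION = 'file/tables/bronze_MARA',
        DATA_SOURCE = dsrc
    )
""")  # adding a PRIMARY KEY clause here is rejected: serverless external
     # tables do not appear to support constraints
```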
0
votes
1 answer

Allowing @ symbol in connection string for OPENROWSET from Synapse

I am trying to connect to Synapse SQL database external tables (which access Databricks lakehouse tables) from SQL Server using OPENROWSET. This works: select * from OPENROWSET( 'SQLNCLI', …
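A sketch of running that query from Python with pyodbc against the SQL Server side; every name and credential is a placeholder. The @ itself is legal inside a T-SQL string literal, so if parsing breaks, the usual suspect is the provider string's own quoting (values containing special characters generally need to be wrapped in double quotes) — an assumption here, not a confirmed diagnosis:

```python
import pyodbc

# Placeholder connection to the SQL Server that issues the OPENROWSET call.
sqlserver = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};SERVER=onprem-sql;"
    "DATABASE=master;Trusted_Connection=yes"
)

# The second OPENROWSET argument is handed to SQLNCLI as one provider string;
# wrapping a value with special characters in double quotes follows the general
# connection-string quoting rule (hedged -- not verified for this exact case).
query = """
SELECT *
FROM OPENROWSET(
    'SQLNCLI',
    'Server=myworkspace-ondemand.sql.azuresynapse.net;UID=user;PWD="p@ssword"',
    'SELECT TOP 10 * FROM silver.some_table'
)
"""
rows = sqlserver.execute(query).fetchall()
```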
0
votes
1 answer

Saved delta file reads as a df - is it still part of Delta Lake?

I have trouble understanding the concept of Delta Lake. Example: I read a parquet file: taxi_df = (spark.read.format("parquet").option("header", "true").load("dbfs:/mnt/randomcontainer/taxirides.parquet")) Then I save it using…
BigMadAndy
  • 153
  • 1
  • 9
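A sketch of the distinction the question circles, using the question's own input path plus an invented output path: reading Parquet yields an ordinary DataFrame, and nothing becomes part of Delta Lake until it is written in delta format, which `DeltaTable.isDeltaTable` can confirm:

```python
from delta.tables import DeltaTable

# Reading Parquet gives a plain DataFrame; it is not "in" Delta Lake yet.
taxi_df = (spark.read.format("parquet")
           .option("header", "true")
           .load("dbfs:/mnt/randomcontainer/taxirides.parquet"))

# Only writing in delta format creates a Delta table (i.e. a _delta_log folder).
delta_path = "dbfs:/mnt/randomcontainer/taxirides_delta"  # invented output path
taxi_df.write.format("delta").mode("overwrite").save(delta_path)

print(DeltaTable.isDeltaTable(spark, delta_path))  # True once the log exists
```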
0
votes
0 answers

How to resolve a 1-n relationship in a star schema?

I'm working on a data storage model for a clickstream analytics system. User action data comes from a third-party system as a set of large JSON files. Currently, we plan to have an ETL process that reads the JSON files as a source and saves the data into our store…
Oleksii
  • 294
  • 1
  • 5
  • 12
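The question is truncated, but the classic answer to a 1-n between a fact and a dimension is a bridge table; a minimal PySpark sketch with invented clickstream names:

```python
from pyspark.sql import functions as F

# Invented example: each event carries many tags (a 1-n relationship).
events = spark.createDataFrame(
    [(1, ["promo", "mobile"]), (2, ["mobile"])],
    "event_id INT, tags ARRAY<STRING>",
)

# Bridge table: one row per (event, tag) pair keeps the fact table at one row
# per event while still joining cleanly to a tag dimension.
bridge = events.select("event_id", F.explode("tags").alias("tag"))
bridge.write.format("delta").mode("overwrite").saveAsTable("bridge_event_tag")
```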
0
votes
0 answers

Trino not able to create table from JSON file

Trino is not able to create a table from JSON in S3. I use create table trino_test.json_test (id VARCHAR) with (external_location = 's3a://trino_test/jsons/', format='JSON'); but I get Query 20230203_154914_00018_d3erw failed:…
romanzdk
  • 930
  • 11
  • 30
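A sketch of issuing the same DDL through the trino Python client (host, catalog, and user are placeholders). One observation worth checking, offered as a guess rather than a diagnosis: S3 bucket names cannot contain underscores, so the bucket in s3a://trino_test/jsons/ could itself be the problem:

```python
import trino

# Placeholder coordinator, catalog, and user.
conn = trino.dbapi.connect(host="trino-coordinator", port=8080,
                           user="admin", catalog="hive", schema="default")
cur = conn.cursor()
# Same shape as the question's DDL; note the bucket name -- S3 bucket naming
# rules disallow underscores, which alone can make this location unreachable.
cur.execute("""
    CREATE TABLE json_test (id VARCHAR)
    WITH (external_location = 's3a://trino-test/jsons/', format = 'JSON')
""")
```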
0
votes
0 answers

How to read Delta tables 2.1.0 in an S3 bucket that contains symlink_format_manifest using AWS Glue Studio 4.0?

I am using Glue Studio 4.0 to choose a data source (a Delta table 2.1.0 saved in S3), as in the image below. Then I generate a script from the box: import sys from awsglue.transforms import * from awsglue.utils import getResolvedOptions …
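Glue 4.0 can read Delta tables natively when the job parameter --datalake-formats is set to delta, which avoids the symlink_format_manifest path entirely; a sketch of the generated-script shape with a direct delta read (the S3 path is a placeholder):

```python
import sys
from awsglue.context import GlueContext
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glueContext = GlueContext(SparkContext.getOrCreate())
spark = glueContext.spark_session

# With --datalake-formats=delta set on the Glue 4.0 job, the table can be read
# directly in delta format instead of via the symlink manifest.
df = spark.read.format("delta").load("s3://my-bucket/delta/table/")  # placeholder
df.show(5)
```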
0
votes
1 answer

How to handle CSV files in the Bronze layer without the extra layer

If my raw data is in CSV format and I store it in the Bronze layer as Delta tables, I end up with four layers: Raw+Bronze+Silver+Gold. Which approach should I consider?
Su1tan
  • 45
  • 5
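One way to avoid the fourth layer is to treat the landed CSV as a landing zone rather than a modeled layer, so Bronze is simply the first Delta copy; a sketch with placeholder paths and table names:

```python
# Raw CSV lands as files (landing zone); Bronze is its first Delta-formatted
# copy, so "Raw" does not become a separate modeled layer.
raw = (spark.read.format("csv")
       .option("header", "true")
       .load("/mnt/landing/orders/*.csv"))  # placeholder landing path

raw.write.format("delta").mode("append").saveAsTable("bronze.orders")  # placeholder
```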
0
votes
1 answer

ETL / ELT pipelines - Metainformation about the pipeline

How do you add metainformation about the ETL / ELT code used (and the version of that code) to the produced sink files / tables? Do you consider information like "PipelineID" or "DataProductionTime" required in the target folder?
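A common pattern (one convention among several, not a requirement) is to stamp audit columns onto each sink table at write time; the column and table names below mirror the question's "PipelineID" / "DataProductionTime" but are otherwise invented:

```python
from pyspark.sql import functions as F

df = spark.range(3).withColumnRenamed("id", "order_id")  # stand-in for the real output

# Stamp lineage metadata on every row; in practice the values would come from
# the orchestrator (run ID, git tag) rather than hard-coded literals.
stamped = (df.withColumn("pipeline_id", F.lit("orders_elt"))
             .withColumn("pipeline_version", F.lit("1.4.2"))
             .withColumn("data_production_time", F.current_timestamp()))
stamped.write.format("delta").mode("append").saveAsTable("gold.orders")
```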
0
votes
2 answers

Version control of big data tables (iceberg)

I'm building Iceberg tables on top of a data lake. These tables are used by reporting tools. I'm trying to figure out the best way to version and deploy changes to these tables in a CI/CD process. E.g. I would like to add a column…
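For the concrete case named in the question (adding a column), Iceberg treats this as metadata-only schema evolution, so one workable CI/CD shape is a versioned migration script of ALTER statements applied by the pipeline; a sketch via Spark SQL with a placeholder catalog and table name:

```python
# Additive Iceberg schema change: metadata-only, safe to run from a CI/CD job.
# "my_catalog.db.report_table" and the new column are placeholders.
spark.sql("""
    ALTER TABLE my_catalog.db.report_table
    ADD COLUMNS (customer_segment STRING)
""")
```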
0
votes
0 answers

Schema & data separation

I was going through an AWS webinar and found a slide where they recommend separating your data & schema. They mentioned that if we separate them, each can evolve independently, but what's the point of a data change without a schema…
NK7983
  • 125
  • 1
  • 14
0
votes
1 answer

Managing Schema/Data In Static/Fixed-Content Dimensions with Lakehouse

In the absence of DML (not leveraging Delta Lake as of yet), I'm looking for ways to manage Static/Fixed-Content Dimensions in a Data Lakehouse (e.g. Gender, OrderType, Country). Ideally the schema and data within these dimensions would be managed…
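Without DML, one workable approach is to keep each static dimension as literal rows in source control and fully overwrite it on deploy, so schema and data are versioned together; a minimal sketch using one of the question's examples (keys and the output path are illustrative):

```python
# Static dimension declared in code; a full overwrite on each deploy keeps the
# schema and the data versioned together.
dim_gender = spark.createDataFrame(
    [(1, "Female"), (2, "Male"), (3, "Unknown")],
    "gender_key INT, gender STRING",
)
dim_gender.write.format("parquet").mode("overwrite").save("/lake/dims/dim_gender")
```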
-1
votes
0 answers

For data lake storage in AWS S3, what are the advantages of Apache Iceberg over raw Parquet tables?

We are building a data lake and storing the data in S3 in Parquet format. We are extracting and transforming with Glue. It was proposed that we use Apache Iceberg as the table format instead of regular Parquet files in partitions. I understand…