Questions tagged [data-lakehouse]
28 questions
0
votes
3 answers
Create database for fabric_lakehouse is not permitted using Apache Spark in Microsoft Fabric
I followed the instructions in "Use delta tables in Apache Spark",
but when I try to save the tables into the lakehouse, I get the message below. I got a similar error when following the "Lakehouse tutorial introduction" while trying to read the fact_sale table.…
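For orientation, a minimal sketch of what saving a Spark DataFrame as a managed Delta table looks like in a Fabric notebook, assuming the built-in spark session and a default lakehouse attached to the notebook; the table name and rows are illustrative:

# Hedged sketch: assumes a Fabric notebook with a default lakehouse attached;
# `spark` is the session Fabric provides. Table name and rows are placeholders.
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])

# Writes a managed Delta table into the attached lakehouse's Tables area.
df.write.format("delta").mode("overwrite").saveAsTable("demo_table")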

henjiFire
- 58
- 7
0
votes
1 answer
Primary Key in Synapse Serverless SQL Table
How can I create a primary key on an Azure Synapse Serverless SQL Database table?
I tried this:
CREATE EXTERNAL TABLE [silver].[table]
(
    [MATNR] char(100) NOT NULL
)
WITH
(
    LOCATION = 'file/tables/bronze_MARA',
    DATA_SOURCE = dsrc,
…
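For context: serverless SQL pool external tables do not take constraints, while dedicated SQL pools accept not-enforced primary keys. A hedged sketch of that dedicated-pool syntax issued from Python via pyodbc; server, credentials, and object names are all placeholders:

# Hedged sketch: assumes a *dedicated* SQL pool, which accepts not-enforced
# primary keys (serverless external tables do not). All values are placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=myworkspace.sql.azuresynapse.net;"  # placeholder
    "DATABASE=mydb;UID=myuser;PWD=mypassword"   # placeholders
)
conn.execute("""
    CREATE TABLE silver.mara
    (
        MATNR char(100) NOT NULL,
        CONSTRAINT pk_mara PRIMARY KEY NONCLUSTERED (MATNR) NOT ENFORCED
    )
""")
conn.commit()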

marritza
- 22
- 5
0
votes
1 answer
Allowing @ symbol in connection string for OPENROWSET from Synapse
I am trying to connect to Synapse SQL database external tables (which access Databricks lakehouse tables) from SQL Server using OPENROWSET.
This works:
select * from
OPENROWSET(
'SQLNCLI',
…
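As a point of comparison, a hedged sketch of the same hop made from Python with pyodbc, where an @ inside a keyword=value pair needs no escaping because the pairs are delimited by semicolons; the server, user, and query are placeholders:

# Hedged sketch: connects to a Synapse serverless SQL endpoint; the @ inside
# the UID value is legal as-is in an ODBC connection string. Values are placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=myworkspace-ondemand.sql.azuresynapse.net;"  # placeholder
    "DATABASE=mydb;"
    "UID=myuser@myworkspace;"  # @ inside a value needs no escaping here
    "PWD=mypassword"
)
for row in conn.execute("SELECT TOP 5 * FROM silver.some_external_table"):
    print(row)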

marritza
- 22
- 5
0
votes
1 answer
Saved delta file reads as a df - is it still part of delta lake?
I'm having trouble understanding the concept of Delta Lake. Example:
I read a parquet file:
taxi_df = (spark.read.format("parquet").option("header", "true").load("dbfs:/mnt/randomcontainer/taxirides.parquet"))
Then I save it using…
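A sketch of the distinction the question is circling: the DataFrame is only an in-memory view, and what makes data "part of Delta Lake" is the _delta_log directory written next to the files. Assuming delta-spark is available, DeltaTable.isDeltaTable can confirm it; the save path below is a placeholder:

# Hedged sketch: assumes a Databricks/delta-spark environment.
from delta.tables import DeltaTable

# Reading parquet yields a plain DataFrame; nothing about it is "Delta" yet.
taxi_df = spark.read.format("parquet").option("header", "true").load(
    "dbfs:/mnt/randomcontainer/taxirides.parquet")

# Saving in delta format writes a _delta_log directory next to the data;
# that directory, not the DataFrame, is what makes it a Delta table.
taxi_df.write.format("delta").mode("overwrite").save(
    "dbfs:/mnt/randomcontainer/taxirides_delta")  # placeholder path

print(DeltaTable.isDeltaTable(spark, "dbfs:/mnt/randomcontainer/taxirides_delta"))  # True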

BigMadAndy
- 153
- 1
- 9
0
votes
0 answers
How to resolve a 1-n relationship in a star schema?
I'm working on a data storage model for a clickstream analytics system. User action data comes from a third-party system as a set of large JSON files. Currently we plan an ETL process that reads the JSON files as a source and saves the data into our store…
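One common resolution is to push the fact table's grain down to the child level by exploding the nested array, so the 1-n relationship becomes one fact row per action; a hedged PySpark sketch in which the field names and path are hypothetical:

# Hedged sketch: hypothetical clickstream JSON with one session per record
# and an array of actions; exploding yields one fact row per action.
from pyspark.sql.functions import col, explode

sessions = spark.read.json("s3://my-bucket/clickstream/")  # placeholder path

fact_actions = (
    sessions
    .select(col("session_id"), explode(col("actions")).alias("action"))
    .select("session_id", "action.action_type", "action.timestamp")
)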

Oleksii
- 294
- 1
- 5
- 12
0
votes
0 answers
Trino not able to create table from JSON file
Trino is not able to create a table from JSON in S3.
I use
create table trino_test.json_test (id VARCHAR) with (external_location = 's3a://trino_test/jsons/', format='JSON');
but I get Query 20230203_154914_00018_d3erw failed:…
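For reference, a hedged sketch of issuing the same DDL through the Trino Python client; host, catalog, and user are placeholders. One thing worth checking separately: current S3 bucket-naming rules do not allow underscores, which the trino_test bucket name contains.

# Hedged sketch: assumes the `trino` client package and a reachable
# coordinator; all connection values are placeholders.
import trino

conn = trino.dbapi.connect(
    host="trino-coordinator.example.com",  # placeholder
    port=8080,
    user="analyst",
    catalog="hive",       # placeholder: the catalog backing trino_test
    schema="trino_test",
)
cur = conn.cursor()
cur.execute(
    "CREATE TABLE json_test (id VARCHAR) "
    "WITH (external_location = 's3a://trino_test/jsons/', format = 'JSON')"
)
cur.fetchall()  # consume the result so the DDL completes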

romanzdk
- 930
- 11
- 30
0
votes
0 answers
How to read Delta tables 2.1.0 in an S3 bucket that contains symlink_format_manifest using AWS Glue Studio 4.0?
I am using Glue Studio 4.0 to choose a data source (a Delta table 2.1.0 saved in S3), as in the image below.
Then I generate the script from the box:
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
…
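An alternative worth sketching: Glue 4.0 can read Delta tables natively when the job parameter --datalake-formats is set to delta, bypassing the symlink manifest entirely; the S3 path below is a placeholder:

# Hedged sketch: assumes a Glue 4.0 job with --datalake-formats set to
# "delta" so Spark can read Delta tables directly. Path is a placeholder.
import sys
from awsglue.context import GlueContext
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glueContext = GlueContext(SparkContext.getOrCreate())
spark = glueContext.spark_session

df = spark.read.format("delta").load("s3://my-bucket/path/to/delta-table/")  # placeholder
df.show(5)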

Tien Vu
- 87
- 7
0
votes
1 answer
How to handle CSV files in the Bronze layer without the extra layer
If my raw data is in CSV format and I store it in the Bronze layer as Delta tables, I end up with four layers: Raw+Bronze+Silver+Gold. Which approach should I consider?
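Whichever layering wins, the landing step itself is small; a hedged sketch of reading the raw CSV and writing it straight into a bronze Delta table, with both paths as placeholders:

# Hedged sketch: raw CSV lands as-is; bronze holds the same data as Delta.
# Both paths are placeholders.
raw_df = (
    spark.read.format("csv")
    .option("header", "true")
    .load("abfss://landing@mystorage.dfs.core.windows.net/orders/")
)

(raw_df.write.format("delta")
    .mode("append")
    .save("abfss://bronze@mystorage.dfs.core.windows.net/orders/"))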

Su1tan
- 45
- 5
0
votes
1 answer
ETL / ELT pipelines - Metainformation about the pipeline
How do you add meta-information about the ETL/ELT code used (and the version of that code) to the produced sink files/tables?
Do you consider it necessary to have information like "PipelineID" or "DataProductionTime" in the target folder?
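One common pattern is to stamp these as audit columns at write time rather than encoding them in folder names; a hedged PySpark sketch in which the column names follow the question and everything else is a placeholder:

# Hedged sketch: stamps lineage columns onto the sink table at write time.
# The pipeline id would normally come from the orchestrator's run context.
from pyspark.sql.functions import current_timestamp, lit

df = spark.createDataFrame([(1, "a")], ["id", "value"])  # stand-in for the real data

enriched = (
    df.withColumn("PipelineID", lit("ingest_orders_v1.4"))   # placeholder value
      .withColumn("DataProductionTime", current_timestamp())
)
enriched.write.format("delta").mode("append").save("/lake/silver/orders")  # placeholder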

R. Maier
- 340
- 2
- 13
0
votes
2 answers
Version control of big data tables (iceberg)
I'm building Iceberg tables on top of a data lake. These tables are used by reporting tools. I'm trying to figure out the best way to version and deploy changes to these tables in a CI/CD process. E.g. I would like to add a column…
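One approach is to treat table DDL as migration scripts the CI/CD job runs; Iceberg makes additive changes cheap because ALTER TABLE ... ADD COLUMN is metadata-only. A hedged sketch, assuming a Spark session with an Iceberg catalog and placeholder names throughout:

# Hedged sketch: an idempotent "migration" step a CI/CD job could run.
# Catalog, schema, table, and column names are placeholders.
table = "analytics.reporting.daily_sales"

existing = {f.name for f in spark.table(table).schema.fields}
if "discount_pct" not in existing:
    # Metadata-only change in Iceberg; no data files are rewritten.
    spark.sql(f"ALTER TABLE {table} ADD COLUMN discount_pct double")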

Wojtek
- 1
0
votes
0 answers
Schema & data separation
I was going through an AWS webinar and found a slide where they recommend separating your data & schema... They mentioned that if we separate them, each can evolve independently, but what's the point of a data change without a schema…
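The practical shape of that recommendation is schema-on-read: the schema lives in code (or a registry) and is applied to the raw files at read time, so either side can change on its own. A hedged PySpark sketch with illustrative fields and path:

# Hedged sketch: the schema is declared separately from the stored data
# and applied at read time; evolving it means editing this definition,
# not rewriting the files. Fields and path are illustrative.
from pyspark.sql.types import LongType, StringType, StructField, StructType

event_schema = StructType([
    StructField("event_id", StringType(), nullable=False),
    StructField("user_id", StringType(), nullable=True),
    StructField("ts", LongType(), nullable=True),
])

events = spark.read.schema(event_schema).json("s3://my-bucket/raw/events/")  # placeholder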

NK7983
- 125
- 1
- 14
0
votes
1 answer
Managing Schema/Data In Static/Fixed-Content Dimensions with Lakehouse
In the absence of DML (not leveraging Delta Lake as of yet), I'm looking for ways to manage static/fixed-content dimensions in a data lakehouse (e.g. Gender, OrderType, Country).
Ideally the schema and data within these dimensions would be managed…
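Without DML, one workable pattern is to keep such dimensions as data literals in version-controlled code and rebuild them idempotently on each deploy, so schema and content change together; a hedged sketch with placeholder names and path:

# Hedged sketch: a fixed-content dimension defined in code and fully
# rewritten on every run. Rows, columns, and output path are placeholders.
gender_rows = [(1, "F", "Female"), (2, "M", "Male"), (3, "U", "Unknown")]
gender_dim = spark.createDataFrame(gender_rows, ["gender_key", "code", "description"])

# Plain parquet overwrite, since Delta Lake isn't in play yet.
gender_dim.write.mode("overwrite").parquet("/lake/dims/gender/")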

Chris Wilson
- 3
- 1
-1
votes
0 answers
For data lake storage in AWS S3, what are the advantages of Apache Iceberg over raw Parquet tables?
We are building a data lake and storing the data in S3 in Parquet format, extracting and transforming with Glue. It was proposed that we use Apache Iceberg as the table format instead of regular Parquet files in partitions.
I understand…
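Two commonly cited advantages, snapshot history and time travel, are easy to sketch; this assumes a Spark 3.3+ session with an Iceberg catalog configured, and every name below is a placeholder:

# Hedged sketch: neither query is possible over plain partitioned parquet;
# Iceberg tracks snapshots in table metadata. All names are placeholders.
table = "my_catalog.db.events"

# List the table's snapshot history from the snapshots metadata table.
spark.sql(f"SELECT snapshot_id, committed_at FROM {table}.snapshots").show()

# Time travel: read the table as of an earlier snapshot (placeholder id).
spark.sql(f"SELECT * FROM {table} VERSION AS OF 123456789").show()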

Cristobal Sarome
- 178
- 11