Questions tagged [data-lakehouse]

28 questions
0
votes
3 answers

Create database for fabric_lakehouse is not permitted using Apache Spark in Microsoft Fabric

I followed the instructions in "Use delta tables in Apache Spark", but when I try to save the tables into the lakehouse, I get the message below. I got a similar error message when following the "Lakehouse tutorial introduction" while trying to read the fact_sale table.…
henjiFire
  • 58
  • 7
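A minimal sketch of the save step the tutorial walks through, assuming an ambient `spark` session in a Fabric notebook with a default lakehouse attached (the attachment is an assumption; without it, `saveAsTable` has no database to create the table in). The path and table name are placeholders:

```python
# Sketch, assuming a Fabric Spark notebook where `spark` is the ambient session
# and a default lakehouse is attached; the path and table name are placeholders.
df = spark.read.format("parquet").load("Files/wwi-raw-data/fact_sale_1y_full")

# saveAsTable writes a managed delta table into the attached lakehouse's Tables area.
df.write.mode("overwrite").format("delta").saveAsTable("fact_sale")
```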
0
votes
1 answer

Primary Key in Synapse Serverless SQL Table

How can I add a primary key to an Azure Synapse Serverless SQL database table? I tried this: CREATE EXTERNAL TABLE [silver].[table] ( [MATNR] char(100) NOT NULL ) WITH ( LOCATION = 'file/tables/bronze_MARA', DATA_SOURCE = dsrc, …
marritza
  • 22
  • 5
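For context, a sketch of issuing that DDL from Python with pyodbc; the endpoint, database, and credentials are placeholders. As far as I know, serverless SQL pools do not support PRIMARY KEY or other enforced constraints on external tables, so this illustrates the attempt rather than a fix:

```python
import pyodbc

# Placeholders: substitute your own serverless endpoint and credentials.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=myworkspace-ondemand.sql.azuresynapse.net;"
    "DATABASE=mydb;UID=user;PWD=secret",
    autocommit=True,  # let the DDL take effect immediately
)
conn.execute("""
    CREATE EXTERNAL TABLE [silver].[table] (
        [MATNR] char(100) NOT NULL
    ) WITH (
        LOCATION = 'file/tables/bronze_MARA',
        DATA_SOURCE = dsrc
    )
""")  # adding a PRIMARY KEY clause here is rejected: serverless external
     # tables do not appear to support constraints
```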
0
votes
1 answer

Allowing @ symbol in connection string for OPENROWSET from Synapse

I am trying to connect to Synapse SQL database external tables (which access Databricks lakehouse tables) from SQL Server using OPENROWSET. This works: select * from OPENROWSET( 'SQLNCLI', …
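A sketch of running that query from Python with pyodbc against the SQL Server side; every name and credential is a placeholder. The @ itself is legal inside a T-SQL string literal, so if parsing breaks, the usual suspect is the provider string's own quoting (values containing special characters generally need to be wrapped in double quotes) — an assumption here, not a confirmed diagnosis:

```python
import pyodbc

# Placeholder connection to the SQL Server that issues the OPENROWSET call.
sqlserver = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};SERVER=onprem-sql;"
    "DATABASE=master;Trusted_Connection=yes"
)

# The second OPENROWSET argument is handed to SQLNCLI as one provider string;
# wrapping a value with special characters in double quotes follows the general
# connection-string quoting rule (hedged -- not verified for this exact case).
query = """
SELECT *
FROM OPENROWSET(
    'SQLNCLI',
    'Server=myworkspace-ondemand.sql.azuresynapse.net;UID=user;PWD="p@ssword"',
    'SELECT TOP 10 * FROM silver.some_table'
)
"""
rows = sqlserver.execute(query).fetchall()
```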
0
votes
1 answer

Saved delta file reads as a df - is it still part of Delta Lake?

I have trouble understanding the concept of Delta Lake. Example: I read a parquet file: taxi_df = (spark.read.format("parquet").option("header", "true").load("dbfs:/mnt/randomcontainer/taxirides.parquet")) Then I save it using…
BigMadAndy
  • 153
  • 1
  • 9
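A sketch of the distinction the question circles, using the question's own input path plus an invented output path: reading Parquet yields an ordinary DataFrame, and nothing becomes part of Delta Lake until it is written in delta format, which `DeltaTable.isDeltaTable` can confirm:

```python
from delta.tables import DeltaTable

# Reading Parquet gives a plain DataFrame; it is not "in" Delta Lake yet.
taxi_df = (spark.read.format("parquet")
           .option("header", "true")
           .load("dbfs:/mnt/randomcontainer/taxirides.parquet"))

# Only writing in delta format creates a Delta table (i.e. a _delta_log folder).
delta_path = "dbfs:/mnt/randomcontainer/taxirides_delta"  # invented output path
taxi_df.write.format("delta").mode("overwrite").save(delta_path)

print(DeltaTable.isDeltaTable(spark, delta_path))  # True once the log exists
```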
0
votes
0 answers

How to resolve a 1-n relationship in a star schema?

I'm working on a data storage model for a clickstream analytics system. User action data comes from a third-party system as a set of large JSON files. Currently, we plan to have an ETL process that reads the JSON files as a source and saves the data into our store…
Oleksii
  • 294
  • 1
  • 5
  • 12
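The question is truncated, but the classic answer to a 1-n between a fact and a dimension is a bridge table; a minimal PySpark sketch with invented clickstream names:

```python
from pyspark.sql import functions as F

# Invented example: each event carries many tags (a 1-n relationship).
events = spark.createDataFrame(
    [(1, ["promo", "mobile"]), (2, ["mobile"])],
    "event_id INT, tags ARRAY<STRING>",
)

# Bridge table: one row per (event, tag) pair keeps the fact table at one row
# per event while still joining cleanly to a tag dimension.
bridge = events.select("event_id", F.explode("tags").alias("tag"))
bridge.write.format("delta").mode("overwrite").saveAsTable("bridge_event_tag")
```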
0
votes
0 answers

Trino not able to create table from JSON file

Trino is not able to create a table from JSON in S3. I use create table trino_test.json_test (id VARCHAR) with (external_location = 's3a://trino_test/jsons/', format='JSON'); but I get Query 20230203_154914_00018_d3erw failed:…
romanzdk
  • 930
  • 11
  • 30
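A sketch of issuing the same DDL through the trino Python client (host, catalog, and user are placeholders). One observation worth checking, offered as a guess rather than a diagnosis: S3 bucket names cannot contain underscores, so the bucket in s3a://trino_test/jsons/ could itself be the problem:

```python
import trino

# Placeholder coordinator, catalog, and user.
conn = trino.dbapi.connect(host="trino-coordinator", port=8080,
                           user="admin", catalog="hive", schema="default")
cur = conn.cursor()
# Same shape as the question's DDL; note the bucket name -- S3 bucket naming
# rules disallow underscores, which alone can make this location unreachable.
cur.execute("""
    CREATE TABLE json_test (id VARCHAR)
    WITH (external_location = 's3a://trino-test/jsons/', format = 'JSON')
""")
```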
0
votes
0 answers

How to read Delta tables 2.1.0 in an S3 bucket that contains symlink_format_manifest using AWS Glue Studio 4.0?

I am using Glue Studio 4.0 to choose a data source (a Delta table 2.1.0 saved in S3), as in the image below. Then I generate a script from the box: import sys from awsglue.transforms import * from awsglue.utils import getResolvedOptions …
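Glue 4.0 can read Delta tables natively when the job parameter --datalake-formats is set to delta, which avoids the symlink_format_manifest path entirely; a sketch of the generated-script shape with a direct delta read (the S3 path is a placeholder):

```python
import sys
from awsglue.context import GlueContext
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glueContext = GlueContext(SparkContext.getOrCreate())
spark = glueContext.spark_session

# With --datalake-formats=delta set on the Glue 4.0 job, the table can be read
# directly in delta format instead of via the symlink manifest.
df = spark.read.format("delta").load("s3://my-bucket/delta/table/")  # placeholder
df.show(5)
```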
0
votes
1 answer

How to handle CSV files in the Bronze layer without the extra layer

If my raw data is in CSV format and I store it in the Bronze layer as Delta tables, I end up with four layers: Raw+Bronze+Silver+Gold. Which approach should I consider?
Su1tan
  • 45
  • 5
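One way to avoid the fourth layer is to treat the landed CSV as a landing zone rather than a modeled layer, so Bronze is simply the first Delta copy; a sketch with placeholder paths and table names:

```python
# Raw CSV lands as files (landing zone); Bronze is its first Delta-formatted
# copy, so "Raw" does not become a separate modeled layer.
raw = (spark.read.format("csv")
       .option("header", "true")
       .load("/mnt/landing/orders/*.csv"))  # placeholder landing path

raw.write.format("delta").mode("append").saveAsTable("bronze.orders")  # placeholder
```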
0
votes
1 answer

ETL / ELT pipelines - Metainformation about the pipeline

How do you add metainformation about the ETL / ELT code used (and the version of that code) to the produced sink files / tables? Do you consider information like "PipelineID" or "DataProductionTime" required in the target folder?
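A common pattern (one convention among several, not a requirement) is to stamp audit columns onto each sink table at write time; the column and table names below mirror the question's "PipelineID" / "DataProductionTime" but are otherwise invented:

```python
from pyspark.sql import functions as F

df = spark.range(3).withColumnRenamed("id", "order_id")  # stand-in for the real output

# Stamp lineage metadata on every row; in practice the values would come from
# the orchestrator (run ID, git tag) rather than hard-coded literals.
stamped = (df.withColumn("pipeline_id", F.lit("orders_elt"))
             .withColumn("pipeline_version", F.lit("1.4.2"))
             .withColumn("data_production_time", F.current_timestamp()))
stamped.write.format("delta").mode("append").saveAsTable("gold.orders")
```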
0
votes
2 answers

Version control of big data tables (iceberg)

I'm building Iceberg tables on top of a data lake. These tables are used by reporting tools. I'm trying to figure out the best way to version and deploy changes to these tables in a CI/CD process. E.g. I would like to add a column…
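For the concrete case named in the question (adding a column), Iceberg treats this as metadata-only schema evolution, so one workable CI/CD shape is a versioned migration script of ALTER statements applied by the pipeline; a sketch via Spark SQL with a placeholder catalog and table name:

```python
# Additive Iceberg schema change: metadata-only, safe to run from a CI/CD job.
# "my_catalog.db.report_table" and the new column are placeholders.
spark.sql("""
    ALTER TABLE my_catalog.db.report_table
    ADD COLUMNS (customer_segment STRING)
""")
```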
0
votes
0 answers

Schema & data separation

I was going through an AWS webinar and found a slide where they recommend separating your data & schema. They mentioned that if we separate them, each can evolve independently, but what's the point of a data change without a schema…
NK7983
  • 125
  • 1
  • 14
0
votes
1 answer

Managing Schema/Data In Static/Fixed-Content Dimensions with Lakehouse

In the absence of DML (not leveraging Delta Lake as of yet), I'm looking for ways to manage Static/Fixed-Content Dimensions in a Data Lakehouse (e.g. Gender, OrderType, Country). Ideally the schema and data within these dimensions would be managed…
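Without DML, one workable approach is to keep each static dimension as literal rows in source control and fully overwrite it on deploy, so schema and data are versioned together; a minimal sketch using one of the question's examples (keys and the output path are illustrative):

```python
# Static dimension declared in code; a full overwrite on each deploy keeps the
# schema and the data versioned together.
dim_gender = spark.createDataFrame(
    [(1, "Female"), (2, "Male"), (3, "Unknown")],
    "gender_key INT, gender STRING",
)
dim_gender.write.format("parquet").mode("overwrite").save("/lake/dims/dim_gender")
```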
-1
votes
0 answers

For data lake storage in AWS S3, what are the advantages of Apache Iceberg over raw Parquet tables?

We are building a data lake and storing the data in S3 in Parquet format. We are extracting and transforming with Glue. It was proposed that we use Apache Iceberg as the table format instead of regular Parquet files in partitions. I understand…