I have set up Auto Loader to regularly read json files and store them in a "bronze" table called fixture_raw using Delta Live Tables in Databricks. This works fine and the json data is stored in the specified table, but when I add a "silver" table called fixture_prepared and try to extract some of the json elements from the bronze table, I get an error:

org.apache.spark.sql.AnalysisException: Ambiguous reference to fields StructField(id,LongType,true), StructField(id,LongType,true)

How can I get around this?

Delta Live Table code:

CREATE OR REFRESH STREAMING LIVE TABLE fixture_raw AS 
SELECT *, input_file_name() AS InputFile, now() AS LoadTime FROM cloud_files(
  "/mnt/input/fixtures/", 
  "json",
  map(
    "cloudFiles.inferColumnTypes", "true",
    "cloudFiles.schemaLocation", "/mnt/dlt/schema/fixture",
    "cloudFiles.schemaEvolutionMode", "addNewColumns"
  )
);

CREATE OR REFRESH LIVE TABLE fixture_prepared AS
WITH FixtureData AS (
  SELECT 
    explode(response) AS FixtureJson
  FROM live.fixture_raw
)
SELECT
  FixtureJson.fixture.id AS FixtureID,
  FixtureJson.fixture.date AS StartTime,
  FixtureJson.fixture.venue.name AS Venue,
  FixtureJson.teams.home.id AS HomeTeamID,
  FixtureJson.teams.home.name AS HomeTeamName,
  FixtureJson.teams.away.id AS AwayTeamID,
  FixtureJson.teams.away.name AS AwayTeamName
FROM FixtureData;

Json data:

{
    "get": "fixtures",
    "parameters": {
        "league": "39",
        "season": "2022"
    },
    "response": [
        {
            "fixture": {
                "id": 867946,
                "date": "2022-08-05T19:00:00+00:00",
                "venue": {
                    "id": 525,
                    "name": "Selhurst Park"
                }
            },
            "teams": {
                "home": {
                    "id": 52,
                    "name": "Crystal Palace"
                },
                "away": {
                    "id": 42,
                    "name": "Arsenal"
                }
            }
        },
        {
            "fixture": {
                "id": 867947,
                "date": "2022-08-06T11:30:00+00:00",
                "venue": {
                    "id": 535,
                    "name": "Craven Cottage"
                }
            },
            "teams": {
                "home": {
                    "id": 36,
                    "name": "Fulham"
                },
                "away": {
                    "id": 40,
                    "name": "Liverpool"
                }
            }
        }
    ]
}
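For reference, the columns that fixture_prepared selects map onto each element of the "response" array like this. This is a plain-Python sketch of the intended flattening using only the stdlib json module (not DLT code), with the payload abridged to one fixture:

```python
import json

# Abridged sample payload matching the structure shown above.
payload = json.loads("""
{
  "response": [
    {
      "fixture": {"id": 867946,
                  "date": "2022-08-05T19:00:00+00:00",
                  "venue": {"id": 525, "name": "Selhurst Park"}},
      "teams": {"home": {"id": 52, "name": "Crystal Palace"},
                "away": {"id": 42, "name": "Arsenal"}}
    }
  ]
}
""")

# Flatten each element of "response" into the columns fixture_prepared selects.
rows = [
    {
        "FixtureID": item["fixture"]["id"],
        "StartTime": item["fixture"]["date"],
        "Venue": item["fixture"]["venue"]["name"],
        "HomeTeamID": item["teams"]["home"]["id"],
        "HomeTeamName": item["teams"]["home"]["name"],
        "AwayTeamID": item["teams"]["away"]["id"],
        "AwayTeamName": item["teams"]["away"]["name"],
    }
    for item in payload["response"]
]

print(rows[0]["FixtureID"], rows[0]["Venue"])  # → 867946 Selhurst Park
```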

1 Answer

Note that there is a difference between defining a dataframe and referencing it; kindly check how the dataframe is defined and how it is referenced before joining, and go through the official documentation. I followed the same scenario with the sample code in my environment: I added a silver table and it worked fine without the error. Follow the references below; they have detailed information.

Reference:

https://learn.microsoft.com/en-us/azure/databricks/data-engineering/delta-live-tables/delta-live-tables-quickstart#sql

Delta Live Tables Demo: Modern software engineering for ETL processing.

– B. B. Naga Sai Vamsi
Thanks! I followed the YouTube video and first exploded the json data from the bronze table into a temporary table, and then flattened the json data into another temporary table. That got rid of the ambiguity errors when loading the data into my silver table. – Magnus Johannesson Jul 03 '22 at 13:53
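The staged approach described in the comment could be sketched in DLT SQL roughly as follows. This is only a sketch, not code verified against the questioner's pipeline, and the intermediate table name fixture_exploded is made up for illustration:

```sql
-- Stage 1: explode the "response" array into its own intermediate table,
-- so each row holds one fixture struct.
CREATE OR REFRESH TEMPORARY LIVE TABLE fixture_exploded AS
SELECT explode(response) AS FixtureJson
FROM live.fixture_raw;

-- Stage 2: flatten the struct fields from the intermediate table
-- into the silver table's columns.
CREATE OR REFRESH LIVE TABLE fixture_prepared AS
SELECT
  FixtureJson.fixture.id AS FixtureID,
  FixtureJson.fixture.date AS StartTime,
  FixtureJson.fixture.venue.name AS Venue,
  FixtureJson.teams.home.id AS HomeTeamID,
  FixtureJson.teams.home.name AS HomeTeamName,
  FixtureJson.teams.away.id AS AwayTeamID,
  FixtureJson.teams.away.name AS AwayTeamName
FROM live.fixture_exploded;
```

Splitting the explode and the field extraction into separate pipeline steps, rather than doing both in one CTE, is what the commenter reports resolved the ambiguous-field error.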