2

I am trying to get a large set of nested JSON files to load into a table, each file is a single record and there are ~25k files. However when I try to declare the schema it errors out when trying to declare the data type if it is a struct. For reference, I was reading this article from DuckDB as well https://duckdb.org/2023/03/03/json.html.

The Draft, MatchAwards, Players, and TeamPeriodicXPBreakdown are all arrays of dictionaries. Players has a another array called "Talents" that is also an array of dictionaries. The following query will work, but only if I declare those columns as VARCHAR. This is not inherently an issue because I will need to UNNEST them anyways, however even then I am not able to extract the JSON and use it because of the error.

select *
from read_json(
    'D:\parsed_replays\*'
    , columns={
        RandomValue: 'VARCHAR'
        , ReplayFingerPrint: 'VARCHAR'
        , BattlegroundName: 'VARCHAR'
        , ShortName: 'VARCHAR'
        , WinningTeam: 'UBIGINT'
        , FirstToTen: 'BIGINT'
        , DraftFirstTeam: 'UBIGINT'
        , GameType: 'VARCHAR'
        , GameLength: 'VARCHAR'
        , GameLengthTimestamp: 'UBIGINT'
        , GameDate: 'VARCHAR'
        , GameDateFormatted: 'VARCHAR'
        , VersionBuild: 'VARCHAR'
        , VersionMajor: 'VARCHAR'
        , Version: 'VARCHAR'
        , Draft: 'VARCHAR'
        , Players: 'VARCHAR'
        , MatchAwards: 'VARCHAR'
        , TeamPeriodicXPBreakdown: 'VARCHAR'
        , ReplayFileName: 'VARCHAR'
        , ReplayFileNameFormatted: 'VARCHAR'
        , ReplayFilePath: 'VARCHAR'
    }
--     , json_format='auto'
    )

This query will also work to return the json.

         select *
         FROM read_json_objects('D:\parsed_replays\*.json')
         limit 100

Now an example of a draft looks like this. This will fail with the error Unsupported result column type STRUCT("Draft" STRUCT("ReplayFingerPrint" VARCHAR, "DraftIndex" UBIGINT, "Team" UBIGINT, "Battletag" VARCHAR, "AltName" VARCHAR, "SelectedPlayerSlotId" UBIGINT, "PickType" VARCHAR)[]).

    SELECT from_json(
        '{
            "Draft": [
            {
              "ReplayFingerPrint": "eba25103-6b64-96de-cab8-8baee0aa349a",
              "DraftIndex": 0,
              "Team": 1,
              "Battletag": null,
              "AltName": "FaerieDragon",
              "SelectedPlayerSlotId": 2,
              "PickType": "Banned"
            }]}'
        , '{"Draft":[{"ReplayFingerPrint":"VARCHAR","DraftIndex":"UBIGINT","Team":"UBIGINT","Battletag":"VARCHAR","AltName":"VARCHAR","SelectedPlayerSlotId":"UBIGINT","PickType":"VARCHAR"}]}'
        )

I can't seem to figure out what I am missing in order to do this, but it seems like this should work. Essentially I want to flatten the original document, and break it into a few tables to work on, Replay, ReplayPlayers, PlayerTalents, TeamExp, Draft, and MatchAwards.

This is the structure query from json_structure.

STRUCT(
  "RandomValue" VARCHAR
  , "ReplayFingerPrint" VARCHAR
  , "BattlegroundName" VARCHAR
  , "ShortName" VARCHAR
  , "WinningTeam" UBIGINT
  , "FirstToTen" BIGINT
  , "DraftFirstTeam" UBIGINT
  , "GameType" VARCHAR
  , "GameLength" VARCHAR
  , "GameLengthTimestamp" UBIGINT
  , "GameDate" VARCHAR
  , "GameDateFormatted" VARCHAR
  , "VersionBuild" VARCHAR
  , "VersionMajor" VARCHAR
  , "Version" VARCHAR
  , "Draft" STRUCT("ReplayFingerPrint" VARCHAR, "DraftIndex" UBIGINT, "Team" UBIGINT, "Battletag" VARCHAR, "AltName" VARCHAR, "SelectedPlayerSlotId" UBIGINT, "PickType" VARCHAR)[]
  , "Players" STRUCT("ReplayFingerPrint" VARCHAR, "Battletag" VARCHAR, "HeroId" VARCHAR, "AttributeId" VARCHAR, "Team" UBIGINT, "Party" BIGINT, "IsWinner" BOOLEAN, "Character" VARCHAR, "CharacterLevel" UBIGINT, "AccountLevel" UBIGINT, "FirstToTen" BOOLEAN, "PlayerType" VARCHAR, "Region" UBIGINT, "BlizzardId" UBIGINT, "Level" UBIGINT, "Takedowns" UBIGINT, "SoloKills" UBIGINT, "Assists" UBIGINT, "Deaths" UBIGINT, "HeroDamage" UBIGINT, "SiegeDamage" UBIGINT, "StructureDamage" UBIGINT, "MinionDamage" UBIGINT, "CreepDamage" UBIGINT, "SummonDamage" UBIGINT, "TimeCCdEnemyHeroes" UBIGINT, "Healing" UBIGINT, "SelfHealing" UBIGINT, "RegenGlobes" UBIGINT, "DamageTaken" UBIGINT, "DamageSoaked" UBIGINT, "ExperienceContribution" UBIGINT, "TownKills" UBIGINT, "TimeSpentDead" UBIGINT, "MercCampCaptures" UBIGINT, "WatchTowerCaptures" UBIGINT, "MetaExperience" UBIGINT, "HighestKillStreak" UBIGINT, "ProtectionGivenToAllies" UBIGINT, "TimeSilencingEnemyHeroes" UBIGINT, "TimeRootingEnemyHeroes" UBIGINT, "TimeStunningEnemyHeroes" UBIGINT, "ClutchHealsPerformed" UBIGINT, "EscapesPerformed" UBIGINT, "VengeancesPerformed" UBIGINT, "OutnumberedDeaths" UBIGINT, "TeamfightEscapesPerformed" UBIGINT, "TeamfightHealingDone" UBIGINT, "TeamfightDamageTaken" UBIGINT, "TeamfightHeroDamage" UBIGINT, "Multikill" UBIGINT, "PhysicalDamage" UBIGINT, "SpellDamage" UBIGINT, "OnFireTimeonFire" UBIGINT
            , "Talents" STRUCT("ReplayFingerPrint" VARCHAR, "Battletag" VARCHAR, "TalentIndex" UBIGINT, "TalentTier" VARCHAR, "TalentId" UBIGINT, "TalentName" VARCHAR, "TimeSpanSelectedString" VARCHAR, "TimeSpanSelected" UBIGINT)[])[]
  , "MatchAwards" STRUCT("ReplayFingerPrint" VARCHAR, "Battletag" VARCHAR, "BlizzardId" UBIGINT, "Award" VARCHAR)[]
  , "TeamPeriodicXPBreakdown" STRUCT("ReplayFingerPrint" VARCHAR, "Team" UBIGINT, "TeamLevel" UBIGINT, "TimeSpan" UBIGINT, "MinionXP" UBIGINT, "CreepXP" UBIGINT, "StructureXP" UBIGINT, "HeroXP" UBIGINT, "TrickleXP" UBIGINT, "TotalXP" UBIGINT)[]
  , "ReplayFileName" VARCHAR
  , "ReplayFileNameFormatted" VARCHAR
  , "ReplayFilePath" VARCHAR
  )

And the full columns dict that I am passing where it declares the children.


 columns ={
    RandomValue: 'VARCHAR'
  , ReplayFingerPrint: 'VARCHAR'
  , BattlegroundName: 'VARCHAR'
  , ShortName: 'VARCHAR'
  , WinningTeam: 'UBIGINT'
  , FirstToTen: 'BIGINT'
  , DraftFirstTeam: 'UBIGINT'
  , GameType: 'VARCHAR'
  , GameLength: 'VARCHAR'
  , GameLengthTimestamp: 'UBIGINT'
  , GameDate: 'VARCHAR'
  , GameDateFormatted: 'VARCHAR'
  , VersionBuild: 'VARCHAR'
  , VersionMajor: 'VARCHAR'
  , Version: 'VARCHAR'
  , Draft: 'STRUCT(
      ReplayFingerPrint VARCHAR
      , DraftIndex UBIGINT
      , Team UBIGINT
      , Battletag VARCHAR
      , AltName VARCHAR
      , SelectedPlayerSlotId UBIGINT
      , PickType VARCHAR
      )[]'
  , Players: 'STRUCT(
      ReplayFingerPrint VARCHAR
      , Battletag VARCHAR
      , HeroId VARCHAR
      , AttributeId VARCHAR
      , Team UBIGINT
      , Party BIGINT
      , IsWinner BOOLEAN
      , Character VARCHAR
      , CharacterLevel UBIGINT
      , AccountLevel UBIGINT
      , FirstToTen BOOLEAN
      , PlayerType VARCHAR
      , Region UBIGINT
      , BlizzardId UBIGINT
      , Level UBIGINT
      , Takedowns UBIGINT
      , SoloKills UBIGINT
      , Assists UBIGINT
      , Deaths UBIGINT
      , HeroDamage UBIGINT
      , SiegeDamage UBIGINT, StructureDamage UBIGINT, MinionDamage UBIGINT, CreepDamage UBIGINT, SummonDamage UBIGINT, TimeCCdEnemyHeroes UBIGINT
      , Healing UBIGINT, SelfHealing UBIGINT, RegenGlobes UBIGINT, DamageTaken UBIGINT, DamageSoaked UBIGINT, ExperienceContribution UBIGINT
      , TownKills UBIGINT, TimeSpentDead UBIGINT, MercCampCaptures UBIGINT, WatchTowerCaptures UBIGINT, MetaExperience UBIGINT, HighestKillStreak UBIGINT
      , ProtectionGivenToAllies UBIGINT, TimeSilencingEnemyHeroes UBIGINT, TimeRootingEnemyHeroes UBIGINT, TimeStunningEnemyHeroes UBIGINT
      , ClutchHealsPerformed UBIGINT, EscapesPerformed UBIGINT, VengeancesPerformed UBIGINT, OutnumberedDeaths UBIGINT, TeamfightEscapesPerformed UBIGINT
      , TeamfightHealingDone UBIGINT, TeamfightDamageTaken UBIGINT, TeamfightHeroDamage UBIGINT, Multikill UBIGINT
      , PhysicalDamage UBIGINT, SpellDamage UBIGINT, OnFireTimeonFire UBIGINT
      , Talents STRUCT(
        ReplayFingerPrint VARCHAR
        , Battletag VARCHAR
        , TalentIndex UBIGINT
        , TalentTier VARCHAR
        , TalentId UBIGINT
        , TalentName VARCHAR
        , TimeSpanSelectedString VARCHAR
        , TimeSpanSelected UBIGINT
        )[]
        )[]'
  , MatchAwards: 'STRUCT(ReplayFingerPrint VARCHAR, Battletag VARCHAR, BlizzardId UBIGINT, Award VARCHAR)[]'
  , TeamPeriodicXPBreakdown: 'STRUCT(ReplayFingerPrint VARCHAR, Team UBIGINT, TeamLevel UBIGINT, TimeSpan UBIGINT, MinionXP UBIGINT, CreepXP UBIGINT, StructureXP UBIGINT, HeroXP UBIGINT, TrickleXP UBIGINT, TotalXP UBIGINT)[]'
  , ReplayFileName: 'VARCHAR'
  , ReplayFileNameFormatted: 'VARCHAR'
  , ReplayFilePath: 'VARCHAR'
 }
Mitchell Hamann
  • 313
  • 4
  • 18

0 Answers0