I am trying to get a large set of nested JSON files to load into a table, each file is a single record and there are ~25k files. However when I try to declare the schema it errors out when trying to declare the data type if it is a struct. For reference, I was reading this article from DuckDB as well https://duckdb.org/2023/03/03/json.html.
The Draft, MatchAwards, Players, and TeamPeriodicXPBreakdown are all arrays of dictionaries. Players has a another array called "Talents" that is also an array of dictionaries. The following query will work, but only if I declare those columns as VARCHAR. This is not inherently an issue because I will need to UNNEST them anyways, however even then I am not able to extract the JSON and use it because of the error.
select *
from read_json(
'D:\parsed_replays\*'
, columns={
RandomValue: 'VARCHAR'
, ReplayFingerPrint: 'VARCHAR'
, BattlegroundName: 'VARCHAR'
, ShortName: 'VARCHAR'
, WinningTeam: 'UBIGINT'
, FirstToTen: 'BIGINT'
, DraftFirstTeam: 'UBIGINT'
, GameType: 'VARCHAR'
, GameLength: 'VARCHAR'
, GameLengthTimestamp: 'UBIGINT'
, GameDate: 'VARCHAR'
, GameDateFormatted: 'VARCHAR'
, VersionBuild: 'VARCHAR'
, VersionMajor: 'VARCHAR'
, Version: 'VARCHAR'
, Draft: 'VARCHAR'
, Players: 'VARCHAR'
, MatchAwards: 'VARCHAR'
, TeamPeriodicXPBreakdown: 'VARCHAR'
, ReplayFileName: 'VARCHAR'
, ReplayFileNameFormatted: 'VARCHAR'
, ReplayFilePath: 'VARCHAR'
}
-- , json_format='auto'
)
This query will also work to return the json.
select *
FROM read_json_objects('D:\parsed_replays\*.json')
limit 100
Now an example of a draft looks like this. This will fail with the error Unsupported result column type STRUCT("Draft" STRUCT("ReplayFingerPrint" VARCHAR, "DraftIndex" UBIGINT, "Team" UBIGINT, "Battletag" VARCHAR, "AltName" VARCHAR, "SelectedPlayerSlotId" UBIGINT, "PickType" VARCHAR)[])
.
SELECT from_json(
'{
"Draft": [
{
"ReplayFingerPrint": "eba25103-6b64-96de-cab8-8baee0aa349a",
"DraftIndex": 0,
"Team": 1,
"Battletag": null,
"AltName": "FaerieDragon",
"SelectedPlayerSlotId": 2,
"PickType": "Banned"
}]}'
, '{"Draft":[{"ReplayFingerPrint":"VARCHAR","DraftIndex":"UBIGINT","Team":"UBIGINT","Battletag":"VARCHAR","AltName":"VARCHAR","SelectedPlayerSlotId":"UBIGINT","PickType":"VARCHAR"}]}'
)
I can't seem to figure out what I am missing in order to do this, but it seems like this should work. Essentially I want to flatten the original document, and break it into a few tables to work on, Replay, ReplayPlayers, PlayerTalents, TeamExp, Draft, and MatchAwards.
This is the structure query from json_structure
.
STRUCT(
"RandomValue" VARCHAR
, "ReplayFingerPrint" VARCHAR
, "BattlegroundName" VARCHAR
, "ShortName" VARCHAR
, "WinningTeam" UBIGINT
, "FirstToTen" BIGINT
, "DraftFirstTeam" UBIGINT
, "GameType" VARCHAR
, "GameLength" VARCHAR
, "GameLengthTimestamp" UBIGINT
, "GameDate" VARCHAR
, "GameDateFormatted" VARCHAR
, "VersionBuild" VARCHAR
, "VersionMajor" VARCHAR
, "Version" VARCHAR
, "Draft" STRUCT("ReplayFingerPrint" VARCHAR, "DraftIndex" UBIGINT, "Team" UBIGINT, "Battletag" VARCHAR, "AltName" VARCHAR, "SelectedPlayerSlotId" UBIGINT, "PickType" VARCHAR)[]
, "Players" STRUCT("ReplayFingerPrint" VARCHAR, "Battletag" VARCHAR, "HeroId" VARCHAR, "AttributeId" VARCHAR, "Team" UBIGINT, "Party" BIGINT, "IsWinner" BOOLEAN, "Character" VARCHAR, "CharacterLevel" UBIGINT, "AccountLevel" UBIGINT, "FirstToTen" BOOLEAN, "PlayerType" VARCHAR, "Region" UBIGINT, "BlizzardId" UBIGINT, "Level" UBIGINT, "Takedowns" UBIGINT, "SoloKills" UBIGINT, "Assists" UBIGINT, "Deaths" UBIGINT, "HeroDamage" UBIGINT, "SiegeDamage" UBIGINT, "StructureDamage" UBIGINT, "MinionDamage" UBIGINT, "CreepDamage" UBIGINT, "SummonDamage" UBIGINT, "TimeCCdEnemyHeroes" UBIGINT, "Healing" UBIGINT, "SelfHealing" UBIGINT, "RegenGlobes" UBIGINT, "DamageTaken" UBIGINT, "DamageSoaked" UBIGINT, "ExperienceContribution" UBIGINT, "TownKills" UBIGINT, "TimeSpentDead" UBIGINT, "MercCampCaptures" UBIGINT, "WatchTowerCaptures" UBIGINT, "MetaExperience" UBIGINT, "HighestKillStreak" UBIGINT, "ProtectionGivenToAllies" UBIGINT, "TimeSilencingEnemyHeroes" UBIGINT, "TimeRootingEnemyHeroes" UBIGINT, "TimeStunningEnemyHeroes" UBIGINT, "ClutchHealsPerformed" UBIGINT, "EscapesPerformed" UBIGINT, "VengeancesPerformed" UBIGINT, "OutnumberedDeaths" UBIGINT, "TeamfightEscapesPerformed" UBIGINT, "TeamfightHealingDone" UBIGINT, "TeamfightDamageTaken" UBIGINT, "TeamfightHeroDamage" UBIGINT, "Multikill" UBIGINT, "PhysicalDamage" UBIGINT, "SpellDamage" UBIGINT, "OnFireTimeonFire" UBIGINT
, "Talents" STRUCT("ReplayFingerPrint" VARCHAR, "Battletag" VARCHAR, "TalentIndex" UBIGINT, "TalentTier" VARCHAR, "TalentId" UBIGINT, "TalentName" VARCHAR, "TimeSpanSelectedString" VARCHAR, "TimeSpanSelected" UBIGINT)[])[]
, "MatchAwards" STRUCT("ReplayFingerPrint" VARCHAR, "Battletag" VARCHAR, "BlizzardId" UBIGINT, "Award" VARCHAR)[]
, "TeamPeriodicXPBreakdown" STRUCT("ReplayFingerPrint" VARCHAR, "Team" UBIGINT, "TeamLevel" UBIGINT, "TimeSpan" UBIGINT, "MinionXP" UBIGINT, "CreepXP" UBIGINT, "StructureXP" UBIGINT, "HeroXP" UBIGINT, "TrickleXP" UBIGINT, "TotalXP" UBIGINT)[]
, "ReplayFileName" VARCHAR
, "ReplayFileNameFormatted" VARCHAR
, "ReplayFilePath" VARCHAR
)
And the full columns dict that I am passing where it declares the children.
columns ={
RandomValue: 'VARCHAR'
, ReplayFingerPrint: 'VARCHAR'
, BattlegroundName: 'VARCHAR'
, ShortName: 'VARCHAR'
, WinningTeam: 'UBIGINT'
, FirstToTen: 'BIGINT'
, DraftFirstTeam: 'UBIGINT'
, GameType: 'VARCHAR'
, GameLength: 'VARCHAR'
, GameLengthTimestamp: 'UBIGINT'
, GameDate: 'VARCHAR'
, GameDateFormatted: 'VARCHAR'
, VersionBuild: 'VARCHAR'
, VersionMajor: 'VARCHAR'
, Version: 'VARCHAR'
, Draft: 'STRUCT(
ReplayFingerPrint VARCHAR
, DraftIndex UBIGINT
, Team UBIGINT
, Battletag VARCHAR
, AltName VARCHAR
, SelectedPlayerSlotId UBIGINT
, PickType VARCHAR
)[]'
, Players: 'STRUCT(
ReplayFingerPrint VARCHAR
, Battletag VARCHAR
, HeroId VARCHAR
, AttributeId VARCHAR
, Team UBIGINT
, Party BIGINT
, IsWinner BOOLEAN
, Character VARCHAR
, CharacterLevel UBIGINT
, AccountLevel UBIGINT
, FirstToTen BOOLEAN
, PlayerType VARCHAR
, Region UBIGINT
, BlizzardId UBIGINT
, Level UBIGINT
, Takedowns UBIGINT
, SoloKills UBIGINT
, Assists UBIGINT
, Deaths UBIGINT
, HeroDamage UBIGINT
, SiegeDamage UBIGINT, StructureDamage UBIGINT, MinionDamage UBIGINT, CreepDamage UBIGINT, SummonDamage UBIGINT, TimeCCdEnemyHeroes UBIGINT
, Healing UBIGINT, SelfHealing UBIGINT, RegenGlobes UBIGINT, DamageTaken UBIGINT, DamageSoaked UBIGINT, ExperienceContribution UBIGINT
, TownKills UBIGINT, TimeSpentDead UBIGINT, MercCampCaptures UBIGINT, WatchTowerCaptures UBIGINT, MetaExperience UBIGINT, HighestKillStreak UBIGINT
, ProtectionGivenToAllies UBIGINT, TimeSilencingEnemyHeroes UBIGINT, TimeRootingEnemyHeroes UBIGINT, TimeStunningEnemyHeroes UBIGINT
, ClutchHealsPerformed UBIGINT, EscapesPerformed UBIGINT, VengeancesPerformed UBIGINT, OutnumberedDeaths UBIGINT, TeamfightEscapesPerformed UBIGINT
, TeamfightHealingDone UBIGINT, TeamfightDamageTaken UBIGINT, TeamfightHeroDamage UBIGINT, Multikill UBIGINT
, PhysicalDamage UBIGINT, SpellDamage UBIGINT, OnFireTimeonFire UBIGINT
, Talents STRUCT(
ReplayFingerPrint VARCHAR
, Battletag VARCHAR
, TalentIndex UBIGINT
, TalentTier VARCHAR
, TalentId UBIGINT
, TalentName VARCHAR
, TimeSpanSelectedString VARCHAR
, TimeSpanSelected UBIGINT
)[]
)[]'
, MatchAwards: 'STRUCT(ReplayFingerPrint VARCHAR, Battletag VARCHAR, BlizzardId UBIGINT, Award VARCHAR)[]'
, TeamPeriodicXPBreakdown: 'STRUCT(ReplayFingerPrint VARCHAR, Team UBIGINT, TeamLevel UBIGINT, TimeSpan UBIGINT, MinionXP UBIGINT, CreepXP UBIGINT, StructureXP UBIGINT, HeroXP UBIGINT, TrickleXP UBIGINT, TotalXP UBIGINT)[]'
, ReplayFileName: 'VARCHAR'
, ReplayFileNameFormatted: 'VARCHAR'
, ReplayFilePath: 'VARCHAR'
}