I am using the Lahman Baseball database for a project about comparing different query technologies. I already did a bunch of queries in SQL and then used the SQL workbench to port the tables to XML files. I then uploaded them to an existdb server and I am trying to run queries on it but they are way too slow. I have an xml file called appearances.xml that looks like this. However it has 98,146 ROW elements.
<appearances>
<ROW>
<yearID>1871</yearID>
<teamID>BS1</teamID>
<lgID>NA</lgID>
<masterID>barnero01</masterID>
</ROW>
<ROW>
<yearID>1871</yearID>
<teamID>BS1</teamID>
<lgID>NA</lgID>
<masterID>barrofr01</masterID>
</ROW>
</appearances>
I also have an xml file called master.xml which looks like this which has 18,354 ROW elements.
<master>
<ROW>
<masterID>aardsda01</masterID>
<nameFirst>David</nameFirst>
<nameLast>Aardsma</nameLast>
</ROW>
<ROW>
<masterID>aaronha01</masterID>
<nameFirst>Hank</nameFirst>
<nameLast>Aaron</nameLast>
</ROW>
</master>
I am trying to run an XQY query to join the two tables and find the first and last name of every entry that has the teamID of 'LAN'.
let $laPlayers := for $appearance in /appearances/ROW
let $player := /master/ROW/masterID=$appearance/masterID
where $appearance/teamID eq "LAN"
return fn:concat($player/nameFirst/text(),' ',
$player/nameLast/text())
return $laPlayers
It runs fine but it is soooooo slow! In the appearances table, there are multiple entries for each masterID, because of the yearID (which I don't use here) so I tried calling distinct-values first.
let $laIds = fn:distinct-values(for $appearance in /appearances/ROW
where $appearance/teamID eq "LAN"
return $appearance/masterID/text())
let $laPlayers = for $player in /master/ROW
for $id in $laIds
where $id eq $player/masterID
order by $player/nameLast
return fn:concat($player/nameFirst/text(),' ',
$player/nameLast/text())
return $laPlayers
That made it faster but it is still way slower than it's SQL equivalent. Is there something I can do to make it faster? (here is the SQL equivalent)
SELECT DISTINCT m.nameFirst, m.nameLast FROM appearances a, master m
WHERE a.teamID = "LAN" AND a.masterID = m.masterID ORDER BY m.nameLast, m.nameFirst;