I have been playing with an e-commerce sample database I migrated from SQL Server to GraphDB. I have been translating SQL queries to SPARQL and have a few of them already. However, I'm finding difficult to do things like calculating running totals, moving averages, row numbers and ranking, etc. Before I explain the two subqueries below, here is a snippet of the model:
product -> hasCategory -> category
order -> hasCustomer -> customer
The first query is working fine. However, I need to find an alternative for the second one, where I need to return the two most recent orders for each customers in the outer query. In SQL, one of the solutions would be to use a correlated query (pass customerID to the inner query for each record of the outer query).
I have notice a few "issues/proposals" to SPARQL, logged below:
I see you guys have been involved in some of them.
# Query 1: Select all products that belong to the Seafood category
# WORKS FINE, as the inner query is completely independent of the outer one.
SELECT
?productName
?unitPrice
?unitsInStock
WHERE { # outer query
?product a :product ;
:productName ?productName ;
:unitPrice ?unitPrice ;
:unitsInStock ?unitsInStock ;
:hasCategory ?category .
{ # inner query
SELECT
?category
WHERE {
?category a :category ;
:categoryID ?categoryID ;
:name "Seafood" .
}
}
}
ORDER BY
?productName
# Query 2: Select the 3 most recent orders of each customer
# DOES NOT WORK!! It's a correlated query. SPARQL engine is not able to join the queries together.
SELECT DISTINCT
?customerID
?city
WHERE { # outer query
?customer a :customer ;
:customerID ?customerID ;
:city ?city ;
^:hasCustomer ?order .
{ # inner query
SELECT
?order
WHERE {
?order a :order ;
:orderID ?orderID ;
:orderDate ?orderDate ;
:hasCustomer ?customer .
}
ORDER BY
DESC(?orderDate)
LIMIT 3
}
}
ORDER BY
?customerID
?city
DESC(?orderDate)
I could have used OVER PARTITION BY to replace the subquery above, with the same results in SQL. Actually, there are many ways of doing this in SQL. But, I couldn't find any in SPARQL. Do you guys know a workaround for this?
Thanks very much,
Marcelo.
PS: I'm also leaving the SQL Queries here for reference (same use cases resolved in different ways).
-- Query: Select all products that belong to the Seafood category
SELECT
prd.ProductName,
prd.UnitPrice,
prd.UnitsInStock
FROM
Product prd
WHERE
prd.CategoryID IN (
SELECT ctg.CategoryID FROM Category ctg WHERE ctg.CategoryName = 'Seafood')
ORDER BY
prd.ProductName
-- Query: Select all products that belong to the Seafood category
-- This query replaces the previous one by using the more efficient EXISTS
SELECT
prd.ProductName,
prd.UnitPrice,
prd.UnitsInStock
FROM
Product prd
WHERE
EXISTS (
SELECT 1 FROM Category ctg WHERE prd.CategoryID = ctg.CategoryID AND ctg.CategoryName = 'Seafood')
ORDER BY
prd.ProductName
-- Query: Select all products that belong to the Seafood category
-- Re-writting the previous query using JOIN
SELECT
prd.ProductName,
prd.UnitPrice,
prd.UnitsInStock
FROM
Product prd
INNER JOIN Category ctg
ON prd.CategoryID = ctg.CategoryID
WHERE ctg.CategoryName = 'Seafood'
ORDER BY
prd.ProductName
-- Query: Select the 3 most recent orders of each customer
SELECT
cst.CustomerID,
cst.City,
cpp.OrderID,
cpp.OrderDate
FROM
Customer AS cst
-- For each customer record, go and get the two most recent orders.
CROSS APPLY -- An INNER JOIN could've been used, however, CROSS APPLY is more efficient when combined with SELECT TOP.
(
SELECT TOP 3
ord.OrderID, ord.OrderDate, cst.CustomerID
FROM
[Order] AS ord
WHERE
ord.customerid = cst.customerid -- reference to the outer query (correlated subquery)
ORDER BY
ord.OrderDate DESC
) AS cpp
ORDER BY
cst.CustomerID,
cst.City,
cpp.OrderDate DESC
-- Windowed Functions
-- Calculating row numbering and ranking, quantiles, moving averages, and running totals.
-- Reference: https://learn.microsoft.com/en-us/sql/t-sql/queries/select-over-clause-transact-sql?view=sql-server-2017
-- Query: Select the 3 most recent orders of each customer
-- This query replaces the previous one by using the more efficient Windowed Function.
SELECT
ptt.*
FROM
(
SELECT
cst.CustomerID,
cst.City,
ord.OrderID,
ord.OrderDate,
ROW_NUMBER() OVER(PARTITION BY cst.CustomerID ORDER BY ord.OrderDate DESC) AS [RowNumber]
FROM Customer AS cst
INNER JOIN [Order] AS ord
ON cst.CustomerID = ord.CustomerID
) ptt
WHERE
ptt.[RowNumber] <= 3