0

I have customers, jobs, orders and dates as columns. Customers can have multiple orders per job and of course multiple jobs given a certain date range.
What I am trying to do is figure out how many "new" orders there are within a certain date range. Meaning, if a job has 5 orders, it should only return 1 order. Here's a short example of a database table, desired result, and the query I have been trying to work with thus far.

+-------+-------+---------+------------+
| CusID | JobID | OrderNo |  OrderDate |
+-------+-------+---------+------------+
|   1   |   10  |    25   | 2021-12-22 |
|   1   |   10  |    26   | 2022-02-09 |
|   3   |   5   |    28   | 2022-01-10 |
|   3   |   6   |    29   | 2022-01-11 |
+-------+-------+---------+------------+

There are 4 rows of orders with their associated JobID. The query should return 2 rows since order 25 and order 26 are both associated with JobID 10, only 1 can be counted as "new", while OrderNo 25 falls outside of the outer queries date range.
Here is the query I am trying to use, based on the date being >= to the first of the year. I want to be able to use the OrderDate from the outer query, to compare with the OrderDate from the inner query. If there is a JobID with an OrderDate that is less than the current rows OrderDate that order is NOT new and should return the JobID, thus making the inner query false and filtering out the row.

SELECT * FROM orders
WHERE OrderDate >= '2022-01-01' 
AND JobID NOT IN 
(SELECT JobID FROM orders WHERE inner.OrderDate < outer.OrderDate)

Expected result

|   3   |   5   |    28   | 2022-01-10 |
|   3   |   6   |    29   | 2022-01-11 |
Zackattack
  • 316
  • 1
  • 7
  • Is the purpose to find the newest order for a job within the date range? Then I don't understand why order 26 does not match. This can be done with a subquery or cte and row_number. – Schwern Feb 27 '22 at 01:23
  • it is to find the first order for a job that is new within the date range. Not to include orders for jobs that have later orders. – Zackattack Feb 27 '22 at 02:43
  • What is the date range supposed to be? – Schwern Feb 27 '22 at 07:23

2 Answers2

0

Use row_number to rank the orders for each job. Put that in a subquery and only accept the newest oldest? order for each job.

select custid, jobid, orderNo, orderDate
from (
  select
    *,
    row_number() over ( partition by jobid order by orderDate asc ) as oldest
  from orders
) a
where a.oldest = 1
  and orderDate >= '2022-01-01'

Demonstration

Schwern
  • 153,029
  • 25
  • 195
  • 336
  • Thanks, but if it includes order 26 then it doesnt really solve the problem. Also I believe the same outcome could be achieved with a SELECT DISTINCT(jobid) with ordering the table by orderdate descending. – Zackattack Feb 27 '22 at 02:41
  • @Zackattack There's nothing in your description which would exclude 26, it is within your date range. [You can do this with `select distinct on` in Postgres](https://dbfiddle.uk/?rdbms=postgres_14&fiddle=dd8b1254805bfea865bdc4a4a202c2ed) I don't know about MySQL. – Schwern Feb 27 '22 at 07:53
  • @Schwern Apart from the fact that 26 is not the first order for the job, and the first order for the job (25) is before the given date. – user1191247 Feb 27 '22 at 13:55
  • @nnichols You ask for "new orders" which I assume would be the *last* job in the range; the newest one. You could clarify that by adding an example of multiple orders for a job where one *is* picked. Either way, you can adjust this answer to your needs by changing the order. – Schwern Feb 27 '22 at 18:18
0

Your query is heading in the right direction you just need to apply the alias to the outer table so it is uniquely identifiable -

SELECT *
FROM orders `outer`
WHERE OrderDate >= '2022-01-01'
AND JobID NOT IN
    (SELECT JobID FROM orders WHERE OrderDate < `outer`.OrderDate);

Depending on the number of rows included in your time range, and the average number of orders per job, you may want to aggregate first -

SELECT *
FROM (
    SELECT JobID, MIN(OrderDate) AS MinOrderDate
    FROM orders
    WHERE OrderDate >= '2022-01-01'
    GROUP BY JobID
) o
WHERE JobID NOT IN
    (SELECT JobID FROM orders WHERE OrderDate < o.MinOrderDate);

Assuming it is possible for a job to have multiple orders on one date, you might want to consider using OrderNo instead of OrderDate for your comparison -

SELECT *
FROM orders `outer`
WHERE OrderDate >= '2022-01-01'
AND JobID NOT IN
    (SELECT JobID FROM orders WHERE OrderNo < `outer`.OrderNo);
user1191247
  • 10,808
  • 2
  • 22
  • 32