3

I have a table of orders. In this table, amongst other rows, I have an ID (PK), Customer ID, Shipping country, and Order date

ID   | CustomerId | ShippingCountry | OrderDate
1    | 111111     | DE              | 2016-08-13
2    | 222222     | GB              | 2016-08-17
3    | 111111     | ES              | 2016-09-05
4    | 333333     | ES              | 2016-10-25
5    | 444444     | US              | 2016-10-26
6    | 555555     | FR              | 2016-10-29
7    | 666666     | DE              | 2016-11-04
8    | 111111     | DE              | 2016-11-12
9    | 222222     | US              | 2016-12-01
10   | 444444     | GB              | 2016-12-01
11   | 555555     | FR              | 2016-12-05
12   | 333333     | ES              | 2016-12-15

I need to select the rows where the customer's previous order doesn't match their latest order's shipping country. I also want to see the 2 differing shipping codes in the results.

Using the above example, I want to see:

CustomerId | ShippingCountryLatest | ShippingCountryPrevious
111111     | DE                    | ES
222222     | US                    | GB
444444     | GB                    | US

The ID and OrderDate can be used to determine the order of things. ID is an incrementing number, order date is as it says.

The table I need to run this against has about 500k rows.

Any suggestions?

Here's a SQLFiddle to get you started: http://sqlfiddle.com/#!6/5d046/1/0

gfyans
  • 1,236
  • 1
  • 16
  • 21

2 Answers2

4

Use ROW_NUMBER to give the latest record #1 and the previous #2 per customer. Then aggregate per customer and compare the two values.

select 
  CustomerId,
  max(case when rn = 1 then ShippingCountry end) as ShippingCountryLatest, 
  max(case when rn = 2 then ShippingCountry end) as ShippingCountryPrevious
from
(
  select 
    CustomerId,
    ShippingCountry,
    row_number() over (partition by CustomerId order by ID desc) as rn
  from orders
) numbered
group by customerid
having
  max(case when rn = 1 then ShippingCountry end) <>
  max(case when rn = 2 then ShippingCountry end);

Your fiddle back: http://sqlfiddle.com/#!6/5d046/13 :-)

Thorsten Kettner
  • 89,309
  • 7
  • 49
  • 73
1

Use lag():

select o.*
from (select o.*,
             lag(shippingcountry) over (partition by customerid order by orderdate) as prev_shippingcountry
      from orders o
     ) o
where prev_shippingcountry <> shippingcountry ;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Both answers work, but I'm going to mark this as the correct one as according to the docs, lag() is meant for this purpose (I just didn't know about it) – gfyans Apr 13 '17 at 10:27
  • @gfyans: Our two queries are quite different from another. You asked for the latest record and the previous one. Gordon's query however looks at all records and their previous ones. To look only at the latest and its predecessor cannot be solved wth `LAG` alone. – Thorsten Kettner Apr 13 '17 at 10:51
  • @gfyans . . . Did you find that this didn't work? The OP can accept whichever answer s/he wants to. But I'm curious why you unaccepted this answer, which seems much simpler than other approaches. – Gordon Linoff Apr 14 '17 at 15:12
  • Sorry, your query does work, but Thorsten is correct in saying it returns all previous records where there is a difference in shipping country as well. I missed this when I tested it first time round. In this Fiddle 111111 is returned twice: http://sqlfiddle.com/#!6/5d046/14 I only need the latest compared to the previous. – gfyans Apr 14 '17 at 19:39
  • @gfyans . . . That's a reasonable interpretation of the question. I thought you wanted any two orders where the later had a country differing from the previous. – Gordon Linoff Apr 14 '17 at 23:08
  • It's actually bloody useful to know. Whilst I only need the latest and previous just now, I can foresee a need for what you've written, so thanks! – gfyans Apr 15 '17 at 19:30