4

I have some data around a website where the website has different shop sections but when the user checks out at the end, we only know what shop section it is by looking for their most recent section hit

For example if I have data that looks like

session, hit_number, page
a,1,homepage
a,2,generic_page
a,3,shoe_store,
a,4,buy_add_basket
a,5,buy_checkout
b,1,sock_store
b,2,shoe_store,
b,3,buy_add_to_basket
b,4,buy_checkout
c,1,homepage
c,2,sock_store
c,3,sock_store
c,4,buy_add_to_basket
c,5,home_page
c,6,shoe_store
a,5,home_page

I want to persist the last store they went to (where exists and only for when they are in a buy section of the webpage (i.e. page name start with "buy")

Output I am expecting is:

session, hit_number, page
a,1,homepage,null
a,2,generic_page,null
a,3,shoe_store,null
a,4,buy_add_basket,shoe_store
a,5,buy_checkout,shoe_store
b,1,sock_store,null
b,2,shoe_store,null
b,3,buy_add_to_basket,shoe_store
b,4,buy_checkout,shoe_store
c,1,homepage,null
c,2,sock_store,null
c,3,sock_store,null
c,4,buy_add_to_basket,sock_store,
c,5,home_page,null
c,6,shoe_store,null
a,5,home_page,null
shecode
  • 1,716
  • 6
  • 32
  • 50
  • Eh... You could do that. I'm all for analytics but should honestly not capture that data in SQL. You're going to run into database impacting expansions in the future. You should use real-time log monitoring apps for that. I'm all about the right tool for the right job. I don't think I can give a link but I'm partial to Splunk for that. You can get a lot more data analytics that way. Like comparing page views to sales.. scheduled reports and realtime dashboards... etc... – Steve Kline Aug 27 '17 at 00:53
  • @SteveKline this is a one off, i dont need to use splunk for it or to monitor anything. i'm just doing a one off analysis – shecode Aug 27 '17 at 20:38
  • what vendor are you using, ie SQL server, MySQL, Oracle, Postgres, etc... ? – ttallierchio Aug 31 '17 at 12:24

3 Answers3

3

I did this in SQL Server but the query to get the results would work in most vendor. The logic behind it is to check when the page column contains 'buy' then to get the min value i.e the one above where the page name contains 'store'

Create and populate table:

DECLARE @table TABLE
(
    session    VARCHAR(1),
    hit_number INT,
    page       VARCHAR(50)
);
INSERT INTO @table VALUES 
('a',1,'homepage'),
('a',2,'generic_page'),
('a',3,'shoe_store'),
('a',4,'buy_add_basket'),
('a',5,'buy_checkout'),
('b',1,'sock_store'),
('b',2,'shoe_store'),
('b',3,'buy_add_to_basket'),
('b',4,'buy_checkout'),
('c',1,'homepage'),
('c',2,'sock_store'),
('c',3,'sock_store'),
('c',4,'buy_add_to_basket'),
('c',5,'home_page'),
('c',6,'shoe_store'),
('a',5,'home_page');

Select * From @table would give the following results:

session hit_number  page
a       1           homepage
a       2           generic_page
a       3           shoe_store
a       4           buy_add_basket
a       5           buy_checkout
b       1           sock_store
b       2           shoe_store
b       3           buy_add_to_basket
b       4           buy_checkout
c       1           homepage
c       2           sock_store
c       3           sock_store
c       4           buy_add_to_basket
c       5           home_page
c       6           shoe_store
a       5           home_page

Query:

SELECT
    session,
    hit_number,
    page,
    CASE
        WHEN page LIKE 'buy%'
        THEN MIN(CASE
                     WHEN page LIKE '%store'
                     THEN page
                     ELSE NULL
                 END) OVER(PARTITION BY session ORDER BY hit_number)
        ELSE NULL
    END AS previous_buy_page
FROM @table;

Returns desired Results:

session hit_number  page                previous_buy_page
a       1           homepage            NULL
a       2           generic_page        NULL
a       3           shoe_store          NULL
a       4           buy_add_basket      shoe_store
a       5           buy_checkout        shoe_store
a       5           home_page           NULL
b       1           sock_store          NULL
b       2           shoe_store          NULL
b       3           buy_add_to_basket   shoe_store
b       4           buy_checkout        shoe_store
c       1           homepage            NULL
c       2           sock_store          NULL
c       3           sock_store          NULL
c       4           buy_add_to_basket   sock_store
c       5           home_page           NULL
c       6           shoe_store          NULL
dbajtr
  • 2,024
  • 2
  • 14
  • 22
2

If your schema looks like:

create table weblog
(session varchar(10)
,hit_number int
,page varchar(30)
);

INSERT INTO weblog VALUES 
('a',1,'homepage')
,('a',2,'generic_page')
,('a',3,'shoe_store')
,('a',4,'buy_add_basket')
,('a',5,'buy_checkout')
,('b',1,'sock_store')
,('b',2,'shoe_store')
,('b',3,'buy_add_to_basket')
,('b',4,'buy_checkout')
,('c',1,'homepage')
,('c',2,'sock_store')
,('c',3,'sock_store')
,('c',4,'buy_add_to_basket')
,('c',5,'home_page')
,('c',6,'shoe_store')
,('a',5,'home_page');

Then you want a SELECT statement like:

SELECT "session"
, hit_number
, page
, CASE 
  WHEN page like 'buy%' THEN 
  max(CASE 
        WHEN page like '%store' THEN page 
        ELSE NULL
      END) OVER (PARTITION BY session ORDER BY hit_number)
  ELSE NULL
  END as last_store
FROM weblog;

(This is postgres 9.6, which database are you using?)

Incidentally, I agree with @SteveKline's comment that this doesn't seem like the right approach.

Steven Ensslen
  • 1,164
  • 9
  • 21
1

As I understand it, the "last visited store page" should be persistet until another page is visited or the session ends. I am a friend of doing this kind of operation in the backend. A trigger or stored procedure around the add opeartion should be able to do it. In particular an add trigger might work. But wich option you can use and is ideal heavily depends on the DBMS you are using (not all have equal support for both options and performance can be widely different too).

Personally I would persist the "last storepage visited" in the session and just add it to all inserts. I think 2 triggers on the shown table could do it:

  • One trigger that tries to Updates the Session Side "last visited page" entry everytime something is added to this table.
  • And one trigger or stored procedure for the input that transfers the "last visited page" from the Session to this table.

Instead of triggers you can of course also use a stored procedure that wraps this all, but IMHO triggers are cleaner in this case as it does not rely on how the data was added. Normal Session cleanup code would also deal with cleaning up that temporary data. And the default value for session could be "null". You should consider the rare occurence that there is no session (for whatever reason) in both triggers codes. Of course this is only for adding it in the future. It will not retroactively apply to the existing tables.

If you want it to apply to all existing and future data, a computed column on a view would be the best idea I have. At least the DBMS can do a decent amount of caching on those. But again it depends on the DBMS and if it supports something like views with computed columns.

Christopher
  • 9,634
  • 2
  • 17
  • 31