1

i'm having a good time coding a little visitor counter. it's a PHP5/SQLite3 mix.

made two database tables, one for the visitors, and one for the hits. structure and sample data:

CREATE TABLE 'visitors' (
'id' INTEGER DEFAULT NULL PRIMARY KEY AUTOINCREMENT,
'ip' TEXT DEFAULT NULL,
'hash' TEXT DEFAULT NULL,
UNIQUE(ip)
);

INSERT INTO "visitors" ("id","ip","hash") VALUES ('1','1.2.3.4','f9702c362aa9f1b05002804e3a65280b');
INSERT INTO "visitors" ("id","ip","hash") VALUES ('2','1.2.3.5','43dc8b0a4773e45deab131957684867b');
INSERT INTO "visitors" ("id","ip","hash") VALUES ('3','1.2.3.6','9ae1c21fc74b2a3c1007edf679c3f144');

CREATE TABLE 'hits' (
'id' INTEGER DEFAULT NULL PRIMARY KEY AUTOINCREMENT,
'time' INTEGER DEFAULT NULL,
'visitor_id' INTEGER DEFAULT NULL,
'host' TEXT DEFAULT NULL,
'location' TEXT DEFAULT NULL
);

INSERT INTO "hits" ("id","time","visitor_id","host","location") VALUES ('1','1418219548','1','localhost','/some/path/example.php');
INSERT INTO "hits" ("id","time","visitor_id","host","location") VALUES ('2','1418219550','1','localhost','/some/path/example.php');
INSERT INTO "hits" ("id","time","visitor_id","host","location") VALUES ('3','1418219553','1','localhost','/some/path/example.php');
INSERT INTO "hits" ("id","time","visitor_id","host","location") VALUES ('4','1418219555','2','localhost','/some/path/example.php');
INSERT INTO "hits" ("id","time","visitor_id","host","location") VALUES ('5','1418219557','1','localhost','/some/path/example.php');
INSERT INTO "hits" ("id","time","visitor_id","host","location") VALUES ('6','1418219558','3','localhost','/some/path/example.php');

i now want to fetch the visitors data, but only from those who where active in the last 30 seconds for example. i need the following data as output, here with user id 1 as example:

$visitor = Array(
    [id] => 1
    [ip] => 1.2.3.4
    [hash] => f9702c362aa9f1b05002804e3a65280b
    [first_hit] => 1418219548
    [last_hit] => 1418219557
    [last_host] => localhost
    [last_location] => /some/path/example.php
    [total_hits] => 4
    [idle_since] => 11
)

i'll get this with my current query, all good, but as you can see i need a lot of sub-selects for this:

SELECT 
   visitors.id,
   visitors.ip,
   visitors.hash,
   (SELECT hits.time FROM hits WHERE hits.visitor_id = visitors.id ORDER BY hits.id ASC LIMIT 1) AS first_hit,
   (SELECT hits.time FROM hits WHERE hits.visitor_id = visitors.id ORDER BY hits.id DESC LIMIT 1) AS last_hit,
   (SELECT hits.host FROM hits WHERE hits.visitor_id = visitors.id ORDER BY hits.id DESC LIMIT 1) AS last_host,
   (SELECT hits.location FROM hits WHERE hits.visitor_id = visitors.id ORDER BY hits.id DESC LIMIT 1) AS last_location,
   (SELECT COUNT(hits.id) FROM hits WHERE hits.visitor_id = visitors.id) AS total_hits,
   (SELECT strftime('%s','now') - hits.time FROM hits WHERE hits.visitor_id = visitors.id ORDER BY hits.id DESC LIMIT 1) AS idle_since
FROM visitors
WHERE idle_since < 30
ORDER BY last_hit DESC

so, is this ok for my use case or do you know a better approach to get this data out of those two tables? i already played around with JOINS, but no matter how i tweaked it, COUNT() gave me wrong outputs, like user id 1 has only one total hit for example.

i probably have to re-model the database, if i wanna use JOINS properly, i guess.


Update: based on AeroX' Answer i've built the new query. it basically had just one little bug. you can't have MAX() in a WHERE clause. using HAVING now after the GROUPING. i also tested both the old and the new one with EXPLAIN and EXPLAIN QUERY PLAN. looks much better. Thank you guys!

SELECT
    V.id,
    V.ip,
    V.hash,
    MIN(H.time) AS first_hit,
    MAX(H.time) AS last_hit,
    strftime('%s','now') - MAX(H.time) AS idle_since,
    COUNT(H.id) AS total_hits,
    LH.host AS last_host,
    LH.location AS last_location
FROM visitors AS V

INNER JOIN hits AS H ON (V.id = H.visitor_id)

INNER JOIN (
    SELECT visitor_id, MAX(id) AS id
    FROM hits
    GROUP BY visitor_id
) AS L ON (V.id = L.visitor_id)

INNER JOIN hits AS LH ON (L.id = LH.id)

GROUP BY V.id, V.ip, V.hash, LH.host, LH.location

HAVING idle_since < 30

ORDER BY last_hit DESC
Ninja Cat
  • 25
  • 6

2 Answers2

1

One of the best ways to measure query performance is using explain.

From sqlite

The EXPLAIN QUERY PLAN SQL command is used to obtain a high-level description of the strategy or plan that SQLite uses to implement a specific SQL query. Most significantly, EXPLAIN QUERY PLAN reports on the way in which the query uses database indices. This document is a guide to understanding and interpreting the EXPLAIN QUERY PLAN output. Background information is available separately:

Notes on the query optimizer.
How indexing works.
The next generation query planner. 

An EXPLAIN QUERY PLAN command returns zero or more rows of four columns each. The column names are "selectid", "order", "from", "detail". The first three columns contain an integer value. The final column, "detail", contains a text value which carries most of the useful information.

EXPLAIN QUERY PLAN is most useful on a SELECT statement, but may also be appear with other statements that read data from database tables (e.g. UPDATE, DELETE, INSERT INTO ... SELECT).

An example of an explain query is:

EXPLAIN SELECT * FROM COMPANY WHERE Salary >= 20000;

http://www.tutorialspoint.com/sqlite/sqlite_explain.htm

Below are more complex usage examples.

How can I analyse a Sqlite query execution?

Community
  • 1
  • 1
Anthony
  • 107
  • 5
1

You probably want to clean this up but this should give you the idea of how to make the joins and how to use the GROUP BY statement to aggregate your hits table for each visitor. This should be more efficient then using lots of sub-queries.

I've included comments on the joins so that you can see why I'm making them.

SELECT 
   V.id,
   V.ip,
   V.hash,
  MIN(H.time) AS first_hit,
  MAX(H.time) AS last_hit,
  COUNT(H.id) AS total_hits,
  strftime('%s','now') - MAX(H.time) AS idle_since,
  LH.host AS last_host,
  LH.location AS last_location
FROM visitors AS V
-- Join hits table so we can calculate aggregates (MIN/MAX/COUNT)
INNER JOIN hits AS H ON (V.id = H.visitor_id)
-- Join a sub-query as a table which contains the most recent hit.id for each visitor.id
INNER JOIN (
  SELECT visitor_id, MAX(id) AS id
  FROM hits
  GROUP BY visitor_id
) AS L ON (V.id = L.visitor_id)
-- Use the most recent hit.id for each visitor.id to fetch that most recent row (for last_host/last_location)
INNER JOIN hits AS LH ON (L.id = LH.id)
GROUP BY V.id, V.ip, V.hash, LH.host, LH.location
HAVING idle_since < 30
ORDER BY last_hit DESC
AeroX
  • 3,387
  • 2
  • 25
  • 39
  • in the original query `first_hit` is not the time with the smallest value, but the time value from the row having the smallest `id` in `hits` (similar for `last_hit`) – DrCopyPaste Dec 10 '14 at 14:46
  • @DrCopyPaste This should achieve the same result though as the hits are recorded in chronological order. The id column is an auto increment and hits are added as people view pages, so the most recent hit should always have the highest id and the oldest hit should have the lowest id. – AeroX Dec 10 '14 at 14:53
  • 1
    Yea, that is probably true, I just wanted to point out that you replaced something with something syntactically completely different that may actually mean the same thing semantically ;) *(because it was not written explicitly in the question, though your assumption makes sense given what OP states)* – DrCopyPaste Dec 10 '14 at 15:03
  • 1
    "WHERE idle_since < 30" does not work. can't have MAX() in WHERE. replacing it with HAVING after the GROUP BY works. already updated my question. – Ninja Cat Dec 10 '14 at 16:05
  • @NinjaCat Yep, I missed that out. It should have been in the `HAVING` section instead of `WHERE` – AeroX Dec 10 '14 at 16:10