1

Edit:

Summarised Question:

Database has 1 table with 70 columns and 20 rows.

The last 63 columns are all int(4) and named bus-time01 to bus-time63.

Min value for data in these columns is 0. Max value is 2359.

Any values greater than 0 will be a minumum value of 500.

Column values all increment upwards in each row (e.g. 0501, 0539, 1149, 2138).

Never downwards and never identical numbers in same row (but other rows may contain identical values).

I run a select query that reduces rows down to between 1 and 4.

I wish to add to my select query some code to search with an integer, between 500-2359, for a match. Without specifying each column name if possible (as query would become huge)

I wish to find closest match to that integer in each row.

If no exact match, then the next highest number.

Once a match is found I would like the value of the column to the left and right of the one found.

I do not require the column names returned (if this is easier), just the returned values.

Can I do this with a query?

Detailed question:

I have a simple Javascript/PHP/MySQL/Ajax web application that displays bus times.

I want to improve on it so it shows just the next bus due (as well as the earlier bus and the next later bus).

Bus times are stored in database as:

columns:
bus-time01
bus-time02
bus-time03

And so on up to bus-time63.

int(4) Time values are stored as: 202, 930, 1230, 1945 etc

The database has just the one table, with just 20 rows (so not large at all).

70 columns in total. The other columns are: id, bus_number, stop_id, start_stop, end_stop, special, day.

My query as it is returns anything from 1 to 4 rows which are stored in a table. The variables for the query come from select boxes and divs on the page. I use ajax and a separate php page to run the query:

$result=mysqli_query($conn, "SELECT * FROM `timetable` WHERE `start_stop` = '" . $ss . "' AND `end_stop` = '" . $es . "' AND `day` = '" . $tmpDay . "'");

What I want is for each row displayed in the table to only show times close to the one in the page div (which displays the eact time). Explained more below.

I have a javascript function that gets the current time. It is displayed in web pages, as with the database values, as. e.g. 0202, 0930, 1230, 1945 etc.

The only difference is it shows the leading zero (which may casue an issue?).

This is the time function:

(function() {

    function checkTime(i) {
        return (i < 10) ? "0" + i : i;
    }

    function getTimeNow() {

        var today = new Date(),
            hournow = checkTime(today.getHours()),
            minutesnow = checkTime(today.getMinutes()),
            timenow = hournow + "" + minutesnow;
        document.getElementById('time-now').innerHTML = timenow;
        t = setTimeout(function() {
            getTimeNow()
        }, 500);
    }

    getTimeNow();

})();

What I want to achieve is to get the current time from the time-now field, query the database, and display only the records that match the criteria below.

e.g. If time is 0913 then get the records that are closest to that, for each valid row. Where one is lower and 2 are higher (or possibly 1 of these is equal and 1 is higher). i.e. 910, 915 and 920 (or 910, 913 and 920).

The table will look like the below:

If the time is currently 09:13

Bus# | Stop# | Earlier Bus | Next Bus | Later Bus |
---------------------------------------------------
 1   |  Z    |   0910      |  0915    |   0920    |
 5   |  B    |   0905      |  0920    |   0935    |
 3   |  A    |   0900      |  0930    |   1000    |

This is all new to me and I really don't know where to begin here. I guess I need to get the whole row of times and somehow to search them for the values I require? Or is there a way to ask SQL to directly fetch the records I need.

Thanks for any pointers in the right direction.

  • Well I have spent all day googling for an answer to this and it seems it is most likley impossible. No replies here but it is a Sunday, so maybe thats why? If anyone gets the chance to look and let me know if it is impossible please. I will then take the data into Excel where I can do this with much simplicity. –  Apr 05 '20 at 18:13
  • I believe you can do it all in SQL, if you have stored values as time/date you can do all sort of offsets, but this question is to wide, I'm not even sure did I even understand it all, you need to summarize it and present a more direct problem. – ikiK Apr 05 '20 at 18:23
  • OK thanks. I will see if I can repharse it. But nothing is stored as date time. All are int(4). –  Apr 05 '20 at 18:29
  • You are suffering from an inappropriate data model. The times should be in rows, not columns. See Schwern's answer on that. – Thorsten Kettner Apr 05 '20 at 21:48
  • Thanks Thorsten. I am looking at Schwern's answer now. Coming from an Excel background I think I tried to used that in a database, which obviously does now work. I have a lot to learn but I will stick with this. –  Apr 05 '20 at 22:38

1 Answers1

3

I wish to add to my select query some code to search with an integer, between 500-2359, for a match. Without specifying each column name if possible (as query would become huge).

The last 63 columns are all int(4) and named bus-time01 to bus-time63.

Those columns have to go.

You're storing a list of values as a bunch of rows, and that makes them very hard to search through as equals. Lists in SQL are implemented as join tables.

create table bus_lines (
  id serial primary key,
  name varchar(255) not null unique
);

create table bus_stops (
  id serial primary key,
  bus_line_id integer not null references bus_lines(id),
  name varchar(255) not null unique
);

create table bus_times (
  bus_stop_id integer references bus_stops(id),
  bus_time time not null
);

I'm using time, not an integer, because it brings in a bunch of functions which might be useful.

Each bus line has many bus_stops. Each bus_stops has many bus_times. To find all the times for a single stop, by name, in order.

select bus_time
from bus_times
join bus_stops on bus_stops.id = bus_times.bus_stop_id
where bus_stops.name = ?
order by bus_times.bus_time asc

If you want to find which stops have a particular time...

select bus_stops.name
from bus_times
join bus_stops on bus_stops.id = bus_times.bus_stop_id
where bus_times.bus_time = ?

Without all those duplicate columns in the way, this and many other problems, are much simpler.

If you're new to SQL, joins and structuring data as relationships are most mind-bending part.


I wish to find closest match to that integer in each row.

If no exact match, then the next highest number.

Once a match is found I would like the value of the column to the left and right of the one found.

There's two parts to this. First, finding the closest match. Second, getting the previous and next.

The second part is easy with window functions. Window functions are like a sliding window around the result row to see what is before and after, referred to as "lag" and "lead".

We set up our window frame. We must define the order for "previous" and "next" with order by bus_time asc as well as ensuring we only select from times at the same stop with partition by bus_stop_id.

select
  bus_stops.name,
  lag(bus_time) over w,
  bus_time,
  lead(bus_time) over w
from bus_times
join bus_stops on bus_stops.id = bus_times.bus_stop_id
window w as (
  partition by bus_stop_id
  order by bus_time asc
);

+------+----------------------+----------+-----------------------+
| name | lag(bus_time) over w | bus_time | lead(bus_time) over w |
+------+----------------------+----------+-----------------------+
| A    | NULL                 | 05:01:00 | 05:39:00              |
| A    | 05:01:00             | 05:39:00 | 11:49:00              |
| A    | 05:39:00             | 11:49:00 | 21:38:00              |
| A    | 11:49:00             | 21:38:00 | NULL                  |
| B    | NULL                 | 05:31:00 | NULL                  |
+------+----------------------+----------+-----------------------+

Window functions require MySQL 8.0. You can do it without window functions, but it's harder. MySQL 8 brings a lot of good things and it's worth the upgrade.


Now we need to find the first row which matches a time, or is after it. That's simple.

select bus_stops.name, bus_time
from bus_times
join bus_stops on bus_stops.id = bus_times.bus_stop_id
where bus_time >= '05:30'
order by bus_time
limit 1;

+------+----------+
| name | bus_time |
+------+----------+
| A    | 05:39:00 |
+------+----------+

But we also need the stops before and after. That's where we run into a problem.

select
  bus_stops.name,
  lag(bus_time) over w,
  bus_time,
  lead(bus_time) over w
from bus_times
join bus_stops on bus_stops.id = bus_times.bus_stop_id
where bus_time >= '05:30'
window w as (
  partition by bus_stop_id
  order by bus_time asc
)
limit 1

+------+----------------------+----------+-----------------------+
| name | lag(bus_time) over w | bus_time | lead(bus_time) over w |
+------+----------------------+----------+-----------------------+
| A    | NULL                 | 05:39:00 | 11:49:00              |
+------+----------------------+----------+-----------------------+

window works over the results, and where (bus_time - '0530') >= 0 excludes any times before our desired time.

No problem, we'll just search for the proper lag and lead!

select
  bus_stops.name,
  lag(bus_time) over w,
  bus_time,
  lead(bus_time) over w
from bus_times
join bus_stops on bus_stops.id = bus_times.bus_stop_id
where lag(bus_time) < '05:30'
  and bus_time >= '05:30'
  and lead(bus_time) > '05:30'
window w as (
  partition by bus_stop_id
  order by bus_time asc
)
limit 1

Nope. You can't use window functions in a where clause in MySQL.

Instead, we first calculate the lag and lead, then we query that. You can do this using a derived table, but I prefer common table expressions as they have fewer limitations and its easier to see how the query is built up.

-- Create a little virtual able with the bus time
-- and their previous and next times.
with around_bus_times as (
  select
    bus_stop_id,
    lag(bus_time) over w as 'prev',
    bus_time,
    lead(bus_time) over w as 'next'
  from bus_times
  window w as (
    partition by bus_stop_id
    order by bus_time asc
  )
)
-- Read from that table
select
  bus_stops.name,
  prev,
  bus_time,
  next
from around_bus_times
join bus_stops on bus_stops.id = around_bus_times.bus_stop_id
where prev < '05:30'
  and bus_time >= '05:30'
  and next > '05:30'

+------+----------+----------+----------+
| name | prev     | bus_time | next     |
+------+----------+----------+----------+
| A    | 05:01:00 | 05:39:00 | 11:49:00 |
+------+----------+----------+----------+

The final touch is what happens if the matching time is the fist or last bus time and has no previous or next. prev or next might be null and comparing null is always false. So we need to take this possibility into account.

with around_bus_times as (
  select
    bus_stop_id,
    lag(bus_time) over w as 'prev',
    bus_time,
    lead(bus_time) over w as 'next'
  from bus_times
  window w as (
    partition by bus_stop_id
    order by bus_time asc
  )
)
select
  bus_stops.name,
  prev,
  bus_time,
  next
from around_bus_times
join bus_stops on bus_stops.id = around_bus_times.bus_stop_id
where (prev is null or prev < '05:01')
  and bus_time >= '05:01'
  and (next is null or next > '05:01')

+------+------+----------+----------+
| name | prev | bus_time | next     |
+------+------+----------+----------+
| A    | NULL | 05:01:00 | 05:39:00 |
| B    | NULL | 05:31:00 | NULL     |
+------+------+----------+----------+

If you have night buses which run after midnight this will fail to offer them as a next. That's yet another problem.


This is a lot, but I hope you can follow how it's built up in stages.

Community
  • 1
  • 1
Schwern
  • 153,029
  • 25
  • 195
  • 336
  • 1
    Thank you Schwern for taking your time to look at this. I am very grateful. I am taking some time now to read your answer now and will surely implement your model. Just going to take me some time to absorb all that information and implement it. Thank you very much. –  Apr 05 '20 at 22:41
  • I just looked in phpMyAdmin to check the server version. It is actually MariaDB?. Is this OK to implement your model? Server version: 10.2.31-MariaDB-log-cll-lve - MariaDB Server –  Apr 05 '20 at 22:56
  • Another small problem and again, my fault. There is currently only one bus line but I will eventually add another (assuming bus line is bus company? or is it bus number?). Each bus has a number, such as 1, 1A, 7, 7A, 19, 25 etc. So I put these in for bus line as, 'BusCompanyName 1', 'BusCompanyName 1A', 'BusCompanyName 7' etc. The problem is some of the bus stops are serviced by the same bus line/bus number (one stop by 6 of them). If I try to add the same stop with each bus_line_id I am getting an error of duplicate entry for key name for the stop name. Sorry, I hope I explained this well. –  Apr 05 '20 at 23:42
  • I have a workaround for the above. Name the stop and add the bus number to each separate one. –  Apr 05 '20 at 23:57
  • I will also have to add in a 'day' column into the bus_times table as they have a different service on Sundays. But it is looking great so far. –  Apr 06 '20 at 00:20
  • Thank you very much. I still have to get the Sunday timetable working (not attempted it yet) but I have imported all the weekday data and it is working pefectly using your example queries above. I am sure I will now be able to make this work as I need in my application. Thanks again. –  Apr 06 '20 at 03:09
  • @flakie Yes, [MariaDB 10.2 has Window functions](https://mariadb.com/kb/en/window-functions/) and [CTE](https://mariadb.com/kb/en/with/) / I made lines to buses one-to-many; one line has many stops, but each stop has only one line. You need a ["many-to-many" relationship](https://stackoverflow.com/questions/7296846/how-to-implement-one-to-one-one-to-many-and-many-to-many-relationships-while-de) which needs an extra join table of line IDs and stop IDs. / A day-of-week column will lead to redundant entries. Consider instead the type of day: weekday, weekend, holiday. – Schwern Apr 06 '20 at 08:00
  • Thanks. Do you think an extra table, identical to the bus_times table, but named bus_times_sunday would work? I would then run a separate query against this table just for Sundays? –  Apr 06 '20 at 09:45
  • I gave it a go. Imported sunday data and ran the query. All seems to work really well. Thanks again for this. It all works so much better than I could have hoped for. –  Apr 06 '20 at 10:04
  • @flakie It will work, but you'll find yourself in the same situation as you were with the bus time columns; having to repeat queries across multiple tables. Duplicating columns or data in SQL is a red flag. Instead, add an [`enum`](https://dev.mysql.com/doc/refman/8.0/en/enum.html) `day_type` column to `bus_times` and add a `where day_type = ?` to your queries. – Schwern Apr 06 '20 at 18:06