
An application does the following:

  • writes a row to a table that has a unique ID
  • reads the table, finds the row by its unique ID, and outputs the other columns (among which is the timestamp)

The question is: the application needs to read only the non-expired rows, and rows expire 2 minutes after being written. There are a few alternatives to accomplish this: which has the best performance?

Consider that reading the expired rows doesn't matter, as it will only be done sporadically. The number of non-expired rows will always be below a few thousand, and will typically be just a few hundred.

The cron job (run every minute) (or MySQL event scheduler) to do this can be one of the following (or any other idea?), based on the timestamp:

A) add a BOOL column, index the table on it, and then read WHERE the row is not expired (based on that boolean column, of course)

B) add a BOOL column, partition the table on it, and then read only the relevant partition (I am new to partitioning, so I'm not sure how this may work out)

C) read the whole table, delete each row that is expired, and then write the same rows to another table

D) when writing, write two rows simultaneously into two tables, and then delete the expired ones in one table

OR

E) not use a cron job at all and check the timestamp on every read. (But why would I scan the whole table? Expired rows are basically useless to the application itself.)
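For reference, option E boils down to a plain timestamp filter; as a sketch (table and column names here are placeholders), an index on the timestamp column keeps the read from scanning expired rows:

```sql
-- Sketch of option E (hypothetical names): filter on the timestamp at read time.
ALTER TABLE transactions ADD INDEX idx_time (time);

SELECT *
  FROM transactions
 WHERE time > NOW() - INTERVAL 2 MINUTE;
```

With the index in place, MySQL range-scans only the recent rows rather than the whole table.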

Edit 1

Let me rephrase the question:

The objective is to retrieve all columns of a row only if the row was written less than 2 minutes ago.

The table has this structure, but can be entirely redefined

transactionID CHAR(8) NOT NULL,
PRIMARY KEY (transactionID),
details VARCHAR(416),
tel INT(10),
time TIMESTAMP,
address VARCHAR(60),
town VARCHAR(30),
flat VARCHAR(5),
supplier SMALLINT

supplier is also a foreign key

Indexes are transactionID and possibly "status", an extra column, data type TO_BE_DEFINED_but_probably_SMALLINT_?
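As a sketch of that schema change (assuming the extra column ends up as a TINYINT and the table is named `transactions`; both are placeholders):

```sql
-- Hypothetical: add the proposed "status" column and index it.
-- 1 = active, 2 = expired.
ALTER TABLE transactions
  ADD COLUMN status TINYINT NOT NULL DEFAULT 1,
  ADD INDEX idx_status (status);
```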

Solution 1: use an indexed column which indicates the status of the row (expired, active, used)

This is achieved by running a cron job that changes the field value from 1 (active) to 2 (expired); the read query would then be the following:

$transaction = //a POST or GET value (escape it before use, e.g. with mysqli_real_escape_string)

$query = mysqli_query($con,"SELECT * FROM `table` WHERE transactionID = '$transaction' AND status = 1");

if(mysqli_num_rows($query) == 0){
   echo "Transaction expired";
}
else{
// I will show all the fields in the selected row;
}

mysqli_close($con);
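The cron job for Solution 1 could then run a single statement along these lines (a sketch; the table name and the reliance on the `time` column are assumptions):

```sql
-- Hypothetical Solution 1 cron statement:
-- mark active rows older than 2 minutes as expired.
UPDATE transactions
   SET status = 2
 WHERE status = 1
   AND time < NOW() - INTERVAL 2 MINUTE;
```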

Solution 2: use a partition based on the timestamp of each row

This is achieved by running a cron job that repartitions the table every 2 minutes, but the read query is then probably faster (note that explicit partition selection requires MySQL 5.6 or later):

$transaction = //a POST or GET 

$query = mysqli_query($con,"SELECT * FROM `table` PARTITION (active) WHERE transactionID = '$transaction'");

if(mysqli_num_rows($query) == 0){
   echo "Transaction expired";
}
else{
// I will show all the fields in the selected row;
}

mysqli_close($con);

The cron job would then be similar to this (note that ALTER TABLE ... PARTITION BY rebuilds the table, so doing it every 2 minutes is itself expensive):

$check = time() - 120; // the cutoff: 2 minutes ago, as a Unix timestamp

$query = mysqli_query($con,"ALTER TABLE `table`
PARTITION BY RANGE (UNIX_TIMESTAMP(timestamp_field)) (
    PARTITION expired VALUES LESS THAN ($check),
    PARTITION active VALUES LESS THAN MAXVALUE
)");

Solution 3: forget all of this and copy the rows with a timestamp older than 2 minutes to another table with a cron job

Simple to accomplish, although the script would probably be heavy on the write side. Still, write efficiency to the "expired" table is irrelevant; I want the "active" query to be fast.
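A sketch of that cron job (assuming an identically structured `transactions_expired` table; names are placeholders):

```sql
-- Hypothetical Solution 3 cron statements: archive expired rows, then purge them.
INSERT INTO transactions_expired
SELECT * FROM transactions
 WHERE time < NOW() - INTERVAL 2 MINUTE;

DELETE FROM transactions
 WHERE time < NOW() - INTERVAL 2 MINUTE;
```

Running both statements inside one transaction (or against a cutoff computed once) avoids deleting a row that crossed the 2-minute threshold between the two statements without being copied.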

Solution 3B: when adding a row, also write it to another table, so that the cron job only has to perform a DELETE.

Solution 4: forget all about it and WHERE on the timestamp

$transaction = //a POST or GET 

$check = time() - 120; // cutoff: 2 minutes ago, as a Unix timestamp

$query = mysqli_query($con,"SELECT * FROM `table` WHERE transactionID = '$transaction' AND time > FROM_UNIXTIME($check)");

if(mysqli_num_rows($query) == 0){
   echo "Transaction expired";
}
else{
// I will show all the fields in the selected row;
}

mysqli_close($con);
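Since this lookup goes through the primary key, MySQL fetches at most one row and the timestamp check adds essentially nothing; the date math can also be pushed entirely into SQL (a sketch with a placeholder ID):

```sql
-- Hypothetical Solution 4 query with the cutoff computed in SQL.
SELECT *
  FROM transactions
 WHERE transactionID = 'ABC12345'   -- placeholder value
   AND time > NOW() - INTERVAL 2 MINUTE;
```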
  • Your problem statement is not very clear. When you say `BOL`, do you mean boolean or something else? Is your objective to speed up reading of the non-expired rows, or deleting of the expired rows? You talk about `WHERE` searching, then in the same paragraph about deleting rows, and then again about duplicating rows. Please also mention the _current performance_ and the _current table size_ (if available). Finally, if expired rows are useless, why even go through this exercise, since they will only be a few thousand (which is irrelevant)? – Burhan Khalid Dec 23 '13 at 13:26
  • Sounds to me like you are making a simple thing very complex. You expire records every 2 minutes, so you are also inserting/updating non-expired records every 2 minutes, right? If not, you should elaborate on this more, because it sounds like you want to implement a custom `MultiVersion Concurrency Control` (MCC or MVCC) – Raymond Nijland Dec 23 '13 at 17:49
  • @BurhanKhalid The objective is to speed up the reading of non-expired rows. By BOL I mean a variable to check whether a row is expired/non-expired, but it could also be a TINYINT if a row can have a third status. The reason I wrote BOL is that I have little understanding of how the column type affects index performance, and I guessed a boolean would be faster. There is no current table size, as the app is still being developed. As per your last line: why should I make the query run through the whole table? Expired rows will be much more than a few thousand. – smartcity Dec 24 '13 at 15:43
  • @RaymondNijland I'm only updating records to change their status. They are transactional rows which, when expired, still serve statistics/billing purposes. My understanding of MCC is that it's used to create different versions of the same row, but in my case the row changes (permanently) to a status where it's not used by the application anymore (besides sporadic back-office read access) – smartcity Dec 24 '13 at 15:47
  • @user3129652 I think we still need the output of `SHOW CREATE TABLE [table_name]`. And if I understand correctly, you are "deleting" (read: expiring) records when you update the status column to zero (0) or something like that, and you most likely have an index on the status column for selecting the non-expired rows without a full table scan? If so, updates on the status column are pretty "expensive", because both the data and the index need to be updated. – Raymond Nijland Dec 24 '13 at 16:50
  • @RaymondNijland I've updated my question so it's much clearer I hope. – smartcity Dec 26 '13 at 11:00
  • @user3129652 yes, +1 for the details, but it would still be nice if you could share column names and datatypes. Going to think about what will be the cheapest method for updating, inserting and selecting. Most likely Solution 4 in combination with an index on the timestamp will be the best (cheapest) method for MySQL to handle... I don't know if you also want to clean up (DELETE) very old records, because deleting will fragment the table, and that makes range scans on the timestamp slower... – Raymond Nijland Dec 26 '13 at 12:24
  • @RaymondNijland I've included the table structure. It does seem that a daily cron job at night to MOVE the expired rows, together with an index on timestamp, is the answer? – smartcity Dec 26 '13 at 17:47

1 Answer


You didn't mention which version of MySQL you are using. As long as you have 5.1 or later, you have the event scheduler (remember to enable it with `SET GLOBAL event_scheduler = ON;`).

CREATE EVENT clearExpired
ON SCHEDULE EVERY 2 MINUTE
DO
   UPDATE `table` SET status=0
    WHERE status <> 0
      AND TIMESTAMPDIFF(SECOND, `table`.`timestamp`, NOW()) > 120;

or

CREATE EVENT clearExpired
ON SCHEDULE EVERY 2 MINUTE
DO
   DELETE FROM `table`
    WHERE TIMESTAMPDIFF(SECOND, `table`.`timestamp`, NOW()) > 120;

In the second (DELETE) case, you can also set a BEFORE DELETE trigger that copies the deleted rows into a history table before they are removed.
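A sketch of such a trigger (assuming a `table_history` table with exactly the same columns as the original):

```sql
-- Hypothetical BEFORE DELETE trigger: archive each row before it is removed.
CREATE TRIGGER archiveExpired
BEFORE DELETE ON `table`
FOR EACH ROW
  INSERT INTO table_history
  VALUES (OLD.transactionID, OLD.details, OLD.tel, OLD.time,
          OLD.address, OLD.town, OLD.flat, OLD.supplier);
```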

For measuring efficiency, create a table `events` (columns: timestamp and action), generate a huge number of rows in your main table, then change your event to:

...
DO
   BEGIN
      INSERT INTO events SET action="clearExpired start";
      <insert or delete>
      INSERT INTO events SET action="clearExpired end";
   END

Then you can see how long it takes one way vs. the other.
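A sketch of that log table (column names follow the suggestion above; the `timestamp` column fills itself on every insert, so the two log rows bracket the event body):

```sql
-- Hypothetical timing-log table for the measurement above.
CREATE TABLE events (
  `timestamp` TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  action      VARCHAR(32)
);
```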

However, this event will run only once every minute or two, while there will be many more inserts. You want to optimize the actions that happen most frequently.

Marjeta
  • Let me know if you need help creating a trigger. – Marjeta Dec 27 '13 at 22:45
  • Great answer, I will test it and report back. Does this mean that I don't even need the timestamp column in the table? I knew about the scheduler; I was only wondering whether it is heavier than a cron job, but considering that MySQL has the CURRENT_TIMESTAMP and TIMESTAMPDIFF functions, yes, it seems the way to go. – smartcity Dec 31 '13 at 12:03
  • Yes, you need the timestamp column. My code is referring to it. – Marjeta Dec 31 '13 at 20:26
  • Also, what do you think in terms of performance and resource usage? Solution 1 will use fewer resources, as an UPDATE is less heavy, but read performance will degrade with a large number of columns, and in that case we still have to decide whether to index or partition the table on either the status column or the timestamp column. The second solution is obviously super fast in reading the table, but may use many more resources, as deleting and inserting into the new table are heavier, and having a sort of daemon doing BEGIN inserts and deletes END all the time could represent a problem? – smartcity Jan 01 '14 at 10:10