9

I am currently working on a simple revision system that enables me to store multiple versions of a single file, which works fine so far.

Table structure is as follows (obsolete columns removed for the sake of brevity):

file_id     file_revision     file_parent      file_name
--------------------------------------------------------
1           1                 0                foo.jpg
2           2                 1                foorevised.jpg                 
3           3                 1                anotherrevision.jpg

Where:

  • file_id is the primary key, which auto increments
  • file_revision stores the revision number, defaulting to 1 when it's the first
  • file_parent is the top level parent of revision, defaulting to 0 when first.
  • file_name being the file name.

The problem:

  • Preferably using a single query I want to retrieve all files...
  • But only the latest revision of each file...
  • ... when only one revision is stored (original), this one should be retrieved.

Any pointers are greatly appreciated. Thanks in advance.

Aron Rotteveel
  • 81,193
  • 17
  • 104
  • 128
  • I don't see why this was downvoted, I think it's a legit question, I'd be curious to see the answer. – Rob Mar 12 '09 at 13:36

2 Answers2

5

The most efficient way for the sake of retrieval is to add a column like is_latest which you need to populate in advance, then select * from table where file_id=1 and is_latest=true when you want to grab the latest version of file 1. Obviously this will make updating this table more complicated, however.

Another way to do it would be to store the latest versions of the files in one table, and historical versions in another table. If you predominantly want to select all files that are the latest version, select * from table where is_latest=true could likely amount to a full table scan even if if is_latest is indexed. If the latest rows were all in one table the database can read them all out in sequential IO and not have to either 1) do a lot of seeks through the table to find just the records it needs or 2) scan the whole table discarding large amounts of data along the way for the old records.

Assuming you don't want to change the existing table design, what you want to do is called selecting the groupwise maximum, see this article for several different ways to do it in mysql.

ʞɔıu
  • 47,148
  • 35
  • 106
  • 149
  • I realise this is a very old answer, but surely a timestamp check would be simpler, easier and more efficient. You dont need to update the last revision to remove the is_latest flag, and when selecting would just select 1 and order by timestamp. – Sk446 May 08 '13 at 15:16
1
file_id     file_revised     file_name              Time_Stamp
-----------------------------------------------------------------
1           1                 foo.jpg                 insert_time
2           1                 foorevised.jpg          insert_time                 
3           1                 anotherrevision.jpg     insert_time

I would then do variations on queries like this:

SELECT * WHERE file_revision = 1 ORDER BY Time_Stamp GROUP BY file_revision

Or any any number of variation on this type of query, ie limit 1 or Order by file_id as the highest will also be the latest, etc..

Andre
  • 2,449
  • 25
  • 24