Suppose I have the following table:
create table Section (
id integer not null primary key,
book_id integer not null foreign key references Book (id),
title varchar(100) not null,
page_start varchar(10) not null,
page_end varchar(10) not null
... remaining fields ...
)
Note that the starting page and ending page fields are varchars. This is so I can include page numbering from prefaces that are generally in Roman numerals.
My question is: what is an effective way of modifying this table and writing an app so that:
- I can correctly sort the sections by starting and ending page, ideally using SQL
- I can calculate the length of a section in number of pages
- I can determine whether a given page number (e.g. "xviii" or 475) is within a given section
Keeping in mind the following conditions/facts:
- I do not want users to have to enter any additional information. For example, they should not have to calculate the Arabic equivalent of the prefix and enter it in.
- Whatever rules might exist for the pre-numbering of books are followed exactly (so, for example, all page numbers will be entered in the correct Roman or Arabic formats)
- I can add any additional fields I need to, or even a separate table if necessary
- This is a web app, so I can do pre- or -post- processing on the data in the database before inserting or displaying the data
- Sections may be added or removed on-the-fly, so for example there might be one introduction section, and then another added later on. Pagination and sorting should remain correct for all sections in that book.
I may end up implementing this in a couple of different languages on different platforms, so code-agnostic pseudocode would be preferred.
Clarification
Because I'm dealing with many thousands of records, I can't just loop through all of them programmatically to do things like sorting. So some of the work needs to happen on the database end.
Using njk's idea of a lookup table, we have something like:
SELECT id, book_id, title, page_start, page_end,
COALESCE(RN_Lookup_End.value - RN_Lookup_Start.value + 1, CAST(page_end AS integer)-CAST(page_start AS integer) + 1) as number_of_pages
FROM
Section
LEFT JOIN RN_Lookup AS RN_Lookup_Start ON Section.page_start=RN_Lookup_Start.key
LEFT JOIN RN_Lookup AS RN_Lookup_End ON Section.page_end=RN_Lookup_End.key
ORDER BY
book_id,
CASE WHEN RN_Lookup_Start.value IS NOT NULL
THEN -1
ELSE 0
END, -- roman page numbers come before normal page numbers
COALESCE(RN_Lookup_Start.value, page_start), COALESCE(RN_Lookup_End.value, page_end)
If I wanted to loop through all books ordered by page number. Does that look right?
Thinking on it, I wonder if making the following change to the table would be better:
create table Section (
id integer not null primary key,
book_id integer not null foreign key references Book (id),
title varchar(100) not null,
page_start integer not null,
page_end integer not null,
is_front_matter bit default 0,
page_start_label varchar(10) null,
page_end_label varchar(10) null
... remaining fields ...
)
The query above would look like this:
SELECT id, book_id, title,
COALESCE(page_start_label, CAST(page_start as varchar)) as page_start,
COALESCE(page_end_label, CAST(page_end as varchar)) as page_end,
(page_end - page_start + 1) as number_of_pages
FROM
Section
ORDER BY
book_id, is_front_matter DESC, page_start, page_end
Then all I'd have to do is convert the values for page_start_label
and page_end_label
from Roman to Arabic on insertion and update. The two extra integers plus the bit would mean a little over 8 bytes extra storage per record but with most records leaving page_start_label
and page_end_label
blank I might actually save space!
Does this sound like a reasonable solution? Or am I missing a potential pitfall/downside?