3

Suppose I have the following table:

create table Section (
    id integer not null primary key,
    book_id integer not null foreign key references Book (id),
    title varchar(100) not null,
    page_start varchar(10) not null,
    page_end varchar(10) not null
    ... remaining fields ...
)

Note that the starting page and ending page fields are varchars. This is so I can include page numbering from prefaces that are generally in Roman numerals.

My question is: what is an effective way of modifying this table and writing an app so that:

  • I can correctly sort the sections by starting and ending page, ideally using SQL
  • I can calculate the length of a section in number of pages
  • I can determine whether a given page number (e.g. "xviii" or 475) is within a given section

Keeping in mind the following conditions/facts:

  • I do not want users to have to enter any additional information. For example, they should not have to calculate the Arabic equivalent of the prefix and enter it in.
  • Whatever rules might exist for the pre-numbering of books are followed exactly (so, for example, all page numbers will be entered in the correct Roman or Arabic formats)
  • I can add any additional fields I need to, or even a separate table if necessary
  • This is a web app, so I can do pre- or -post- processing on the data in the database before inserting or displaying the data
  • Sections may be added or removed on-the-fly, so for example there might be one introduction section, and then another added later on. Pagination and sorting should remain correct for all sections in that book.

I may end up implementing this in a couple of different languages on different platforms, so code-agnostic pseudocode would be preferred.

Clarification

Because I'm dealing with many thousands of records, I can't just loop through all of them programmatically to do things like sorting. So some of the work needs to happen on the database end.

Using njk's idea of a lookup table, we have something like:

SELECT id, book_id, title, page_start, page_end, 
    COALESCE(RN_Lookup_End.value - RN_Lookup_Start.value + 1, CAST(page_end AS integer)-CAST(page_start AS integer) + 1) as number_of_pages
FROM
    Section 
    LEFT JOIN RN_Lookup AS RN_Lookup_Start ON Section.page_start=RN_Lookup_Start.key
    LEFT JOIN RN_Lookup AS RN_Lookup_End ON Section.page_end=RN_Lookup_End.key
ORDER BY
    book_id, 
    CASE WHEN RN_Lookup_Start.value IS NOT NULL
    THEN -1
    ELSE 0
    END, -- roman page numbers come before normal page numbers
    COALESCE(RN_Lookup_Start.value, page_start), COALESCE(RN_Lookup_End.value, page_end)

If I wanted to loop through all books ordered by page number. Does that look right?

Thinking on it, I wonder if making the following change to the table would be better:

create table Section (
    id integer not null primary key,
    book_id integer not null foreign key references Book (id),
    title varchar(100) not null,
    page_start integer not null,
    page_end integer not null,
    is_front_matter bit default 0,
    page_start_label varchar(10) null,
    page_end_label varchar(10) null
    ... remaining fields ...
)

The query above would look like this:

SELECT id, book_id, title, 
    COALESCE(page_start_label, CAST(page_start as varchar)) as page_start,
    COALESCE(page_end_label, CAST(page_end as varchar)) as page_end,
    (page_end - page_start + 1) as number_of_pages
FROM
    Section 
ORDER BY
    book_id, is_front_matter DESC, page_start, page_end

Then all I'd have to do is convert the values for page_start_label and page_end_label from Roman to Arabic on insertion and update. The two extra integers plus the bit would mean a little over 8 bytes extra storage per record but with most records leaving page_start_label and page_end_label blank I might actually save space!

Does this sound like a reasonable solution? Or am I missing a potential pitfall/downside?

Jordan Reiter
  • 20,467
  • 11
  • 95
  • 161
  • 1
    *This is a web app, so I can do pre- or -post- processing on the data in the database before inserting or displaying the data*. Then I would recommend you do this in the application layer, not SQL. – D'Arcy Rittich Oct 03 '12 at 17:02
  • Store the page number as an integer and convert it to roman numerals in PHP. Add a field somewhere (section? book?) indicating what format the end-user should see – NullUserException Oct 03 '12 at 17:04
  • 1
    I would create a lookup table that references actual page number to sheet number. – Kermit Oct 03 '12 at 17:06
  • @RedFilter, I'd love for your comment to be an answer that people can vote on. It is an excellent suggestion that probably scales the best. Same thing for NullUserException. – Tim Lehner Oct 03 '12 at 17:31
  • @RedFilter I'd still like to somehow be able to do things like sort by page number within SQL, so it can't all happen on the application layer. – Jordan Reiter Oct 03 '12 at 19:09
  • @JordanReiter Sure, you can sort by *integer* page number. – D'Arcy Rittich Oct 03 '12 at 19:34
  • @RedFilter got it, so I assume you meant not to store the Roman numerals in the database and use the webapp to convert them to Roman. – Jordan Reiter Oct 03 '12 at 19:35
  • I'll add that one thing I like about the latest possibility (two sets of fields in the table) is that if some other numbering scheme is introduced, I can just add a new function to calculate how it's calculated numerically (for example the crazy pagination schemes you'll see in car manuals). – Jordan Reiter Oct 03 '12 at 19:37
  • Not sure if when you mentioned manuals this is what you where talking about but some do page numbers where its the chapter number followed by the page number. – Joe W Oct 03 '12 at 19:44

1 Answers1

1

While I would normally leave presentation details to the presentation layer, I agree with @njk in this case.

Since you'll have roman numerals as part of your incoming data, need to potentially serve several front-ends and translations, and you want to be able to easily sort in SQL, I would pre-calculate a lookup table of roman numerals and their integer equivalents up to some fairly large page number (32,767 perhaps, though you know your data).

Just to reiterate, I'm assuming there is some reasonable upper-limit on the page numbers, and you won't be converting anything where you'll need a lookup table of millions of values. This would be enough to more than likely convince me to use code in the presentation layer(s) for this.

There are seemingly already any number of functions out there so you don't have to reinvent the wheel to create this table.

Even if you move to a different scheme in the future, the idea of incoming data with roman numerals means that you may always need this type of table to be able to import books/sections.

Tim Lehner
  • 14,813
  • 4
  • 59
  • 76