0

I'm not too sure if there is a special term for this kind of data, but the concept is similar to a C++ std::variant (but maybe in PostgreSQL it wouldn't be ideal to use a variant-like structure).

As an example, suppose I have a table purchases that has some columns:

CREATE TABLE purchases (
    purchaseid bigserial,
    buyerid bigint,
    name text,
    ...
);

Each purchase would be paid via one of a small number of possibilities, e.g. cheque or bitcoin. Each possibility has different associated fields that would need to be saved.

Payment by cheque would require saving the cheque number and the bank name, and payment by bitcoin would require saving the sender's bitcoin address and a return bitcoin address.

It would seem wise (with my little knowledge of PostgreSQL) to have cheque payments and bitcoin payments reside in separate tables due to the differing fields.

What is the ideal way to store such data (and relate them to each row in purchases) if I know that each purchase can only be paid by either cheque or bitcoin, but not both?

Bernard
  • 5,209
  • 1
  • 34
  • 64
  • For the most flexible storage, I would use JSONB. https://stackoverflow.com/questions/22654170/explanation-of-jsonb-introduced-by-postgresql – David Aldridge Aug 26 '17 at 08:49
  • @David I don't think I need "flexible storage". More important (I think) is the representation (and/or enforcement) of an either-or construct, but JSONB doesn't quite do this. – Bernard Aug 26 '17 at 11:50

1 Answers1

2

The approach I take for such situations is indeed as you suggest: you need a different table for cheque details, bitcoin details etc. These tables must include the purchase id and the purchase table must have a paymenttypeid. Then when wishing to view the purchases together with details, you need to do a left join. The trick here is to have a case statement in the view such that depending on paymenttypeid you select from the current left joined table. In doing this, you need to make sure that you select the same number of columns from the details tables, making appropriate adjustments for types.

So you get something like:

SELECT p.purchasedate, 
       p.otherfields,
       pt.description,
       case p.paymenttype 
           WHEN 1
               THEN bd.DetailField1 -- cast if necessary
           WHEN 2
               THEN cd.DetailField1 -- cast if necessary
           END AS Detail1,
       case p.paymenttype 
           WHEN 1
               THEN bd.DetailField2 -- cast if necessary
           WHEN 2
               THEN cd.DetailField2 -- cast if necessary
           END AS Detail2
       FROM purchases p
       inner join paymenttypes pt 
           on p.paymenttypeid = pt.id
       left join bitcoindetails bd 
           on bd.purchaseid = p.id
       left join chequedetails cd 
           on cd.purchaseid = p.id

Although this is based on left joins, we know in practice it will always succeed, because the case statement is based on the paymenttypeid. It means that we are always selecting from the correct details table. Details must be present in the details table for the record selected, because it matches the paymenttypeid.

Jonathan Willcock
  • 5,012
  • 3
  • 20
  • 31
  • This would seem like the only way if there's nothing that really enforces either-or logic (i.e. where PostgreSQL prohibits the database from containing the same `purchaseid` in both cheque and bitcoin tables). For the query, I think it's nicer to not use `CASE`, but just select those fields in separate columns (which will be `NULL` for the other type of payment). This would allow me to keep the original types of the columns and allow a different number of columns for cheque as compared to bitcoin. (Feels sorta weird for bank ID and return bitcoin address to share a column, don't you think?) – Bernard Aug 26 '17 at 12:03
  • @Bernard depends on how you are displaying the data. I have done this quite often. As long as the headings make clear that the data is variable. Your alternative has the disadvantage that you need many more columns. In the end "you pays your money and takes your choice" – Jonathan Willcock Aug 26 '17 at 12:10
  • I suppose making the headings clear would be sufficient, and your method is better if some human reads the database output directly. But for my use, the data is requested by a web server that will output nicely designed HTML to the user, so many more columns isn't really a problem (unless there's some column limit I'm not aware of). The size of the transmitted data shouldn't be much larger (just a few more `NULL`s), and its just as easy for the web server to read from a different column. Thanks for your suggestion to use different tables and paymenttypeid - I think I'll go with this design. – Bernard Aug 26 '17 at 12:31
  • Sounds like a plan. Please feel free to mark question as answered :-) have a great weekend. – Jonathan Willcock Aug 26 '17 at 12:36
  • Actually, it seems I can do away with paymenttypeid totally. I can just check if `bd.DetailField1` (or whatever the primary key of `bd` is) is null. – Bernard Aug 26 '17 at 12:36
  • Can I wait a while before marking the question as answered? If there is a solution that enforces either-or logic, I would prefer it. It's Saturday night now and I'll mark it as answered tomorrow if there's no better way :) – Bernard Aug 26 '17 at 12:38
  • Yes you are right. paymenttypedid is there so that CASE works. I'd still be tempted to include it - you never know when it might be useful. In particular, it allows you to have a paymenttypes table with text descriptions. Always useful for display purposes (dropdown boxes etc.) – Jonathan Willcock Aug 26 '17 at 12:38
  • Of course you should wait. I wasn't trying to hassle you! – Jonathan Willcock Aug 26 '17 at 12:40
  • Yes, I didn't think of display purposes - seems like a possible use case for paymenttypeid. I think the downside of having that column is that there would be hardcoded constants so everyone maintaining the database would have to ensure that the database doesn't go into an invalid state. Perhaps I'll leave out that column for now and add the column (which should be an easy query based on the null-ness of the primary keys) when it is needed. – Bernard Aug 26 '17 at 12:45