3

Let's say we have a system that executes 'jobs'. These jobs can be wired up into sequences we call 'integrations' which are effectively an ordered set of jobs.

The classic solution would be a join table:

integrations: id, name
jobs: id, name, commands
integrations_jobs: id, integration_id, job_id, integration_order

where integration_order is the position of that integration_job within the integration.

What if instead we just used a postgres array?

integrations: id, name, [job_id1, job_id2, job_id3]
jobs: id, name, commands

Are there any obvious drawbacks I'm missing with using an array instead of a join table? We are using Rails 5 and Postgres on Heroku.

  • Postgres does not currently allow array columns as foreign keys which means that you will lose referential integrity. Also ActiveRecord is not built to use array columns for associations which means you will be fighting the framework. If you flip the argument - what advantage does an array column give you in this case? – max Jul 12 '17 at 18:12
  • Thanks for your help max. We thought it would be simpler to version the integration. Tracking the join object in a different table and recording the history of the order of each of those join objects is more complicated then having a history of arrays. We would be fighting the framework a bit by writing custom rails validations to maintain integrity, but that seemed simpler than versioning the join table. – Nicholas Erdenberger Jul 13 '17 at 03:09
  • You can simply use a `has_many through:` association and use Papertrail to version the "join model". – max Jul 15 '17 at 07:34

2 Answers2

0

You can use Postgres array or JSON or enum field but it will affect performance.
Look here

dendomenko
  • 436
  • 1
  • 3
  • 16
0

The first drawback is you can not query to find a job in integrations, example:

jobs table
id
1
2

integrations table
id, name, job_ids
1, i1, [1,2]
2, i2, [2]

so, if you want to find integrations of a job which has id 2, you need to scan integrations table and check if job_ids has value of 2. If integrations table has a lot of records, it's a big perfomance problem.

If you have a join table, that is just a very simple query and faster. I don't see what wrong if you use a join table.

Thanh
  • 8,219
  • 5
  • 33
  • 56
  • Thanh, I am pretty sure you can index array fields. https://www.postgresql.org/docs/9.5/static/gin-builtin-opclasses.html Also isn't this use case what the ANY / ALL and @> / <@ operators are for? – Nicholas Erdenberger Jul 14 '17 at 15:04