4

I have a JRuby application that I am trying to create has_many through: relationships for databases that live on two completely different servers. I understand that joins will not work across tables on different servers. What I would like is to simulate the joins so that a developer using the model doesn't have to be (as) aware of the cross-server join.

There are some additional complexities in this setup:

  • The remote database is read-only
  • The table names and primary keys in the remote database do not follow rails naming conventions. (The remote database is a Data Warehouse)
  • I would like to be able to use the model as though a has_and_belongs_to_many was in it.

I've considered writing my own custom association, but that's a bit complex and I can't find any guides or really any starting points other then reading through the Rails code.

Is there an easy way to do this that I'm missing?

Is building a custom ActiveRecord association the best way to do this? and if so, where do I start?

Code similar to my setup:

config/database.yml

development:
  adapter: postgresql
  encoding: unicode
  database: main
  username: username
  password: password
  host: localhost
  pool: 5

remote_development: # Read only
  adapter: jdbcmssql
  driver: com.microsoft.sqlserver.jdbc.SQLServerDriver
  url: 'jdbc:sqlserver://foo.com;databaseName=main'
  username: username
  password: password

app/models/account.rb

class Portfolio < ActiveRecord::Base
  #has_and_belongs_to_many :dim_users, join_table: :accounts_dim_user
end

app/models/remote_model_base.rb

class RemoteModelBase
  require "#{Rails.root}/lib/sqljdbc4.jar"
  self.abstract_class = true
  establish_connection "remote_#{Rails.env}".to_sym
  after_initialize :readonly!
end

app/models/dim_user.rb

class DimUser < RemoteModelBase
  self.table_name = 'DimUser'
  self.primary_key = 'dwidDimUser'

  #has_and_belongs_to_many :accounts, join_table: :accounts_dim_user
end

config/schema.rb

ActiveRecord::Schema.define(version: 20140925200106) do

  create_table "accounts", force: true do |t|
    t.string   "name"
    t.datetime "created_at"
    t.datetime "updated_at"
  end

  create_table "accounts_dim_user", force: true, id: false do |t|
    t.integer  "dwidUser"
    t.integer  "account_id"
    t.datetime "created_at"
    t.datetime "updated_at"
  end

  # Defined in the remote database but it might look something like this
  # create_table "DimUser" do |t|
  #   t.integer dwidUser
  #   # ...
  # end
Nate
  • 12,963
  • 4
  • 59
  • 80
  • I found the gem [st-elsewhere](https://github.com/briandoll/st-elsewhere) but I'm concerned about it being 5 years old. – Nate Sep 26 '14 at 22:03

2 Answers2

3

Just ran through a quick scenario for you, here's the repo: https://github.com/beneggett/many_db_example

In the repo, I just did 2 different db's on my local, but it won't matter, the principal is the same:

This seemed to work well for me:

Tell the account about the account_dim_users join table association, but manually map the has_many through/habtm.

class Account < ActiveRecord::Base
  has_many :account_dim_users

  def dim_users
    account_dim_users.map {|account_dim_user| DimUser.find_by(dwidUser: account_dim_user.dwidUser) }
  end
end

This is important, because, as you well know, the standard join will not work; however mapping it through the model works fine.

AccountDimUser join table looks standard (I explicitly mapped the keys)

class AccountDimUser < ActiveRecord::Base
  has_many :accounts
  has_many :dim_users, primary_key: :dwidUser, foreign_key: :dwidUser

end

Manually map the account_dim_users association, and manually map the accounts association

class DimUser < ActiveRecord::Base
  establish_connection "other_db".to_sym
  after_initialize :readonly!
  self.table_name = 'DimUser'
  self.primary_key = 'dwidUser'

  def account_dim_users
    AccountDimUser.where(dwidUser: self.dwidUser)
  end

  def accounts
    account_dim_users.map {|account_dim_user| Account.find(account_dim_user.account_id) }
  end
end

This approach allows you to still use your Ruby objects in the standard way:

a = Account.first
  Account Load (0.6ms)  SELECT  "accounts".* FROM "accounts"   ORDER BY "accounts"."id" ASC LIMIT 1
=> #<Account:0x00000102d263d0> {
          :id => 1,
        :name => "New account",
  :created_at => Mon, 29 Sep 2014 15:07:07 UTC +00:00,
  :updated_at => Mon, 29 Sep 2014 15:07:07 UTC +00:00
}

--

a.account_dim_users
=> #<ActiveRecord::Associations::CollectionProxy [#<AccountDimUser id: 1, dwidUser: 1, account_id: 1, created_at: "2014-09-29 15:08:47", updated_at: "2014-09-29 15:08:47">, #<AccountDimUser id: 3, dwidUser: 5, account_id: 1, created_at: "2014-09-29 15:24:17", updated_at: "2014-09-29 15:25:06">]>

--

a.dim_users
  AccountDimUser Load (0.3ms)  SELECT "account_dim_users".* FROM "account_dim_users"  WHERE "account_dim_users"."account_id" = $1  [["account_id", 1]]
  DimUser Load (0.9ms)  SELECT  "DimUser".* FROM "DimUser"  WHERE "DimUser"."dwidUser" = 1 LIMIT 1
  DimUser Load (0.3ms)  SELECT  "DimUser".* FROM "DimUser"  WHERE "DimUser"."dwidUser" = 5 LIMIT 1
=> [
  [0] #<DimUser:0x0000010981af10> {
            :id => 1,
      :dwidUser => 1,
    :created_at => Mon, 29 Sep 2014 15:06:44 UTC +00:00,
    :updated_at => Mon, 29 Sep 2014 15:06:44 UTC +00:00
  },
  [1] #<DimUser:0x00000109838b00> {
            :id => 5,
      :dwidUser => 5,
    :created_at => Mon, 29 Sep 2014 15:23:01 UTC +00:00,
    :updated_at => Mon, 29 Sep 2014 15:23:01 UTC +00:00
  }
]

--

d = DimUser.first
  DimUser Load (0.5ms)  SELECT  "DimUser".* FROM "DimUser"   ORDER BY "DimUser"."dwidUser" ASC LIMIT 1
=> #<DimUser:0x0000010990aad8> {
          :id => 1,
    :dwidUser => 1,
  :created_at => Mon, 29 Sep 2014 15:06:44 UTC +00:00,
  :updated_at => Mon, 29 Sep 2014 15:06:44 UTC +00:00
}

--

d.account_dim_users
  AccountDimUser Load (0.5ms)  SELECT "account_dim_users".* FROM "account_dim_users"  WHERE "account_dim_users"."dwidUser" = 1
=> #<ActiveRecord::Relation [#<AccountDimUser id: 1, dwidUser: 1, account_id: 1, created_at: "2014-09-29 15:08:47", updated_at: "2014-09-29 15:08:47">]>

--

 d.accounts
  AccountDimUser Load (0.5ms)  SELECT "account_dim_users".* FROM "account_dim_users"  WHERE "account_dim_users"."dwidUser" = 1
  Account Load (0.4ms)  SELECT  "accounts".* FROM "accounts"  WHERE "accounts"."id" = $1 LIMIT 1  [["id", 1]]
=> [
  [0] #<Account:0x000001099788d0> {
            :id => 1,
          :name => "New account",
    :created_at => Mon, 29 Sep 2014 15:07:07 UTC +00:00,
    :updated_at => Mon, 29 Sep 2014 15:07:07 UTC +00:00
  }
]

There is probably some optimizations that could be done to this when dealing with large quantities of records, but it's a good base.

The other approach might be to do lookups on the association table itself, like:

  def find_dim_user
    DimUser.find_by(dwidUser: self.dwidUser)
  end

But I very much prefer the first way I suggested, as it lets you do your normal ruby method chaining approach to associations.

Any other questions, let me know!

Edit: You could change the map function to use Active Record Relations or similar as well, enabling more functionality:

class Account < ActiveRecord::Base
  has_many :account_dim_users

  def dim_users
    dim_user_ids = account_dim_users.map {|account_dim_user| account_dim_user.dwidUser }
    DimUser.where(dwidUser: dim_user_ids)
  end
end
Ben Eggett
  • 486
  • 3
  • 5
  • This looks like exactly what I was looking for. I'll try it out and come back in a little while. – Nate Sep 29 '14 at 16:19
  • 1
    This is a pretty good solution. One drawback is the cross-server associations such as `Account.first.dim_users` returns an `Array` rather than a `CollectionProxy`. This means you can't add `dim_users` by doing `a.dim_users << DimUser.find(1); a.save`. But, this is probably the best I'm going to get without creating a complex and time-consuming-to-write custom association. I might suggest optionally adding `freeze` to the returned array so modifying it throws an error. – Nate Sep 29 '14 at 19:13
  • Sure, it may also be more extensible, not to map, but rather use Active record to find the collection. See Edit at the bottom, above – Ben Eggett Sep 29 '14 at 19:48
  • You could use [`pluck`](http://apidock.com/rails/ActiveRecord/Calculations/pluck) to get the IDs instead of map. `account_dim_users.pluck(:dwidUser)`. Other than that, I think this is the answer I was looking for. You've earned the bounty. – Nate Sep 30 '14 at 13:26
  • 1
    Sure thing, just glad I could help. I've been doing a lot of non-conventional DB & schema work with Rails lately; if you run into any other non-conventional issues, shoot me a line. – Ben Eggett Oct 01 '14 at 02:50
1

Use dblink http://www.postgresql.org/docs/9.3/static/dblink.html , you will have the other table in the same database. Problem solved.

Seuros
  • 131
  • 1
  • 4
  • That is a very clever idea and would be great if the remote database was PostgreSQL. Unfortunately in my case, it is not. – Nate Sep 29 '14 at 13:08