Graphql-ruby scoping in habtm, data leak in nested query

Question

Here's an example of a data leak that occurs if using standard graphql-ruby setup.

Using the graphql nested request below, the response returns data nested under company 1, that belongs to company 2. I expect the response to be limited to the the accountants log that belongs to the company it is nested within.

That way it is, it is leaking information.

The question is how do we patch that leak so the only data returned in the response and its nested objects is data that belongs to the company (the root object).

This query:

query { 
  company(id: "1") {
    id
    name
    activityLog {
      id
      activityAt
      accountant {
        id
        name
      }
      companyId
    }
    accountants {
      id
      name
      activityLog {
        id
        activityAt
        companyId
      }
    }
  }
}

returns this response:

{
  "data": {
    "company": {
      "id": "1",
      "name": "AwsomeCo",
      "activityLog": [
        {
          "id": "1",
          "activityAt": "2019-10-12 16:40:13 UTC",
          "accountant": {
            "id": "100",
            "name": "Mr Smith",
          },
          "companyId": "1"
        }
      ],
      "accountants": [
        {
          "id": "100",
          "name": "Mr Smith"
          "activityLog": [
            {
              "id": "1",
              "activityAt": "2019-10-12 16:40:13 UTC",
              "companyId": "1"
            },
            {
              "id": "2",
              "activityAt": "2019-10-11 16:40:13 UTC",
              "companyId": "2"  // OTHER COMPANY, NEED TO PRESERVE PARENT SCOPE
            },
          ],
        }
      }
    }
  }
}

leaking transaction log data of company 2, within the nested elements of company 1.

Again, the question is: How do we preserve scope, only displaying data in context of the company it is displaying?

Code to reproduce:

GraphQL types (using graphql-ruby gem)

#query_type.rb
module Types
  class QueryType < Types::BaseObject
    # Add root-level fields here.
    # They will be entry points for queries on your schema.
    field :company_leak, Types::ExampleLeak::CompanyType, null: false do
      argument :id, ID, required: true
    end
    field :companies_leak, [ Types::ExampleLeak::CompanyType ], null: false

    def company_leak(id: )
      Company.find(id)
    end

    def companies_leak
      Company.all
    end
  end
end

module Types
  module ExampleLeak
    class CompanyType < BaseObject
      field :id, ID, null: false
      field :name, String, null: false
      field :transaction_logs, [Types::ExampleLeak::TransactionLogType], null: true
      field :accountants, [Types::ExampleLeak::AccountantType], null: true
    end
  end
end

module Types
  module ExampleLeak
    class AccountantType < BaseObject
      field :id, ID, null: false
      field :name, String, null: false
      field :transaction_logs, [Types::ExampleLeak::TransactionLogType], null: true
      field :companies, [Types::ExampleLeak::CompanyType], null: true
    end
  end
end

module Types
  module ExampleLeak
    class TransactionLogType < BaseObject
      field :id, ID, null: false
      field :activity_at, String, null: false
      field :company_id, ID, null: false
      field :accountant, Types::ExampleLeak::AccountantType, null: false
    end
  end
end

Models

class Company < ApplicationRecord
  has_and_belongs_to_many :accountants
  has_many :transaction_logs
end

class Accountant < ApplicationRecord
  has_and_belongs_to_many :companies
  has_many :transaction_logs
end

class TransactionLog < ApplicationRecord
  belongs_to :company
  belongs_to :accountant
end

seeds.rb

awesomeco = Company.create!(name: 'AwesomeCo')
boringco = Company.create!(name: 'BoringCo') 
mr_smith = Accountant.create!(name: "Mr. Smith")
awesomeco.accountants << mr_smith
boringco.accountants << mr_smith
mr_smith.transaction_logs.create!(company: awesomeco, activity_at: 1.day.ago)
mr_smith.transaction_logs.create!(company: boringco, activity_at: 2.days.ago)

Public repo containing full code, intended as educational resource:

https://github.com/rubynor/graphql-ruby-training-ground

If I understand correctly, the issue is that you actually **don't** want the _full_ accountant information, but only the information about the accountant that relates to the specific company in the query...? — Myst, Oct 22 '19 at 13:52
Yes, I would expected to see the accountants logs of company 1, not also company 2. — oma, Oct 22 '19 at 21:35
I think you would need to refuse a company's request for an accountant's `activityLog`. Not only is the data duplicated (you have the subset of the data that you actually desire in the `company[:activityLog]`), but it cannot be curated automatically in the way you desire it to be done (if you read the accountant data you always get the ). IMHO, the best way to do this is to separate the accountant model / API from the company model / API, so the company API uses an accountant model that doesn't have an activity log associated with it. — Myst, Oct 22 '19 at 23:23
Sounds REST-ish, that's how I'm used to doing it. I thought the other approach was in the spirit of GraphQL... The use case here would be to see the log grouped by accountant pr company. — oma, Oct 23 '19 at 07:41
Q has been updated to show the essential code needed to understand the issue. It is not a shallow question, it requires deeper understanding and I'm sure a good answer would be helpful to many. My real-life issue is even more complex than this, so it is a fictional example that effectively demonstrates the issue. — oma, Oct 23 '19 at 07:55
You are right, separating the API **is** REST-ish and isn't ideal. Re-thinking my approach, I believe the ideal solution is to filter out (refuse) any requests for nested activity logs. Since the activity log for the company is already present in the company object (and the same is true when an accountant is in the root object), there's no need to allow access to any nested activity logs. This access control will improve two things: 1. security; 2. bandwidth (sending data only once). — Myst, Oct 23 '19 at 08:56
I upvoted the comment as it is valid. I do have a use case that needs it though.Yes you already have the data through company -> transaction_logs, but if you want to see the accountant logs in that company, you would have to sort that out in the frontend, or create new endpoints. @Ahmad Ali has provided a good answer that solves it instead of disallowing it (though I disallow currently in my own solution, just as you propose). — oma, Oct 23 '19 at 11:04
Consider that front end CPU cycles are cheaper then server CPU / memory / bandwidth. Computing the activity log for the nested object on the front end should improve overall performance. — Myst, Oct 23 '19 at 12:53

score 1 · Accepted Answer · answered Oct 28 '19 at 08:58

We can update field in class AccountantType < BaseObject as follows to resolve transaction logs:

  field :transaction_logs, [Types::ExampleLeak::TransactionLogType], null: true,
    resolve: ->(obj, args, ctx) {
      company_leak = ctx.irep_node.parent.parent.arguments[:id]
      companies_leak = ctx.parent.parent.object.object.id
      TransactionLog.where(id: company_leak.present? ? company_leak : companies_leak)
    }

If the company ID is given an argument, It will fetch transaction logs against that id, otherwise with respect to its parent Accountant

score 0 · Answer 2 · answered Mar 02 '20 at 19:29

0

Sounds like the perfect use case for Pundit and scoping. That way, given the user that authenticated their queries will automatically scope the correct company

answered Mar 02 '20 at 19:29

Colin Goudie

845
5
10

This is not true. Imagine using pundit and a user has access to many companies, then you get the same "leak". If you disagree, prove it with code please :) – oma Mar 04 '20 at 08:16

Graphql-ruby scoping in habtm, data leak in nested query

2 Answers2