
In a nutshell: what is the best way to give and control end-user access to files stored in an S3 bucket, with specific access rules determined for each file by which “group” the end user belongs to and what their role is in that “group”, when there are a lot of dynamically defined “groups” (more than 100,000) and each user can be part of several “groups” (more than 1,000)?

I am on a team developing a product based on AWS Lambda, accessed through a web app. The product uses a microservice architecture. To explain our use case, let's imagine we have 3 microservices:

  • User service, which is in fact AWS Cognito (handles users and authorization across the whole platform)
  • Company service. Developed by us, based on AWS Lambda and DynamoDB. It manages company information (name, people, and other metadata that I will not detail here)
  • Document service. This is the service we need to develop; it must handle documents that belong to a company.

In terms of architecture, we have some difficulty handling the following use case:

We would like people who belong to one or several companies to have access to those companies' documents (files). These people may have a role inside the company (Executive, HR, Sales). Depending on their role, people may have access to only a subset of the company's documents. Of course, people who do not belong to a company must not have access to that company's documents.

To handle this use case, we would like to use AWS S3, if possible without developing our own microservice to proxy AWS S3.

The problem is: how can we manage rights in AWS S3 for our use case?

We have investigated multiple solutions.

  1. Using IAM policies that restrict S3 file access (the web app accesses S3 directly, no proxy). If our S3 bucket is organized by company name/UUID (folders at the root of the bucket), we could create an IAM policy every time we create a company and configure it so that every user in that company has access to the company folder, and only that folder.

  2. Creating a bucket for each company is not possible, because AWS limits the number of S3 buckets to 100 (or 1,000 with a quota increase) per AWS account, and our product may have more than 1,000 companies.

  3. Putting users in groups (one group == one company) is not possible, because the number of groups per Cognito user pool is limited to 500.

  4. Using Lambda@Edge to proxy AWS S3 calls and verify that the requested S3 file URI is authorized for the requesting user (the user belongs to the company and has the right role to read its documents). This Lambda@Edge function would call an internal service to find out whether the user is authorized to get files from this company (based on the requested URL).

  5. Using AWS S3 presigned URLs. We can create our own document-service exposing CREATE, GET, and DELETE APIs; it would contact the AWS S3 service after performing an authorization check (the user belongs to the company) and generate a presigned URL to upload or download a file. The user (web app) would then call S3 directly. A minimal sketch of this option follows the list.
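
To make option 5 concrete, here is a minimal sketch in Java (AWS SDK for Java v1). The bucket name, key layout, and the isAuthorized check are illustrative assumptions, not an actual implementation:

import java.net.URL;
import java.util.Date;
import com.amazonaws.HttpMethod;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.GeneratePresignedUrlRequest;

public class DocumentService {

    private final AmazonS3 s3 = AmazonS3ClientBuilder.standard().build();

    // GET endpoint: verify company membership first, then return a short-lived URL.
    public URL getDownloadUrl(String userId, String companyId, String documentKey) {
        if (!isAuthorized(userId, companyId)) {
            throw new SecurityException("user does not belong to this company");
        }
        // "document-storage-bucket" and the companyId-prefixed key are placeholders.
        GeneratePresignedUrlRequest request =
            new GeneratePresignedUrlRequest("document-storage-bucket", companyId + "/" + documentKey)
                .withMethod(HttpMethod.GET)
                .withExpiration(new Date(System.currentTimeMillis() + 15 * 60 * 1000)); // 15 minutes
        return s3.generatePresignedUrl(request);
    }

    private boolean isAuthorized(String userId, String companyId) {
        // Placeholder: in reality this would ask the company service / inspect Cognito claims.
        return true;
    }
}

The web app never receives S3 credentials; it only receives the time-limited URL.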

In fact, if I try to summarize our problem: we have difficulty handling a mix of RBAC and per-resource authorization control inside a product built on AWS Lambda that exposes AWS S3 to end users.

If you have experience with or recommendations for this kind of use case, your advice will be very welcome.

Joris.B

3 Answers


I am answering my own question to share our final decision.

We have chosen the solution based on presigned URLs, which lets us:

  • stay independent of AWS S3 (it is possible to switch from S3 to another file storage service without too much cost)
  • avoid exposing the S3 API to our client (the web application); the web app only receives URLs it can use to natively upload or download files
  • keep rights management inside the service itself (doc-service), which generates the presigned URL once authorization has been checked (a sketch follows this list)
  • get the information needed for rights management from Cognito (authentication) and the company service (authorization)
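
As a minimal sketch of the upload side (AWS SDK for Java v1; the bucket name, parameters, and expiry are assumptions for illustration, not our exact code):

import java.net.URL;
import java.util.Date;
import com.amazonaws.HttpMethod;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.GeneratePresignedUrlRequest;

public class UploadUrlFactory {

    private final AmazonS3 s3 = AmazonS3ClientBuilder.standard().build();

    // Called by doc-service once Cognito has authenticated the caller and the
    // company service has confirmed company membership and role.
    public URL createUploadUrl(String companyId, String fileName) {
        GeneratePresignedUrlRequest request =
            new GeneratePresignedUrlRequest("document-storage-bucket", companyId + "/" + fileName)
                .withMethod(HttpMethod.PUT)
                .withExpiration(new Date(System.currentTimeMillis() + 5 * 60 * 1000)); // 5 minutes
        return s3.generatePresignedUrl(request);
    }
}

The web app then PUTs the file body directly to the returned URL, so S3 credentials never reach the client.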

Below is an architecture diagram that illustrates this, based on AWS Lambda:

[Architecture diagram]

Joris.B

I'd consider using STS to generate temporary credentials for a certain role and policy (which can be defined dynamically). So it's basically more or less your option 1, except that you don't have to pre-create all these policies; you can construct them dynamically.

Something along these lines:

import com.amazonaws.services.securitytoken.AWSSecurityTokenService;
import com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClientBuilder;
import com.amazonaws.services.securitytoken.model.AssumeRoleRequest;
import com.amazonaws.services.securitytoken.model.AssumeRoleResult;

AWSSecurityTokenService client = AWSSecurityTokenServiceClientBuilder.standard().build();

// Scope the assumed role down with an inline session policy built at request time;
// company and department are assumed to be variables in scope.
AssumeRoleRequest request = new AssumeRoleRequest()
    .withRoleArn("arn:aws:iam::123456789012:role/sales")
    .withRoleSessionName("ScottTiger") // session names may not contain spaces
    .withPolicy("{\"Version\":\"2012-10-17\"," +
        "\"Statement\":[{\"Sid\":\"Stmt1\",\"Effect\":\"Allow\",\"Action\":\"s3:GetObject\"," +
        "\"Resource\":\"arn:aws:s3:::document_storage_bucket/" + company + "/" + department + "/*\"}]}");
AssumeRoleResult response = client.assumeRole(request);

(Sorry for the line breaks.)

This will give you credentials with permissions that are the intersection of the role's identity-based policy and the session policies.

You can then pass these credentials to the user, generate presigned URLs, whatever you need.
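
For example, continuing from the response variable above (a sketch; just one way to consume the result):

import com.amazonaws.auth.AWSStaticCredentialsProvider;
import com.amazonaws.auth.BasicSessionCredentials;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.securitytoken.model.Credentials;

Credentials temporary = response.getCredentials();
BasicSessionCredentials session = new BasicSessionCredentials(
    temporary.getAccessKeyId(),
    temporary.getSecretAccessKey(),
    temporary.getSessionToken());

// Hand these three values to the client, or use them server-side: any S3 client
// built on them is confined to what the session policy above allows.
AmazonS3 s3 = AmazonS3ClientBuilder.standard()
    .withCredentials(new AWSStaticCredentialsProvider(session))
    .build();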

lexicore
  • Thank you @lexicore for your answer. We will investigate this solution. In your example you mention a role, but I think it is difficult to use one for S3 permissions, because the attached policy must be dynamic too. For example, the role _sales_ only has access to `/company_name/sales/*`. But I think the dynamic policy defined in STS could be enough for us. – Joris.B Aug 29 '19 at 09:13
  • @Joris.B The role is not too important there; you can use just one standard role for everything, or just a few roles if there is a functional equivalent for them. – lexicore Aug 29 '19 at 13:02

As for me, I would go with the 5th solution:

1 - This will allow you to manage your rights exactly the way you design them, without too many constraints. You will also easily absorb any change to your authorization rules.

2 - The document download feature is thus not completely coupled to S3. If you later want to move to another implementation (EDM, dynamic generation, ...), you can manage that from your gateway, and even use several systems at the same time.

Pierre Sevrain