
We would like to implement a multi-tenant solution for SCDF in which each tenant may have its own task definitions, etc. Ideally we want only a single SCDF server (as opposed to setting up an SCDF server for each tenant), as pictured: [image: multi-tenant SCDF]

Is this possible, or is the only way to isolate data between tenants to run separate Data Flow server instances?

GaZ

1 Answer


What you're attempting here is not possible today. You'd have to provision SCDF for each tenant. On cloud platforms such as Kubernetes or Cloud Foundry, this is the recommended approach, because you can access-control the tenants through "namespace" and "org/space" isolation, respectively. On top of that foundation, the platforms provide more robust separation through RBAC assignments for each user in the tenant.

A little more background on why we do this today. SCDF and the Task/Job repositories are coupled, in the sense that the Dashboard and the other client tools interact with the same datasource to provide a consistent UX for monitoring and managing the data pipelines centrally. Even with the recent multi-platform backend support for Tasks, you're still expected to use a common datasource in the current design.

All that said, we are looking into allowing users to have a database with schemas prefixed with an identifier [see: spring-cloud/spring-cloud-dataflow#2048]. With that in place, it would then be possible to filter task/job executions by identifier and likewise track them as isolated units of operation within a single SCDF instance.

However, it may not scale for cloud deployments. Each tenant isolation boundary (for instance, a "namespace" in Kubernetes) needs to have enough resources (CPU/memory/disk) to handle "multiple" tenant deployments of task/batch apps. If you don't autoscale the resource capacity, you'll run into deployment failures.

Maybe you could describe your requirements in some more detail, so we can understand why this would still be useful. Please also share how you plan to design the resource allocations on the underlying deployment platform; feel free to comment in #2048.

Sabby Anandan
  • Thanks for the info, Sabby. We will reconsider the architecture and consider the pros and cons of a single repository vs multiple SCDFs (one per tenant). I don't think that a custom table prefix feature mentioned in 2048 would be helpful for our case. The primary requirement / functionality we're looking for is the ability to launch Tasks as pods. We will have a custom GUI for launching the tasks, so I guess it should be possible for SCDF to have a single, separate datasource from the tenants in the diagram above. – GaZ Mar 27 '19 at 20:19
  • Thanks for the clarification. The tenant in your case can be mapped to a platform-account in SCDF, which is nothing but a tenant differentiated by something unique. Let's say a tenant is distinguished by "namespace" in Kubernetes, and that can be a set of configuration properties [see: [2852#issuecomment-462957754](https://github.com/spring-cloud/spring-cloud-dataflow/issues/2852#issuecomment-462957754)]. With multiple platform configuration in SCDF, you can select and launch the Task against them - this is supported today. – Sabby Anandan Mar 27 '19 at 21:21
  • However, it doesn't provide the mechanism to connect each platform against a different database. A common datasource is a requirement today. If we add support for it, SCDF should then be able to query based on the platform identifier to derive the corresponding datasource to launch and retrieve Task executions. – Sabby Anandan Mar 27 '19 at 21:23
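
To make the platform-account mapping from the comments concrete, here is a minimal sketch in Python that generates the server properties for one Kubernetes platform account per tenant, plus the deployment property that selects an account at launch time. The tenant and namespace names are hypothetical, and the two helper functions are illustrative only (they are not part of SCDF); the property keys follow the `spring.cloud.dataflow.task.platform.kubernetes.accounts.<name>.namespace` and `spring.cloud.dataflow.task.platform.name` pattern, which is worth checking against the documentation for your SCDF version. All accounts still share the single SCDF datasource, as noted above.

```python
# Sketch only: maps each tenant to a Kubernetes platform account in SCDF.
# Tenant/namespace names and both helper functions are hypothetical.
PREFIX = "spring.cloud.dataflow.task.platform.kubernetes.accounts"


def platform_account_properties(tenants):
    """Return SCDF server properties, one platform account per tenant.

    `tenants` maps an account name to the Kubernetes namespace it targets.
    """
    props = {}
    for account, namespace in sorted(tenants.items()):
        props[f"{PREFIX}.{account}.namespace"] = namespace
    return props


def launch_properties(account):
    """Return the deployment property that picks the account at launch."""
    return {"spring.cloud.dataflow.task.platform.name": account}


if __name__ == "__main__":
    tenants = {"tenant-a": "ns-tenant-a", "tenant-b": "ns-tenant-b"}
    # Print in key=value form, suitable for a properties file.
    for key, value in platform_account_properties(tenants).items():
        print(f"{key}={value}")
    print(launch_properties("tenant-a"))
```

A custom GUI could pass the output of `launch_properties` along with each launch request, so that a given tenant's task pods land in that tenant's namespace.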