I am implementing a big data system using Apache Kudu. The preliminary requirements are as follows:
- Support multi-tenancy.
- The front end will use Apache Impala JDBC drivers to access data.
- Customers will write Spark jobs on Kudu for analytical use cases.
Since Kudu does not support multi-tenancy out of the box, the following is the only way I can think of to support it.
Proposed approach:
Each table will have a tenantID column, and data from all tenants will be stored in the same table, each row tagged with its tenantID.
Map the Kudu tables as external tables in Impala, then create views over these tables with a WHERE clause for each tenant, for example:
CREATE VIEW IF NOT EXISTS cust1.table AS SELECT * FROM table WHERE tenantid = 'cust1';
Customer1 will query cust1.table to access cust1's data, either through the Impala JDBC drivers or from Spark. Customer2 will query cust2.table to access cust2's data, and so on.
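
To make the approach above concrete, here is a minimal sketch of a helper that renders the Impala statements for the external table mapping and for one view per tenant. The names `events` and `shared` are hypothetical placeholders for the Kudu table and the Impala database holding the mapping, and the tenant ID pattern is an assumption; only the `tenantid` column and the view shape come from the statement above.

```python
import re

# Hypothetical names for illustration only; they are not from the original post.
KUDU_TABLE = "events"   # underlying Kudu table shared by all tenants
SHARED_DB = "shared"    # Impala database that holds the external mapping

# Assumed tenant ID format: restricting it keeps the generated SQL safe,
# since the ID is used both as a database name and as a string literal.
_TENANT_RE = re.compile(r"^[a-z][a-z0-9_]*$")


def external_table_ddl(table=KUDU_TABLE, db=SHARED_DB):
    """Render the Impala DDL that maps an existing Kudu table as an external table."""
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {db}.{table} "
        f"STORED AS KUDU "
        f"TBLPROPERTIES ('kudu.table_name' = '{table}')"
    )


def tenant_view_ddl(tenant, table=KUDU_TABLE, db=SHARED_DB):
    """Render the statements for one tenant: its database and its filtered view."""
    if not _TENANT_RE.match(tenant):
        raise ValueError(f"unsafe tenant id: {tenant!r}")
    return [
        f"CREATE DATABASE IF NOT EXISTS {tenant}",
        f"CREATE VIEW IF NOT EXISTS {tenant}.{table} AS "
        f"SELECT * FROM {db}.{table} WHERE tenantid = '{tenant}'",
    ]


if __name__ == "__main__":
    print(external_table_ddl() + ";")
    for tenant in ("cust1", "cust2"):
        for statement in tenant_view_ddl(tenant):
            print(statement + ";")
```

The rendered statements would be run through impala-shell or a JDBC connection. Note that these views only filter rows inside Impala; they do not by themselves restrict access to the underlying Kudu table.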
Questions:
- Is this an acceptable way to implement multi-tenancy, or is there a better way to do it (maybe with other external services)?
- If implemented this way, how do I restrict customer2 from accessing cust1.table in Kudu, especially when customers will write their own Spark jobs for analytical purposes?