We have around 10 TB of data from a customer that we have to load and query using Hive. We also need to create aggregation tables, which will in turn be queried multiple times.
I am planning to use AWS S3 to store the 10 TB of data in one bucket and query it using EMR.
Is this a feasible approach, or will the performance be poor?
What alternatives could be used to speed up the queries?