We have a batch analytical SQL job – run once daily – that reads data from 2 source tables held in a powerful RDBMS. The source tables are huge (>100TB) but has less than 10 fields combined.
The question I have is can the 2 source tables be held in a compressed and indexed flat file so the entire operation can be much faster and saves on storage and can be run on a low spec server. Also, can we run SQL like queries against these compressed and indexed flat-files? Any pointers on how to go about doing this would be extremely helpful.