I am facing an issue where, in my case, the behavior is random. After loading a Hive external table and running MSCK REPAIR successfully (Job 1), we have a subsequent Spark job (Job 2) that reads the data from that table and loads it into another table. At random, Job 2 retrieves 0 records from the table loaded in Job 1. Some facts:
- We pull the data using SELECT *
- We use Spark SQL for doing this
- We run Hive on the Tez engine
- We are running on AWS EMR
- The behavior is purely random and we have not been able to identify a pattern in any way
- The same query on the same table gives the right results after some time, and then later returns no records again, with no apparent trigger.
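
One check that might help narrow this down is to count the rows Job 2 sees before and after asking Spark to refresh its cached metadata for the table. The sketch below is only an illustration under assumptions: `run_sql` is a hypothetical callable (with PySpark it could be `lambda q: spark.sql(q).collect()`), the table name is a placeholder, and stale cached metadata is just one guess, not a confirmed cause.

```python
from typing import Callable, Dict, Sequence


def count_before_and_after_refresh(
    run_sql: Callable[[str], Sequence], table: str
) -> Dict[str, int]:
    """Count rows in `table`, refresh Spark's cached metadata, count again.

    `run_sql` is a hypothetical helper that executes one SQL statement and
    returns the result rows as a sequence of indexable rows.
    """
    # Row count as the job currently sees it.
    before = run_sql(f"SELECT COUNT(*) FROM {table}")[0][0]

    # REFRESH TABLE is a Spark SQL statement that invalidates cached
    # metadata and cached data for the table.
    run_sql(f"REFRESH TABLE {table}")

    # Row count after the refresh.
    after = run_sql(f"SELECT COUNT(*) FROM {table}")[0][0]

    return {"before_refresh": before, "after_refresh": after}
```

If the two counts differ on a run where Job 2 would otherwise read 0 records, that would point toward cached metadata; if both are 0, the cause is more likely elsewhere (for example in what the metastore or the storage layer returns at that moment).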
Any help in this area would be much appreciated. We have been going in circles without finding a resolution.