When I run the job locally (from IntelliJ IDEA), the output counts are correct (e.g. 55 rows). But when I submit the same job to YARN with spark-submit, I get only a few rows back (e.g. 12 rows).
spark2-submit --master yarn --deploy-mode client --num-executors 5 --executor-memory 5G --executor-cores 5 --driver-memory 8G --class com.test.Main --packages com.crealytics:spark-excel_2.11:0.13.1 --driver-class-path /test/ImpalaJDBC41.jar,/test/TCLIServiceClient.jar --jars /test/ImpalaJDBC41.jar,/test/TCLIServiceClient.jar /test/test-1.0-SNAPSHOT.jar
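For reference, the read path looks roughly like the sketch below (simplified; the Excel path, Impala URL, table name and driver class are placeholders/assumptions, and spark-excel option names may differ slightly between versions):

```scala
import org.apache.spark.sql.SparkSession

object Main {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("excel-impala-job")
      .getOrCreate()

    // Excel source read via spark-excel (path is a placeholder)
    val excelDf = spark.read
      .format("com.crealytics.spark.excel")
      .option("header", "true")        // first row holds column names
      .option("inferSchema", "true")
      .load("/test/input.xlsx")

    // Impala read through the Simba JDBC driver
    // (URL, table and driver class name below are assumptions)
    val impalaDf = spark.read
      .format("jdbc")
      .option("url", "jdbc:impala://impala-host:21050/default")
      .option("dbtable", "some_table")
      .option("driver", "com.cloudera.impala.jdbc41.Driver")
      .load()

    println(s"Excel rows: ${excelDf.count()}, Impala rows: ${impalaDf.count()}")
  }
}
```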
With --master yarn I get only partial rows. With --master local I can read all the rows, but the job throws: Caused by: java.sql.SQLFeatureNotSupportedException: [Simba][JDBC](10220) Driver not capable.
It seems the job is not able to read all the blocks from HDFS when it runs on the cluster.
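To narrow down where the rows go missing, I am thinking of logging counts right after each read, along the lines of the sketch below (it reuses excelDf/impalaDf from the sketch above, which stand in for the actual DataFrames), so the driver output shows whether rows are already missing at read time under --master yarn or get dropped later:

```scala
import org.apache.spark.sql.functions.spark_partition_id

// Count immediately after each read, before any joins or writes,
// to see whether rows are already missing at read time on YARN.
println(s"Excel rows right after read: ${excelDf.count()}")
println(s"Impala rows right after read: ${impalaDf.count()}")

// Per-partition counts: empty or missing partitions would point to
// an input-split / block-access problem on the cluster.
excelDf.groupBy(spark_partition_id()).count().show()
```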
Any help will be much appreciated. Thanks