Whenever spark reads data from hive tables, it stores it in RDD. One point i want to make clear here is hive is just a warehouse so it is like a layer which is above HDFS, when spark interacts with hive , hive provides the spark the location where the hdfs loaction exists.
Thus, Spark reads a file from HDFS, it creates a single partition for a single input split. Input split is set by the Hadoop (whatever the InputFormat used to read this file. ex: if you use textFile() it would be TextInputFormat in Hadoop, which would return you a single partition for a single block of HDFS (note:the split between partitions would be done on line split, not the exact block split), unless you have a compressed file format like Avro/parquet.
If you manually add rdd.repartition(x) it would perform a shuffle of the data from N partititons you have in rdd to x partitions you want to have, partitioning would be done on round robin basis.
If you have a 10GB uncompressed text file stored on HDFS, then with the default HDFS block size setting (256MB) it would be stored in 40blocks, which means that the RDD you read from this file would have 40partitions. When you call repartition(1000) your RDD would be marked as to be repartitioned, but in fact it would be shuffled to 1000 partitions only when you will execute an action on top of this RDD (lazy execution concept)
Now its all up to spark that how it will process the data as Spark is doing lazy evaluation , before doing the processing, spark prepare a DAG for optimal processing. One more point spark need configuration for driver memory, no of cores , no of executors etc and if the configuration is inappropriate the job will fail.
Once it prepare the DAG , then it start processing the data. So it divide your job into stages and stages into tasks. Each task will further use specific executors, shuffle , partitioning. So in your case when you do processing of bilions of records may be your configuration is not adequate for the processing. One more point when we say spark load the data in RDD/Dataframe , its managed by spark, there are option to keep the data in memory/disk/memory only etc ref -storage_spark.
Briefly,
Hive-->HDFS--->SPARK>>RDD(Storage depends as its a lazy evaluation).
you may refer the following link : Spark RDD - is partition(s) always in RAM?