How I can load csv data into hive using Spark dataframes?

Question

I am trying to load data from a CSV file into Hive. I am using the Java API of Spark to do that, and I want to know how I can load data into Hive using Spark DataFrames.
Here is how I do it with JSON:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame; // Spark 1.x class; was missing
import org.apache.spark.sql.SQLContext;

public class first {
    public static void main(String[] args) {
        String inputFileName = "samples/big.txt";
        String outputDirName = "output";

        SparkConf conf = new SparkConf().setAppName("org.sparkexample.WordCount").setMaster("local");
        JavaSparkContext context = new JavaSparkContext(conf);
        @SuppressWarnings("deprecation")
        SQLContext sc = new SQLContext(context);
        // Read a JSON file (one object per line) into a DataFrame.
        // jsonFile is deprecated since Spark 1.4; sc.read().json(...) replaces it.
        DataFrame input = sc.jsonFile(inputFileName);
        input.printSchema();
    }
}

But I don't know how to do the same with CSV. I have some idea about the spark-csv package provided by Databricks.
Kindly let me know how I can do it.

What version of Spark are you using? Also, is your issue reading the CSV or writing the resulting DataFrame to Hive? – Assaf Mendelson, Feb 16 '17 at 08:09
You can use the spark-csv package to read the CSV files into a `DataFrame` and then use that to load the data into a Hive table: https://github.com/databricks/spark-csv – Rajat Mishra, Feb 16 '17 at 08:18
@RajatMishra I am trying that too, but I don't understand the problem. This is my first time working with Spark and Java. I have always used Scala for Spark, but I could not understand the Java side. – Jaffer Wilson, Feb 16 '17 at 08:46

score 1 · Accepted Answer · answered Feb 16 '17 at 09:37


On Spark 2.x, CSV support is built in (no need for the spark-csv package). Try reading it like this:

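import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

SparkSession spark = SparkSession
    .builder()
    .appName("org.sparkexample.WordCount")
    .master("local[*]")
    .enableHiveSupport()
    .getOrCreate();
// In the Java API of Spark 2.x, a DataFrame is a Dataset<Row>
Dataset<Row> input = spark.read().csv(inputFileName);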

You can also add options, for example:

Dataset<Row> input = spark.read().option("header", "true").csv(inputFileName);

This treats the first line as a header and names the columns accordingly.
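A fuller sketch combining these options might look like the following. It assumes Spark 2.x with `spark-sql` on the classpath; the class name `CsvReadExample`, the helper `readCsv` and the default path `samples/data.csv` are hypothetical placeholders. It also adds `inferSchema`, a standard option of the built-in CSV reader that makes Spark scan the data to guess column types; without it every column is read as a string.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class CsvReadExample {
    // Hypothetical helper: reads a CSV file whose first line holds the column names.
    public static Dataset<Row> readCsv(SparkSession spark, String path) {
        return spark.read()
            .option("header", "true")      // first line -> column names
            .option("inferSchema", "true") // extra pass over the data to guess column types
            .csv(path);
    }

    public static void main(String[] args) {
        // Hypothetical default path; pass your own file as the first argument.
        String inputFileName = args.length > 0 ? args[0] : "samples/data.csv";

        SparkSession spark = SparkSession
            .builder()
            .appName("CsvReadExample")
            .master("local[*]")
            .getOrCreate();

        Dataset<Row> input = readCsv(spark, inputFileName);
        input.printSchema(); // column names come from the header line
        input.show(5);       // print the first few rows
        spark.stop();
    }
}
```

Schema inference costs an extra pass over the file; for large inputs, passing an explicit schema via `spark.read().schema(...)` avoids it.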


Assaf Mendelson


Can I write to Hive this way? Reading was the problem; now I'm struggling with writing. – Jaffer Wilson Feb 16 '17 at 09:44
You can try http://stackoverflow.com/questions/40122201/storing-a-dataframe-to-a-hive-partition-table-in-spark. I don't have Hive configured, so I can't check it myself. – Assaf Mendelson Feb 16 '17 at 10:17
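For the write side, here is a minimal sketch, assuming Spark 2.x with `spark-hive` on the classpath and a Hive metastore reachable through `hive-site.xml` (otherwise Spark falls back to an embedded local metastore). `saveAsTable` writes the `Dataset<Row>` as a managed table, and because the session is built with `enableHiveSupport()`, that table is registered in Hive. The class name `CsvToHive`, the helper `saveToHive`, the table name `csv_import` and the default path are hypothetical.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class CsvToHive {
    // Hypothetical helper: writes the rows as a managed table. With enableHiveSupport()
    // on the session, the table lands in the Hive metastore.
    public static void saveToHive(Dataset<Row> df, String tableName) {
        df.write()
          .mode(SaveMode.Overwrite) // replace the table if it already exists
          .saveAsTable(tableName);
    }

    public static void main(String[] args) {
        // Hypothetical defaults; pass your own path and table name as arguments.
        String inputFileName = args.length > 0 ? args[0] : "samples/data.csv";
        String tableName = args.length > 1 ? args[1] : "csv_import";

        SparkSession spark = SparkSession
            .builder()
            .appName("CsvToHive")
            .master("local[*]")
            .enableHiveSupport() // requires spark-hive on the classpath
            .getOrCreate();

        Dataset<Row> input = spark.read()
            .option("header", "true")
            .csv(inputFileName);
        saveToHive(input, tableName);

        // Read the table back through SQL to confirm the write
        spark.sql("SELECT COUNT(*) FROM " + tableName).show();
        spark.stop();
    }
}
```

If the Hive table already exists with a fixed schema, `df.write().insertInto(tableName)` appends into it, matching columns by position rather than by name. Another route is `input.createOrReplaceTempView("csv_staging")` followed by `spark.sql("INSERT INTO TABLE <existing table> SELECT * FROM csv_staging")`.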
