Are there any good online resources to learn how writing data from Spark to Vertica works? I'm trying to understand why writing to a Vertica database is slow.
This is my basic workflow:
- Create a SQLContext (class pyspark.sql.SQLContext) on top of the SparkContext.
- From the SQLContext, use the read attribute to get a DataFrameReader with the 'jdbc' format, and read entries from a Vertica database over a JDBC connection (call it dbA):
  df = self._sqlContext.read.format('jdbc').options(url=self._jdbcURL, dbtable=subquery).load()
- Write those entries into another Vertica database (call it dbB) using the same SQLContext from Step 1.
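For reference, the workflow above amounts to roughly the following sketch. The helper names (vertica_url, copy_table), hosts, table names, and credentials are placeholders made up for illustration, not my actual code. com.vertica.jdbc.Driver is the Vertica JDBC driver class, and batchsize is a Spark JDBC write option that I am assuming my Spark version honors.

```python
def vertica_url(host, database, port=5433):
    """Build a Vertica JDBC URL (5433 is Vertica's default port)."""
    return "jdbc:vertica://{}:{}/{}".format(host, port, database)


# Hypothetical connection properties; the credentials are placeholders.
CONN_PROPS = {
    "driver": "com.vertica.jdbc.Driver",
    "user": "dbadmin",
    "password": "secret",
}

# Same properties plus a batch size for inserts (assumed to be supported
# by the Spark version in use; 1000 is Spark's documented default).
WRITE_PROPS = dict(CONN_PROPS, batchsize="1000")


def copy_table(sql_context, src_url, subquery, dst_url, dst_table,
               conn_props, write_props):
    """Read `subquery` from dbA and append the rows to `dst_table` in dbB."""
    # Read step: DataFrameReader with the 'jdbc' format, as in my snippet.
    df = (sql_context.read.format("jdbc")
          .options(url=src_url, dbtable=subquery, **conn_props)
          .load())
    # Write step: DataFrameWriter.jdbc appends the rows to the target table.
    df.write.jdbc(url=dst_url, table=dst_table, mode="append",
                  properties=write_props)
    return df
```

With a real SQLContext this would be called as copy_table(sql_context, vertica_url("hostA", "dbA"), subquery, vertica_url("hostB", "dbB"), "target_table", CONN_PROPS, WRITE_PROPS), where the hosts and table name are again made up.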
Right now it's just a simple read from dbA and write to dbB. But writing 50 entries takes about 5 seconds.
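To check whether those 5 seconds go to the read, the write, or fixed overhead such as connection setup, I could time each step separately. Below is a small sketch of a timing helper. The name timed is made up, and the commented usage assumes the df from the read step plus placeholder names (dst_url, dst_table, props) for the target connection.

```python
import time


def timed(label, fn):
    """Run fn(), print how long it took, and return (result, seconds)."""
    start = time.perf_counter()
    result = fn()
    elapsed = time.perf_counter() - start
    print("{}: {:.3f} s".format(label, elapsed))
    return result, elapsed


# Hypothetical usage. Spark evaluates lazily, so cache() plus count()
# forces the JDBC read to happen before the write is timed:
# _, read_s = timed("read dbA", lambda: df.cache().count())
# _, write_s = timed("write dbB", lambda: df.write.jdbc(
#     url=dst_url, table=dst_table, mode="append", properties=props))
```

Without forcing the read first, the time measured around the write would include fetching the rows from dbA as well.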
Thanks!