
Are there any good online resources to learn how writing data from Spark to Vertica works? I'm trying to understand why writing to a Vertica database is slow.

This is my basic workflow:

  1. Create a SQLContext (class pyspark.sql.SQLContext) on top of the SparkContext.
  2. Call the SQLContext's read method to get a DataFrameReader in 'jdbc' format:

    df = self._sqlContext.read.format('jdbc').options(url=self._jdbcURL, dbtable=subquery).load()

    This reads entries from a Vertica database over a JDBC connection (call it dbA).

  3. Write those entries into another Vertica database (call it dbB) using the same context created in step 1.

Right now it's just a simple read from dbA and write to dbB, but writing 50 entries takes about 5 seconds.
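
For reference, here is a rough sketch of the three steps with the read and the write timed separately. Spark evaluates lazily, so unless the DataFrame is materialized first, the read from dbA runs as part of the write and shows up in the write time. The helper names (timed, copy_rows), the dst_url and dst_table parameters, the driver class string, and the batchsize property are assumptions for illustration and are not part of my actual code. Whether batchsize is honored depends on the Spark version.

```python
import time


def timed(label, action):
    """Run a zero-argument callable; return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = action()
    elapsed = time.perf_counter() - start
    print("%s took %.2f s" % (label, elapsed))
    return result, elapsed


def copy_rows(sql_context, src_url, subquery, dst_url, dst_table, batch_size=1000):
    """Read rows from dbA over JDBC and append them to dst_table in dbB."""
    # Steps 1 and 2: sql_context is an existing pyspark.sql.SQLContext.
    df = sql_context.read.format('jdbc').options(url=src_url, dbtable=subquery).load()

    # Force the read now so it is not counted in the write timing.
    df.cache()
    row_count, read_secs = timed("read from dbA", df.count)

    # Assumed connection properties (hypothetical values, adjust for your setup).
    props = {'driver': 'com.vertica.jdbc.Driver', 'batchsize': str(batch_size)}

    # Step 3: append the rows to the target table in dbB.
    _, write_secs = timed(
        "write to dbB",
        lambda: df.write.jdbc(url=dst_url, table=dst_table, mode='append', properties=props),
    )
    return row_count, read_secs, write_secs
```

Calling copy_rows(self._sqlContext, self._jdbcURL, subquery, dst_url, dst_table) prints one timing per phase, which should show whether the 5 seconds are spent in the read, in the write, or split between the two.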

Thanks!

OfLettersAndNumbers

1 Answer


Have you tried HPE's Big Data Marketplace, specifically the HPE Vertica Connector For Apache Spark? You'll need to create an account to download the file, but there's no cost associated with creating an account. The documentation includes a Scala example of writing a Spark data frame to a Vertica table.