
I am trying to read a table on a Greenplum database using Spark, as below:

val execQuery = s"select ${allColumns}, 0 as ${flagCol} from schema.table where period_year=2017 and period_num=12"
val yearDF = spark.read
  .format("io.pivotal.greenplum.spark.GreenplumRelationProvider")
  .option("url", connectionUrl)
  .option("dbtable", s"(${execQuery}) as year2016")
  .option("user", devUserName)
  .option("password", devPassword)
  .option("partitionColumn", "header_id")
  .option("lowerBound", 16550)
  .option("upperBound", 1152921481695656862L)
  .option("numPartitions", 450)
  .load()

When I run the code using spark-submit, I get an exception:

Exception in thread "main" org.postgresql.util.PSQLException: ERROR: relation "public.(select je_header_id,source_system_name,je_line_num,last_update" does not exist
  Position: 15
    at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2310)
    at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2023)
    at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:217)
    at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:421)
    at org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:318)
    at org.postgresql.jdbc.PgStatement.executeQuery(PgStatement.java:281)
    at com.zaxxer.hikari.pool.ProxyStatement.executeQuery(ProxyStatement.java:111)
    at com.zaxxer.hikari.pool.HikariProxyStatement.executeQuery(HikariProxyStatement.java)
    at io.pivotal.greenplum.spark.jdbc.Jdbc$.resolveTable(Jdbc.scala:301)
    at io.pivotal.greenplum.spark.GreenplumRelationProvider.createRelation(GreenplumRelationProvider.scala:29)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:309)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:146)
    at com.partition.source.YearPartition$.prepareFinalDF$1(YearPartition.scala:141)
    at com.partition.source.YearPartition$.main(YearPartition.scala:164)
    at com.partition.source.YearPartition.main(YearPartition.scala)

In execQuery I can see the schema name and table name forming properly. But when I submit the code, it says the relation public.(select je_header_id,source_system_name,... is not found. I don't understand why it is taking public as the schema name and the query text (select je_header_id,source_system_name,je_line_num,last_update" as the table name.

Could anyone let me know what mistake I am making here and how to fix it?

Metadata
  • Looks like you are not using the standard Spark JDBC source, but a custom connector instead. You might get more attention in a Pivotal forum than in SO with a generic `spark` tag and generic question title -- your issue seems pretty specific... – Samson Scharfrichter Dec 18 '18 at 20:02
  • But the format is the same. I referred to this link: https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html which is pretty much standard for all types of databases. – Metadata Dec 19 '18 at 17:56
  • Possible duplicate of [How to specify subquery in the option "dbtable" in Spark-jdbc application while reading data from a table on Greenplum?](https://stackoverflow.com/questions/53894922/how-to-specify-subquery-in-the-option-dbtable-in-spark-jdbc-application-while) – 10465355 Dec 24 '18 at 12:20
  • I am experiencing similar problem with `postgresql` even still in `spark3.0` – WestCoastProjects Mar 20 '20 at 21:45

1 Answer


If you are using Spark JDBC, you can wrap the query and pass it to the dbtable parameter. If the Pivotal connector works like any other JDBC source, this should work.

val query = """
  (select a.id, b.id, a.name from a left outer join b on a.id = b.id
    limit 100) foo
"""

val df = sqlContext.read.format("jdbc").
  option("url", "jdbc:mysql://localhost:3306/local_content").
  option("driver", "com.mysql.jdbc.Driver").
  option("useUnicode", "true").
  option("continueBatchOnError","true").
  option("useSSL", "false").
  option("user", "root").
  option("password", "").
  option("dbtable", query).
  load()
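As a side note on why the same wrapping fails in the question: the built-in JDBC source substitutes the dbtable value verbatim into a generated SELECT ... FROM <dbtable> statement, so a parenthesised query plus a mandatory alias behaves like a table. The string construction can be sketched on its own (the column names and the year2017 alias below are just illustrative, taken from the question):

```scala
// Sketch of the dbtable wrapping rule the answer relies on: Spark's
// built-in JDBC source accepts any derived table of the form
// "(<query>) <alias>". The alias is mandatory; without it the
// database rejects the generated statement.
val execQuery =
  "select je_header_id, source_system_name from schema.table " +
  "where period_year=2017 and period_num=12"

// This exact string is what the JDBC source splices after FROM.
val dbtable = s"($execQuery) as year2017"

println(dbtable)
```

The error message in the question, relation "public.(select je_header_id,... does not exist, shows the GreenplumRelationProvider prefixing the whole dbtable string with the default schema public. and quoting it as a bare relation name, which suggests that connector does not perform the same substitution; the subquery trick may therefore only work with .format("jdbc").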
Hari
  • If you see the code I've written, I pretty much gave it in the same way you mentioned: val execQuery = s"select ${allColumns}, 0 as ${flagCol} from schema.table where period_year=2017 and period_num=12" and option("dbtable", s"(${execQuery}) as year2016") – Metadata Dec 19 '18 at 05:06
  • Try replacing it with a hard-coded query and see if it works; then we can deal with parameters – Hari Dec 19 '18 at 05:19
  • Gave it as you mentioned and still see the same exception: Exception in thread "main" org.postgresql.util.PSQLException: ERROR: syntax error at or near "select" – Metadata Dec 19 '18 at 17:46
  • Can I use HikariDataSource in the above? – Krutik Jayswal Jul 27 '22 at 10:37