
I am trying to read a table from a Greenplum database using Spark. I have the following columns:

val allColumnsSeq: Seq[String] = Seq("usd_exchange_rate", "usd_exchange_rate::character varying as usd_exchange_rate_text")

I am trying to read the above columns in Spark as:

val yearDF = spark.read.format("io.pivotal.greenplum.spark.GreenplumRelationProvider")
    .option("url", connectionUrl)
    .option("dbtable", "x_lines")
    .option("dbschema", "copydb")
    .option("user", devUserName)
    .option("password", devPassword)
    .option("partitionColumn", "id")
    .load()
    .where("year=2017 and month=12")
    .select(allColumnsSeq map col: _*)
    .withColumn(flagCol, lit(0))

There are certain columns in Greenplum of datatype decimal that hold high-precision values. In the above table, it is:

usd_exchange_rate

It holds nearly 45 digits of precision. In our project, we keep the original column (usd_exchange_rate) and create a new column from usd_exchange_rate in character datatype, with _text appended to the column name. In this case:

decimal datatype: usd_exchange_rate & the same column in character datatype: usd_exchange_rate_text

When I execute the above line, I get the exception:

org.apache.spark.sql.AnalysisException: cannot resolve '`usd_exchange_rate::character varying as usd_exchange_rate_text`'

I see that I am casting it in the wrong format, but I don't understand how I can read the same column in both decimal and text format in one step. Could anyone let me know if there is a way to achieve this in Spark?
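For reference, the error arises because col() treats the entire string as a single column name; Spark never parses the ::character varying cast inside it. The untested sketch below shows how the same cast would normally be written on the Spark side with expr (rawDF is an assumed name for the DataFrame returned by .load(), before the failing .select), though this still depends on the connector being able to load the decimal column in the first place:

    import org.apache.spark.sql.functions.{col, expr}

    // col() would look up a column literally named
    // "usd_exchange_rate::character varying as usd_exchange_rate_text";
    // expr() parses Spark SQL syntax instead, so the cast and alias work:
    val castDF = rawDF.select(
      col("usd_exchange_rate"),
      expr("CAST(usd_exchange_rate AS STRING) AS usd_exchange_rate_text")
    )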

Metadata
  • Given the [limitations of the connector](https://stackoverflow.com/q/53894922/10465355), it looks like you will have to create a view in the source and read from there (unless of course you want to switch to the built-in JDBC source). – 10465355 Dec 24 '18 at 12:28
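For illustration, a rough sketch of what that comment's suggestion could look like. The view name x_lines_text_v and its DDL are hypothetical, the DDL runs on the Greenplum side rather than in Spark, and whether the connector can read views may depend on its version:

    // Hypothetical DDL, executed in Greenplum (not in Spark):
    //   CREATE VIEW copydb.x_lines_text_v AS
    //   SELECT t.*,
    //          t.usd_exchange_rate::character varying AS usd_exchange_rate_text
    //   FROM copydb.x_lines t;

    // Spark then reads the view like any other relation:
    val viewDF = spark.read.format("io.pivotal.greenplum.spark.GreenplumRelationProvider")
        .option("url", connectionUrl)
        .option("dbtable", "x_lines_text_v")
        .option("dbschema", "copydb")
        .option("user", devUserName)
        .option("password", devPassword)
        .option("partitionColumn", "id")
        .load()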

1 Answer


I'm not sure about the error, but to do the cast, did you try defining a custom schema? Assuming that you already know your schema, define your own custom schema with StructType:

import org.apache.spark.sql.types._

val customSchema = StructType(Seq(
  StructField("usd_exchange_rate", StringType, true),
  StructField("aud_exchange_rate", StringType, true)
  // ...add the remaining fields in the same way:
  // StructField("<some field>", <data type>, <Boolean for nullable>)
))

val yearDF = spark.read.format("io.pivotal.greenplum.spark.GreenplumRelationProvider")
    .option("url", connectionUrl)
    .option("dbtable", "x_lines")
    .option("dbschema", "copydb")
    .option("user", devUserName)
    .option("password", devPassword)
    .option("partitionColumn", "id")
    .schema(customSchema)
    .load()
    .where("year=2017 and month=12")
    .select(allColumnsSeq map col: _*)
    .withColumn(flagCol, lit(0))

I didn't test this in an IDE, but it should work.

Rohit Nimmala
  • This can give me the column in String datatype, as in StructField("usd_exchange_rate", StringType, true), but I should also have the original column and its datatype. That is, one with the same column name and datatype, StructField("usd_exchange_rate", DecimalType, true), and the other as StructField("usd_exchange_rate_text", StringType, true). How can I give that? – Metadata Dec 25 '18 at 11:05
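One possible way to address that last comment, sketched and untested: declare the original column with DecimalType in the custom schema and derive the _text twin in Spark with .cast, since the schema can only describe columns that actually exist in the source table. Note that Spark's DecimalType caps precision at 38 digits, so a 45-digit Greenplum numeric cannot be declared exactly; DecimalType(38, 18) below is an illustrative choice:

    import org.apache.spark.sql.types._
    import org.apache.spark.sql.functions.col

    // Spark's DecimalType supports at most 38 digits of precision,
    // so (38, 18) here is illustrative, not a faithful 45-digit type.
    val customSchema = StructType(Seq(
      StructField("usd_exchange_rate", DecimalType(38, 18), true)
      // ...remaining source columns...
    ))

    // Derive the _text column after load; the schema describes only
    // the columns that exist in the source, not computed ones.
    val withText = yearDF.withColumn(
      "usd_exchange_rate_text",
      col("usd_exchange_rate").cast(StringType)
    )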