
I am trying to read a table from a Greenplum database using Spark. I have the following columns:

val allColumnsSeq: Seq[String] = Seq("usd_exchange_rate", "usd_exchange_rate::character varying as usd_exchange_rate_text")

I am trying to read the above columns in Spark as:

val yearDF = spark.read.format("io.pivotal.greenplum.spark.GreenplumRelationProvider")
    .option("url", connectionUrl)
    .option("dbtable", "x_lines")
    .option("dbschema", "copydb")
    .option("user", devUserName)
    .option("password", devPassword)
    .option("partitionColumn", "id")
    .load()
    .where("year=2017 and month=12")
    .select(allColumnsSeq map col: _*)
    .withColumn(flagCol, lit(0))

There are certain columns in Greenplum of datatype decimal that hold high-precision values. In the above table, it is:

usd_exchange_rate

It holds nearly 45 digits of precision. In our project, we keep the original column (usd_exchange_rate) and create a new column from usd_exchange_rate in character datatype, with _text appended to the column name. In this case:

decimal datatype: usd_exchange_rate & the same column in character datatype: usd_exchange_rate_text

When I execute the above line, I get the exception:

org.apache.spark.sql.AnalysisException: cannot resolve '`usd_exchange_rate::character varying as usd_exchange_rate_text`'

I see that I am casting it in the wrong format, but I don't understand how I can read the same column in both decimal and text format in one step. Could anyone let me know if there is a way to achieve this in Spark?
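For reference, the error arises because col() treats the entire string as a single column name; Spark never parses the ::character varying cast inside it. The untested sketch below shows how the same cast would normally be written on the Spark side with expr (rawDF is an assumed name for the DataFrame returned by .load(), before the failing .select), though this still depends on the connector being able to load the decimal column in the first place:

    import org.apache.spark.sql.functions.{col, expr}

    // col() would look up a column literally named
    // "usd_exchange_rate::character varying as usd_exchange_rate_text";
    // expr() parses Spark SQL syntax instead, so the cast and alias work:
    val castDF = rawDF.select(
      col("usd_exchange_rate"),
      expr("CAST(usd_exchange_rate AS STRING) AS usd_exchange_rate_text")
    )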

Metadata
  • Given the [limitations of the connector](https://stackoverflow.com/q/53894922/10465355), it looks like you will have to create a view in the source and read from there (unless of course you want to switch to the built-in JDBC source). – 10465355 Dec 24 '18 at 12:28
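For illustration, a rough sketch of what that comment's suggestion could look like. The view name x_lines_text_v and its DDL are hypothetical, the DDL runs on the Greenplum side rather than in Spark, and whether the connector can read views may depend on its version:

    // Hypothetical DDL, executed in Greenplum (not in Spark):
    //   CREATE VIEW copydb.x_lines_text_v AS
    //   SELECT t.*,
    //          t.usd_exchange_rate::character varying AS usd_exchange_rate_text
    //   FROM copydb.x_lines t;

    // Spark then reads the view like any other relation:
    val viewDF = spark.read.format("io.pivotal.greenplum.spark.GreenplumRelationProvider")
        .option("url", connectionUrl)
        .option("dbtable", "x_lines_text_v")
        .option("dbschema", "copydb")
        .option("user", devUserName)
        .option("password", devPassword)
        .option("partitionColumn", "id")
        .load()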

1 Answer


I'm not sure about the error, but to do the cast, did you try defining a custom schema? Assuming that you already know your schema, define your own custom schema with StructType:

import org.apache.spark.sql.types._

val customSchema = StructType(Seq(
  StructField("usd_exchange_rate", StringType, true),
  StructField("aud_exchange_rate", StringType, true)
  // ...add the remaining fields in the same way:
  // StructField("<some field>", <data type>, <Boolean for nullable>)
))

val yearDF = spark.read.format("io.pivotal.greenplum.spark.GreenplumRelationProvider")
    .option("url", connectionUrl)
    .option("dbtable", "x_lines")
    .option("dbschema", "copydb")
    .option("user", devUserName)
    .option("password", devPassword)
    .option("partitionColumn", "id")
    .schema(customSchema)
    .load()
    .where("year=2017 and month=12")
    .select(allColumnsSeq map col: _*)
    .withColumn(flagCol, lit(0))

I didn't test this in an IDE, but it should work.

Rohit Nimmala
  • This can give me the column in String datatype, as in StructField("usd_exchange_rate", StringType, true), but I should also have the original column and its datatype. That is, one with the same column name and datatype, StructField("usd_exchange_rate", DecimalType, true), and the other as StructField("usd_exchange_rate_text", StringType, true). How can I give that? – Metadata Dec 25 '18 at 11:05
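One possible way to address that last comment, sketched and untested: declare the original column with DecimalType in the custom schema and derive the _text twin in Spark with .cast, since the schema can only describe columns that actually exist in the source table. Note that Spark's DecimalType caps precision at 38 digits, so a 45-digit Greenplum numeric cannot be declared exactly; DecimalType(38, 18) below is an illustrative choice:

    import org.apache.spark.sql.types._
    import org.apache.spark.sql.functions.col

    // Spark's DecimalType supports at most 38 digits of precision,
    // so (38, 18) here is illustrative, not a faithful 45-digit type.
    val customSchema = StructType(Seq(
      StructField("usd_exchange_rate", DecimalType(38, 18), true)
      // ...remaining source columns...
    ))

    // Derive the _text column after load; the schema describes only
    // the columns that exist in the source, not computed ones.
    val withText = yearDF.withColumn(
      "usd_exchange_rate_text",
      col("usd_exchange_rate").cast(StringType)
    )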