10

I am trying to retrieve data from oracle using spark-sql-2.4.1 version. I tried to set the JdbcOptions as below :

    .option("lowerBound", "31-MAR-02");
    .option("upperBound", "01-MAY-19");
    .option("partitionColumn", "data_date");
    .option("numPartitions", 240);

But gives error :

    java.lang.IllegalArgumentException: Timestamp format must be yyyy-mm-dd hh:mm:ss[.fffffffff]
        at java.sql.Timestamp.valueOf(Timestamp.java:204)
        at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.toInternalBoundValue(JDBCRelation.scala:179)

Then tried as below

    .option("lowerBound", "2002-03-31"); //changed the date format
    .option("upperBound", "2019-05-02");
    .option("partitionColumn", "data_date"); 
    .option("numPartitions", 240);

Still no luck. So what is the correct way to pass the date as "lower/upperBound"? Is there a way to specify/set option parameter data Type ?

Part-2 Checked the options properly. they were overwritten in between before executing the query. So corrected it. ... now that error resolved.

But for below options:

.option("lowerBound", "2002-03-31 00:00:00"); 
.option("upperBound", "2019-05-01 23:59:59");
.option("timestampFormat", "yyyy-mm-dd hh:mm:ss");

Query String :

query ->  ( SELECT * FROM MODEL_VALS ) T

It is throwing another error :

java.sql.SQLException: ORA-12801: error signaled in parallel query server P022, instance nj0005

ORA-01861: literal does not match format string
BdEngineer
  • 2,929
  • 4
  • 49
  • 85

4 Answers4

13

If you are using Oracle, see https://github.com/apache/spark/blob/master/external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala#L441

val df1 = spark.read.format("jdbc")
      .option("url", jdbcUrl)
      .option("dbtable", "datetimePartitionTest")
      .option("partitionColumn", "d")
      .option("lowerBound", "2018-07-06")
      .option("upperBound", "2018-07-20")
      .option("numPartitions", 3)
      // oracle.jdbc.mapDateToTimestamp defaults to true. If this flag is not disabled, column d
      // (Oracle DATE) will be resolved as Catalyst Timestamp, which will fail bound evaluation of
      // the partition column. E.g. 2018-07-06 cannot be evaluated as Timestamp, and the error
      // message says: Timestamp format must be yyyy-mm-dd hh:mm:ss[.fffffffff].
      .option("oracle.jdbc.mapDateToTimestamp", "false")
      .option("sessionInitStatement", "ALTER SESSION SET NLS_DATE_FORMAT = 'YYYY-MM-DD'")
      .load()
alexm
  • 510
  • 1
  • 8
  • 19
3

I stumbled on this question as I am solving a similar problem. But in this case Spark 2.4.2 is sending date in format 'yyyy-MM-dd HH:mm:ss.ssss' to Oracle and it returned "Not a valid month" as it expects 'dd-MMM-yy HH:mm:ss.ssss'. To solve that I followed: Spark GitHub Link , it says:

Override beforeFetch method in OracleDialect to finish the following two things:

Set Oracle's NLS_TIMESTAMP_FORMAT to "YYYY-MM-DD HH24:MI:SS.FF" to match java.sql.Timestamp format. Set Oracle's NLS_DATE_FORMAT to "YYYY-MM-DD" to match java.sql.Date format.

And it solved the issue. Hope it helps.

Ankush
  • 499
  • 1
  • 7
  • 17
2

The given parameters have type timestamp, but you're providing the only date. Timestamp has format as yyyy-mm-dd hh:mm:ss, so you need to provide your dates as 2002-03-31 00:00:00 and 2019-05-01 23:59:59 correspondingly...

Alex Ott
  • 80,552
  • 8
  • 87
  • 132
  • thank you so much Alex you are always with helping hand... one more thing when i see the source code of v2.4.1 i see it as private def toInternalBoundValue(value: String, columnType: DataType): Long = columnType match { case _: NumericType => value.toLong case DateType => DateTimeUtils.fromJavaDate(Date.valueOf(value)).toLong case TimestampType => DateTimeUtils.fromJavaTimestamp(Timestamp.valueOf(value)) } So why it is not considering it as "DateType" ? – BdEngineer May 03 '19 at 09:25
  • Changed it as .option("lowerBound", "2002-03-31 00:00:00" ); .option("upperBound", "2019-05-01 23:59:59"); But still getting same error : java.lang.IllegalArgumentException: Timestamp format must be yyyy-mm-dd hh:mm:ss[.fffffffff] – BdEngineer May 03 '19 at 09:31
  • 1
    just checked, worked just fine with these dates. Can you remove the `;` between the calls to `option` ? – Alex Ott May 03 '19 at 10:19
  • Sorry i am using java code ... so ";" is must. May I know how/where are you testing it ? – BdEngineer May 03 '19 at 10:42
  • please check this https://gist.github.com/shatestest/34c6f7fda3e24499080b6d03e2bbb45d – BdEngineer May 03 '19 at 11:03
  • let me know if any confusion in understanding my code. – BdEngineer May 03 '19 at 11:04
  • please check this gist.github.com/shatestest/34c6f7fda3e24499080b6d03e2bbb45d – BdEngineer May 03 '19 at 12:40
  • updated with SampleSourceData ... https://gist.github.com/shatestest/34c6f7fda3e24499080b6d03e2bbb45d – BdEngineer May 03 '19 at 13:11
  • 1
    I suspect that somehow options aren't persisted in data frame... For me `Timestamp.valueOf("2019-05-01 23:59:59");` works just fine - returning timestamp – Alex Ott May 03 '19 at 13:22
  • when i debug I see options set to DataFrameReader perfectly fine. Map(driver -> oracle.jdbc.OracleDriver, numPartitions -> 2000, timestampFormat -> yyyy-mm-dd hh:mm:ss, upperBound -> 2019-05-01 23:59:59.0, lowerBound -> 2002-03-31 00:00:00.0, partitionColumn -> DATA_DATE, url -> jdbc:oracle:thin:@, user -> user1, password -> pwd1, customSchema -> DATA_DATE TIMESTAMP, DATA_ITEM_VALUE_NUMERIC DECIMAL(38, 15) , MODEL_FAMILY_ID INTEGER , FISCAL_YEAR INTEGER , FISCAL_QUARTER INTEGER, fetchsize -> 10000) – BdEngineer May 03 '19 at 13:31
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/192780/discussion-between-shyam-and-alex-ott). – BdEngineer May 03 '19 at 13:33
  • @ user10938362 any suggestion to fix above error i.e. Timestamp format must be yyyy-mm-dd hh:mm:ss[.fffffffff] – BdEngineer May 06 '19 at 07:41
1

all of the following options must be set in this way in order for it to work:

spark.read
      .option("header", true)
      .option("inferSchema", true)
      .option("timestampFormat", "MM/dd/yyyy h:mm:ss a")
      .csv("PATH_TO_CSV")
raam86
  • 6,785
  • 2
  • 31
  • 46
  • @Alex Ott , yes earlier error was about the options overwriting as you mentioned. Now i fixed , but now getting "ORA-01861: literal does not match format string" . request to have a look at "part 2" of question – BdEngineer May 06 '19 at 12:26
  • 1
    Unfortunately I did use Oracle ~13 years ago... But I remember that they had very detailed manuals on every error code – Alex Ott May 06 '19 at 12:42