
I am using spark-redshift to query Redshift data from PySpark for processing.

The query works fine if I run it on Redshift directly (e.g. in SQL Workbench). But spark-redshift unloads the data to S3 and then retrieves it, and that step throws the following error when I run it:

py4j.protocol.Py4JJavaError: An error occurred while calling o124.save.
: java.sql.SQLException: [Amazon](500310) Invalid operation: Assert
Details: 
 -----------------------------------------------
  error:  Assert
  code:      1000
  context:   !AmLeaderProcess - 
  query:     583860
  location:  scheduler.cpp:642
  process:   padbmaster [pid=31521]
  -----------------------------------------------;
    at com.amazon.redshift.client.messages.inbound.ErrorResponse.toErrorException(ErrorResponse.java:1830)
    at com.amazon.redshift.client.PGMessagingContext.handleErrorResponse(PGMessagingContext.java:822)
    at com.amazon.redshift.client.PGMessagingContext.handleMessage(PGMessagingContext.java:647)
    at com.amazon.jdbc.communications.InboundMessagesPipeline.getNextMessageOfClass(InboundMessagesPipeline.java:312)
    at com.amazon.redshift.client.PGMessagingContext.doMoveToNextClass(PGMessagingContext.java:1080)
    at com.amazon.redshift.client.PGMessagingContext.getErrorResponse(PGMessagingContext.java:1048)
    at com.amazon.redshift.client.PGClient.handleErrorsScenario2ForPrepareExecution(PGClient.java:2524)
    at com.amazon.redshift.client.PGClient.handleErrorsPrepareExecute(PGClient.java:2465)
    at com.amazon.redshift.client.PGClient.executePreparedStatement(PGClient.java:1420)
    at com.amazon.redshift.dataengine.PGQueryExecutor.executePreparedStatement(PGQueryExecutor.java:370)
    at com.amazon.redshift.dataengine.PGQueryExecutor.execute(PGQueryExecutor.java:245)
    at com.amazon.jdbc.common.SPreparedStatement.executeWithParams(Unknown Source)
    at com.amazon.jdbc.common.SPreparedStatement.execute(Unknown Source)
    at com.databricks.spark.redshift.JDBCWrapper$$anonfun$executeInterruptibly$1.apply(RedshiftJDBCWrapper.scala:108)
    at com.databricks.spark.redshift.JDBCWrapper$$anonfun$executeInterruptibly$1.apply(RedshiftJDBCWrapper.scala:108)
    at com.databricks.spark.redshift.JDBCWrapper$$anonfun$2.apply(RedshiftJDBCWrapper.scala:126)
    at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
    at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
Caused by: com.amazon.support.exceptions.ErrorException: [Amazon](500310) Invalid operation: Assert

The query that gets generated:

UNLOAD ('SELECT "x","y" FROM (select x,y from table_name where 
((load_date=20171226 and hour>=16) or (load_date between 20171227 and 
20171226) or (load_date=20171227 and hour<=16))) ') TO 's3:s3path' WITH 
CREDENTIALS 'aws_access_key_id=xxx;aws_secret_access_key=yyy' ESCAPE 
MANIFEST

What is the issue here, and how can I resolve it?
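For context, a read like this is typically wired up as sketched below. All connection details, paths, and option values here are placeholders (not the actual ones used in this question); only the inner query text comes from the question. spark-redshift takes the `query` option and wraps it in the `UNLOAD ... TO 's3://...'` statement shown above.

```python
# Rough sketch of a spark-redshift read. Everything except the inner
# query text is a placeholder, not taken from the actual setup.
inner_query = (
    "select x,y from table_name where "
    "((load_date=20171226 and hour>=16) "
    "or (load_date between 20171227 and 20171226) "
    "or (load_date=20171227 and hour<=16))"
)

read_options = {
    # Placeholder JDBC URL and tempdir:
    "url": "jdbc:redshift://host:5439/db?user=xxx&password=yyy",
    "query": inner_query,
    "tempdir": "s3n://bucket/tmp/",
}

# The actual Spark read (commented out so the sketch stands alone):
# df = (spark.read
#           .format("com.databricks.spark.redshift")
#           .options(**read_options)
#           .load())
```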

adithya krishnan
  • Have you tried simplifying the query? You don't need that wrapper with the capitalized column names. An Assert error usually happens when something is wrong with interpreting data types, for example for the two parts of a `union` query where column N in one part is varchar and in the other part the same column is integer or null. Maybe it's an assertion error for data that comes from different nodes. – AlexYes Dec 27 '17 at 14:12
  • Actually, the query I use is only the inner part; the outer part (the wrapper) gets generated since it has to unload to S3. I guess it comes from spark-redshift. – adithya krishnan Dec 28 '17 at 14:13
  • What if you run the complete generated query in Workbench? Does it return the same error? – AlexYes Dec 28 '17 at 15:55
  • With the UNLOAD statement, yes, it generates the same Assert error. The query alone executes fine. – adithya krishnan Dec 29 '17 at 06:37
  • Try adding an explicit cast for each column, like `x::integer`. – AlexYes Dec 29 '17 at 10:39
  • Thanks a lot, it worked after adding that. – adithya krishnan Jan 03 '18 at 11:45
  • I've posted the answer, please accept it. It's a relatively common error, so it's good to have an accepted answer. – AlexYes Jan 03 '18 at 23:11

1 Answer


An Assert error usually happens when something is wrong with interpreting data types, for example for the two parts of a `union` query where column N in one part is varchar and in the other part the same column is integer or null. Maybe your assertion error happens for data that comes from different nodes (just like in a `union` query). Try adding an explicit cast for each column, like `x::integer`.
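A minimal sketch of the fix: build the inner query with an explicit Redshift cast on every selected column, so the planner does not have to infer the type across nodes. The column names come from the question; the concrete types (`integer` here) are assumptions and should match the table's actual schema.

```python
# Assumed column -> Redshift type mapping; adjust to the real schema.
columns = {"x": "integer", "y": "integer"}

# Append an explicit ::type cast to each column in the select list.
select_list = ", ".join(f"{col}::{typ}" for col, typ in columns.items())
inner_query = f"select {select_list} from table_name"

print(inner_query)
# select x::integer, y::integer from table_name
```

Pass `inner_query` as the `query` option of the spark-redshift read in place of the uncast version.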

AlexYes