I'm trying to extract data by joining two tables in PySpark. My join query looks like:

SELECT COUNT(DISTINCT m.ticker),to_date(m.date) FROM extractalpha_cam2 m LEFT OUTER JOIN TOP1000 u ON u.date = to_date(m.date) GROUP BY m.date ORDER BY m.date

It is throwing the error:

Error:Py4JJavaError: An error occurred while calling z:org.apache.zeppelin.spark.ZeppelinContext.showDF

But when I extract the data from each table separately, it works fine. My single-table queries are:

SELECT to_date(date) FROM extractalpha_cam2
SELECT date from TOP1000

These two queries work fine. Can anyone help me extract the data from both tables with a join?

It would also be really helpful if anyone could share a link that can guide me in writing efficient queries in PySpark.

mayank agrawal
ggupta

1 Answer


I checked and found that this error occurs when the job you are running takes longer than the configured timeout. In my case the timeout was 300 seconds.
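One rough way to check whether a query is hitting that limit is to time it yourself. The snippet below is only a sketch: `run_timed`, `exceeds_timeout` and `TIMEOUT_SECONDS` are names made up for illustration (not part of Spark or Zeppelin), and the commented usage assumes you already have a SparkSession called `spark` and the join query in a string called `query`.

```python
import time

# Assumed limit, taken from the 300 seconds mentioned above; use your own setting.
TIMEOUT_SECONDS = 300


def run_timed(job):
    """Run a zero-argument callable and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = job()
    elapsed = time.perf_counter() - start
    return result, elapsed


def exceeds_timeout(elapsed, limit=TIMEOUT_SECONDS):
    """Return True if the measured run time is longer than the limit."""
    return elapsed > limit


# Hypothetical usage with Spark (assumes `spark` and `query` exist):
#   rows, elapsed = run_timed(lambda: spark.sql(query).collect())
#   print(elapsed, exceeds_timeout(elapsed))
```

If the join on its own takes longer than the limit while the single-table queries do not, that would be consistent with the timeout explanation.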

Let me know if anyone has a better answer than this. Thanks.

ggupta