
I am going to migrate my Shark queries to Spark SQL.

Below is a sample Shark query that uses functions in the GROUP BY clause.

select month(dt_cr) as Month,
   day(dt_cr)   as date_of_created,
   count(distinct phone_number) as total_customers        
from customer
group by month(dt_cr),day(dt_cr);

The same query does not work in Spark SQL; it gives the error below:

Error : org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Expression not in GROUP BY.

As a workaround I am using the Spark SQL query below. It works, but it requires a code change, which has a big impact on my existing project. Does anyone have a better solution with minimal impact?

SELECT Month, date_of_created, count(distinct phone_number) as total_customers
FROM
(select month(dt_cr) as Month,
        day(dt_cr)   as date_of_created,
        phone_number
 from customer) A
group by Month, date_of_created
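
For reference, here is a minimal sketch of how the rewritten query might be submitted from a Spark application. The HiveContext setup, the standalone driver object, and the variable names below are assumptions about a Spark 1.1/1.2-style deployment, not code from my actual project:

// Sketch only: assumes a Spark 1.1/1.2-era driver with access to the Hive metastore.
import org.apache.spark.SparkContext
import org.apache.spark.sql.hive.HiveContext

object MigratedQuery {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext()            // configuration supplied via spark-submit
    val hiveContext = new HiveContext(sc)  // reads the existing Hive metastore tables

    // The month()/day() expressions are evaluated inside the subquery, so the
    // outer GROUP BY only references plain column aliases.
    val result = hiveContext.sql(
      """SELECT Month, date_of_created, count(distinct phone_number) as total_customers
        |FROM (select month(dt_cr) as Month,
        |             day(dt_cr)   as date_of_created,
        |             phone_number
        |      from customer) A
        |group by Month, date_of_created""".stripMargin)

    result.collect().foreach(println)
  }
}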
sandip

1 Answer


It's an issue in Spark SQL: https://issues.apache.org/jira/browse/SPARK-4296

However, I think it will be fixed in the next release. For now, you have to change your code to bypass this issue.
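
This is not from the original answer, but one possible shape for that code change, as a sketch: on a later Spark release that ships the DataFrame API (1.5+ for month/dayofmonth in org.apache.spark.sql.functions), the same aggregation can be expressed without a SQL GROUP BY at all. The customer table is assumed to already be registered in the metastore.

import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.functions.{col, countDistinct, dayofmonth, month}

object CustomerCounts {
  // Groups by the month and day of dt_cr and counts distinct phone numbers,
  // mirroring the original Shark query.
  def totalCustomersPerDay(sqlContext: SQLContext): DataFrame =
    sqlContext.table("customer")
      .groupBy(month(col("dt_cr")).as("Month"),
               dayofmonth(col("dt_cr")).as("date_of_created"))
      .agg(countDistinct(col("phone_number")).as("total_customers"))
}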

zsxwing