1

I have the same issue mentioned here

However, the problem is on Hive database. When I try the solution on my table that looks like

Id   Date             Column1    Column2
1    01/01/2011       5          5 => Same as Column1
2    02/01/2011       2          18 => (1 + (value of Column2 from the previous row)) * (1 + (Value of Column1 from the current row)) i.e. (1+5)*(1+2)
3    03/01/2011       3          76 => (1+18)*(1+3) = 19*4

I get the error

FAILED: SemanticException Recursive cte cteCalculation detected (cycle: ctecalculation -> cteCalculation).

What is the workaround possible in this case

Community
  • 1
  • 1
mhn
  • 2,660
  • 5
  • 31
  • 51
  • read about lead / lag here : https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics If you still can't figure this out, let us know – dimamah Dec 05 '15 at 17:27
  • lead and lag is not applicable in this scenario. it's for a diffferent problem – mhn Dec 05 '15 at 20:40
  • Lead and Lag are for previous column previous row. I compute based on current column previous row. If you check the link i provided, nowhere in the solution there is lead/lag mentioned – mhn Dec 05 '15 at 20:42
  • Yea sorry for that, see my answer below – dimamah Dec 06 '15 at 23:19

2 Answers2

2

You will have to write a UDF for this.
Below you can see a very (!!) simplified UDF for what you need.
The idea is to store the value from the previous execution in a variable inside the UDF and each time return (stored_value+1)*(current_value+1) and then store it for the next line.
You need to take care of the first value to get, so there is a special case for that.
Also, you have to pass the data ordered to the function as it simply goes line by line and performs what you need without considering any order.

You have to add your jar and create a function, lets call it cum_mul.

The SQL will be :

select id,date,column1,cum_mul(column1) as column2
from
(select id,date,column1 from myTable order by id) a  

The code for the UDF :

import org.apache.hadoop.hive.ql.exec.UDF;

public class cum_mul extends UDF  {

    private int prevValue;
    private boolean first=true;

    public int evaluate(int value) {
        if (first) {
            this.prevValue = value;
            first = false;
            return value; 
        }
        else {
            this.prevValue = (this.prevValue+1)*(value+1);
            return this.prevValue;      
        }
      }
}
dimamah
  • 2,883
  • 18
  • 31
  • There're some minor issues with this answer. Global `order by` my cause data hotspot since it pushes all data to a single reducer. And for the outter qurey, data is not guaranteed to be proccessed by the same JVM process so the results might be inconstant and incorrect. – damientseng Jan 04 '20 at 11:12
0

Hive common table expression (CTE) works as a query level temp-table (a syntax sugar) that is accessible within the whole SQL.

Recursive query is not supported because it introduces multiple stages with massive I/O, which is something that the underlying execution and storage engine not good at. In fact, Hive strictly prohibit recursive references for CTEs and views. Hence the error you got.

damientseng
  • 533
  • 2
  • 19