1

I have a column event in Hive table like below.

Event

Sent
Sent
Open
Open
Click
Sent
Open
Signup
Sent
Open
Click

Now I want to create new column based on the values in event column using case statement.

I want to where there is signup in event column I want the Previous_event column to be exactly the preceding value.

I have tried like below.

select event, 
       case when event = 'Sent' then 'No_event' 
            when event = 'Open' then 'Sent' 
            when event = 'Click' then 'Open'  
            else -1 
       end as Previous_event 
from table;

Result

Sent    No_event
Sent    No_event
Open    Sent
Open    Sent
Click   Open
Sent    No_event
Open    Sent
Signup  -1
Sent    No_event
Open    Sent
Click   Open

Expected result

Sent    No_event
Sent    No_event
Open    Sent
Open    Sent
Click   Open
Sent    No_event
Open    Sent
Signup  Open
Sent    No_event
Open    Sent
Click   Open

How can i achieve what I want?

Harneet Singh
  • 2,328
  • 3
  • 16
  • 23
nmr
  • 605
  • 6
  • 20
  • Do you have a PK (some id field) in this table, which determines the order of these rows. SQL in general is an un-ordered dataset. So, without any `id` defined, we cannot identify what comes first and what comes later ! – Madhur Bhaiya Sep 25 '18 at 18:50
  • You can't be using MySQL and Hive, which is it? – MatBailie Sep 25 '18 at 18:56
  • Also, the comment about a primary key: you actually just need a column to explicitly order the data by. Potentially a timestamp column? – MatBailie Sep 25 '18 at 18:57
  • @MadhurBhaiya There is no primary key in this table – nmr Sep 25 '18 at 19:08
  • @MatBailie I wanted to see if `mysql` code can work with `hive` as most of them work – nmr Sep 25 '18 at 19:08
  • @nmr any timestamp/datetime based field, which can be used to determine which event (row) "logically" comes first ? – Madhur Bhaiya Sep 25 '18 at 19:09
  • MySQL and Hive have different Syntax for all sorts of things. If you're targeting this at Hive, say you want an answer for Hive. A MySQL answer in ***not*** guaranteed to run on Hive, and Hive has functionality that MySQL doesn't have *(except in MySQL 8...)* – MatBailie Sep 25 '18 at 19:10
  • @MadhurBhaiya `eventdate` I order the records by serialnumber and eventdate – nmr Sep 25 '18 at 19:14
  • @nmr check the answer - see if it works ? It is specifically for MySQL – Madhur Bhaiya Sep 25 '18 at 19:16
  • If Hive version 0.11 or above, can use the LEAD and LAG function https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics – adarshgtalk Sep 25 '18 at 21:09

2 Answers2

2

Use a correlated subquery.

Based on OP's comments, you can try the following query (for MySQL):

select t1.event, 
       case when t1.event = 'Sent' then 'No_event' 
            when t1.event = 'Open' then 'Sent' 
            when t1.event = 'Click' then 'Open' 
            when t1.event = 'Signup' then (select t2.event 
                                           from table as t2 
                                           where t2.eventdate < t1.eventdate 
                                           order by t2.eventdate desc 
                                           limit 1)         
            else -1 
       end as Previous_event 
from table as t1;
Madhur Bhaiya
  • 28,155
  • 10
  • 49
  • 57
  • This is giving me `Unsupported SubQuery Expression Currently SubQuery expressions are only allowed as Where and Having Clause predicates` in hive Is there a different way to do this – nmr Sep 25 '18 at 21:19
  • @nmr not sure about hive. Since your question was tagged both MySQL and Hive, I gave a MySQL specific solution – Madhur Bhaiya Sep 26 '18 at 04:40
1

Below are the URLs that contain similar Problem/Solution:

Hive access previous row value

https://community.hortonworks.com/questions/39533/fill-null-with-previous-row-values-in-hive.html

The SQL will be :

select event, prev_event(event) as Previous_event from table;

The code for the UDF :

    import org.apache.hadoop.hive.ql.exec.UDF;

    public class cum_mul extends UDF  {
    private String prevValue = null;

    public String evaluate(String value) {

    switch(value) 
        { 
            case "Sent": 
        prevValue = "No_event";
                return "No_event"; 
            case "Open": 
        prevValue = "Sent";
                return "Sent";  
            case "Click": 
        prevValue = "Open";
                return "Open"; 
            default: 
                return prevValue; 
        } 
      }
}
Harneet Singh
  • 2,328
  • 3
  • 16
  • 23