2

Supposed I have some sample data in table_name_a as below:

    code     val_a   date
   -------------------------
1   00001    500    20191101
2   00001    1000   20191130
3   00002    200    20191101
4   00002    400    20191130
5   00003    200    20191101
6   00003    600    20191130

There are some val_a of code between 20191101 and 20191130, I would like to get the last day value of the month on every code, and my SQL query is as below(need to match Hive and Impla):

SELECT code, max(date) AS date, val_a
    FROM table_a
    WHERE date BETWEEN '20090601'
    AND '20090630'
GROUP BY code, val_a

But above query was wrong(the val_a of code is not the last day of the month),My expected output as below:

    code    val_a      date
   --------------------------
1   00001    1000   20191130
2   00002    400    20191130
3   00003    600    20191130

Thanks so much for any advice.

5 Answers5

1

We could try using a ROW_NUMBER solution here:

WITH cte AS (
    SELECT t.*, ROW_NUMBER() OVER (PARTITION BY code ORDER BY date DESC) rn
    FROM table_a
    -- WHERE date BETWEEN '20090601' AND '20090630'
    -- your current WHERE clause is dubious
)

SELECT code, date, val_a
FROM cte
WHERE rn = 1;

Note that it is not best practice to be storing dates as text. That being said, given that you are storing your dates in an ISO format with fixed width, we can still work with these dates in this case. Also, your current WHERE clause does not make sense, so I commented it out.

Tim Biegeleisen
  • 502,043
  • 27
  • 286
  • 360
  • Thank you so much, it works, you really saved my day!!!!☕️☕️ –  Dec 31 '19 at 03:27
1

You can try the following code. In the subquery, you get the max date along with the code. The WHERE IN clause is used as filter to your data.

SELECT code, val_a, date
FROM table_a
WHERE (code, date) IN 
  (SELECT code, MAX(date) 
  FROM table_a
  GROUP BY code)
JoPapou13
  • 753
  • 3
  • 8
  • Thanks for the answer, it's a pity, it doesn't work in Impala, thank you all the same. –  Dec 31 '19 at 01:46
0

In more general way you can use correlected subquery :

select a.*
from table_a a
where a.date = (select max(a1.date) from table_a a1 where a1.code = a.code);
Yogesh Sharma
  • 49,870
  • 5
  • 26
  • 52
  • Hi, thanks for the answer, if you use`max(a1.date)`, I think it needs `group by`, it doesn't work, but thank you all the same. –  Dec 31 '19 at 01:44
0

Use row_number:

 with your_data as (
    select stack(6, 
'00001',500 ,'20191101',
'00001',1000,'20191130',
'00002',200 ,'20191101',
'00002',400 ,'20191130',
'00003',200 ,'20191101',
'00003',600 ,'20191130' ) as (code,val_a,date)
)

select  code,val_a,date
  from
(  
 select code,val_a,date, 
        --partition by code and months, max date first
        row_number() over(partition by code, substr(date, 1,6) order by date desc) rn        
   from your_data d 
)s where rn=1   
;

Result:

OK
code    val_a   date
00001   1000    20191130
00002   400     20191130
00003   600     20191130
Time taken: 54.641 seconds, Fetched: 3 row(s)
leftjoin
  • 36,950
  • 8
  • 57
  • 116
-1

If you need the data of only last day of the month then you can use LAST_DAY and TRUNC function on the date in WHERE clause as follows:

SELECT
    CODE,
    DATE AS "DATE", -- removed MAX
    VAL_A
FROM
    TABLE_A
WHERE
    DATE BETWEEN '20090601' AND '20090630'
    AND TRUNC(LAST_DAY(MAX(DATE))) = TRUNC(DATE); -- added this condition
    -- removed the GROUP BY clause

Cheers!!

Popeye
  • 35,427
  • 4
  • 10
  • 31
  • I got an error of `AnalysisException: No matching function with signature: trunc(TIMESTAMP).`, thank you all the same for the answer –  Dec 31 '19 at 01:50