1

Sample data:

dateHour
2021-08-01 18:00:00.000
2021-08-02 20:00:00.000
2021-08-03 06:00:00.000
2021-08-04 08:00:00.000
2021-08-05 09:00:00.000

Trying to build an aggregate table at the weekday level. So there is a need to extract full day of week name from the dateHour column. dateHour column contains date and hour value of an event in it. It is like a timestamp column.

One way of extracting the full day name is using the below case when query. This solution works but it is taking a lot of time due to case statements. This is creating a bottle neck on the performance of the query.

select 
case dayname("dateHour"::date)
            when 'Mon' then 'Monday'
            when 'Tue' then 'Tuesday'
            when 'Wed' then 'Wednesday'
            when 'Thu' then 'Thursday'
            when 'Fri' then 'Friday'
            when 'Sat' then 'Saturday'
            when 'Sun' then 'Sunday'
    end as "day_of_week"
from tableName

Another query that I have tried is:

select TO_CHAR(CURRENT_DATE, 'DYDY') Day_Full_Name;

The above query works fine when CURRENT_DATE is used but when CURRENT_DATE is replaced with the column name dateHour from the table then it is giving short week day twice. Something like this:

SunSun
MonMon
TueTue
WedWed
ThuThu

To replicate the DYDY issue check with the below code snippets:

with cte as (
select '2021-08-01 18:00:00.000'::timestamp  as "dateHour"
)
select 
    "dateHour"::date as dt,
    TO_CHAR(dt,'DYDY') day_full_name
from cte ;

The output from the above query:

DT DAY_FULL_NAME
2021-08-01 Sunday
with cte as (
select '2021-08-01 18:00:00.000'::timestamp  as "dateHour"
union all 
select '2021-08-02 20:00:00.000'::timestamp  as "dateHour"
)
select 
    "dateHour"::date as dt,
    TO_CHAR(dt,'DYDY') day_full_name
from cte 
;

Output:

DT DAY_FULL_NAME
2021-08-01 SunSun
2021-08-02 SunMon

Expected output (In the Output Sunday can be either of these: Sunday / SUNDAY )

fullDayofWeekName
SUNDAY
MONDAY
TUESDAY
WEDNESDAY
THURSDAY

I need an efficient way to generate the full week day name from the dateHour column. It shouldn't effect the performance of the aggregate query.

Teja Goud Kandula
  • 1,462
  • 13
  • 26
  • Are you sure the CASE is the reason of the poor performance? CASE is usually fast and other functions use internally also CASE or something similar. – Jonas Metzler Jul 24 '23 at 12:11
  • I couldn't come up with any other way to generate full day of week name. If there is another way to generate the same then I can be compare the query result time and be 100% sure what is causing the performance issue. But until then it is an assumption for that it is due to the case statement. – Teja Goud Kandula Jul 24 '23 at 12:14
  • 1
    What about this : https://stackoverflow.com/a/71402787/4286884 – SelVazi Jul 24 '23 at 12:36
  • Odd about that `dydy` issue. I'm not able to replicate `Select '2021-08-01 18:00:00.000'::timestamp as dt, TO_CHAR(dt, 'DYDY') Day_Full_Name;` – JNevill Jul 24 '23 at 12:46
  • 1
    I have added the code blocks in the question description on how to replicate the DYDY issue. @JNevill – Teja Goud Kandula Jul 24 '23 at 12:57
  • 1
    Crazy stuff. You drop the union and it's back to normal. That's some spooky nonsense. – JNevill Jul 24 '23 at 13:01
  • I'm not able to replicate this behavior on my account. When I run this `select TO_CHAR('2023-07-24'::timestamp, 'DYDY') Day_Full_Name;`, I get `Monday` as an output. Could this be a timestamp setting issue on your account? – Mike Walton Jul 24 '23 at 13:06
  • I used union to give some sample data set. But when the query run against the data from the table the output is giving the the Short week name twice. How can I debug the timestamp setting? Any suggestion please @MikeWalton – Teja Goud Kandula Jul 24 '23 at 13:08
  • @MikeWalton Try the code that OP added to the question with the cte. It should replicate on your instance then (and it definitely feels broken). – JNevill Jul 24 '23 at 13:10
  • The odd thing is, I don't see anywhere in the docs that states that DYDY should give you the long name. Do you have a link to that documentation? – Mike Walton Jul 24 '23 at 13:23
  • i havent seen `DYDY` in the snowflake official documents. They are using `dayname`. Instead of case-when can you use `select decode(extract ('dayofweek_iso',current_date()), 1, 'Monday', 2, 'Tuesday', 3, 'Wednesday', 4, 'Thursday', 5, 'Friday', 6, 'Saturday', 7, 'Sunday');` – Koushik Roy Jul 24 '23 at 15:10

2 Answers2

1

If performance is a problem, change the order of operations.

If you are doing large volumes of transforms date->string, then aggregation on strings, those two steps will be slower than if you aggregate on number (say dayofweek or date truncated value) and then convert to string at the in another select layer.

aka slower:

select 
     TO_CHAR(dt, 'DYDY') as day_name,
     sum(value) as sum_val
from big_table
group by day_name

verse faster:

select 
    convert_to_name_step(dow) as day_name,
    sum_val
from (
    select 
        dayofweek(dt) as dow,
        sum(value) as sum_val
    from big_table
    group by dow
)

convertion bug:

with data as (
    select 
        column1 as dt
    from values
    ('2021-08-01 18:00:00.000'::timestamp),
    ('2021-08-02 20:00:00.000'::timestamp),
    (CURRENT_TIMESTAMP)
)
select 
    dt
    ,SYSTEM$TYPEOF(dt) as t
    ,TO_CHAR(dt, 'DYDY') as n

    ,CURRENT_TIMESTAMP
    ,SYSTEM$TYPEOF(CURRENT_TIMESTAMP) as cd_t
    ,TO_CHAR(CURRENT_TIMESTAMP, 'DYDY') as cd_n

from data;

Shows it is not the data being passed to the function, but more the implementations appears to be different code paths.

enter image description here

Simeon Pilgrim
  • 22,906
  • 3
  • 32
  • 45
0

I can reproduce the glitch with the union example, but couldn't find any documentation on the use of 'DYDY' to get day names. That said, an array-based approach might be faster than traversing the if else block

select current_timestamp as ct,
       dayofweek(ct) as dow,
       ['Sunday','Monday','Tuesday','Wednesday','Thursday','Friday','Saturday'][dow]::varchar
Radagast
  • 5,102
  • 3
  • 12
  • 27