0

I have year week column , which is like below format, in pysaprk sql in databricks.

input column

Year_week 
202001
202002
202003
202004
202005
202006
202007

Expected output column

  Year_week    Month
    202001     01
    202002     01
    202003     01
    202004     01
    202005     02
    202006     02
    202007     02
blackbishop
  • 30,945
  • 11
  • 55
  • 76
Jha Ayush
  • 67
  • 8
  • You need a table / file to define this. There is no single definition of what month a numeric week falls in. It depends on what business this analysis is for. For example, for an Australian financial year, week 1 is in July. – Nick.Mc Jul 15 '22 at 06:54
  • i wanted to use it for the general purpose. l saw some examples like SELECT extract(MONTH FROM INTERVAL '2021-11' YEAR TO MONTH); Here month is extracted but my format doesnt have '-' . – Jha Ayush Jul 15 '22 at 07:23
  • I suggest you do some research on how to change yyyyww to yyyy-ww – Nick.Mc Jul 17 '22 at 12:02
  • Well, that was wrong format so instead i took week date or month date directly from date. Other way around was adding '-' in year and week number then we can use it to extract. – Jha Ayush Jul 19 '22 at 05:20

1 Answers1

1

You can try this:

SELECT  Year_week, 
        month(
            to_date(substring(Year_week, 1, 4), 'yyyy') +
            cast(concat('interval ', substring(Year_week, 5, 2), ' week') as interval)
        ) as Month
FROM df

Gives:

# +---------+-----+
# |Year_week|Month|
# +---------+-----+
# |   202001|    1|
# |   202002|    1|
# |   202003|    1|
# |   202004|    1|
# |   202005|    2|
# |   202006|    2|
# |   202007|    2|
# +---------+-----+
blackbishop
  • 30,945
  • 11
  • 55
  • 76
  • 1
    `01` in `202001` is the **week number**, not the month. Parsing it as month with `yyyyMM` does not produce the result the OP is expecting. – Hristo Iliev Jul 15 '22 at 08:18
  • your query take week number as month which is incorrect data extraction. it is yyyyww, yearweek . – Jha Ayush Jul 15 '22 at 09:15