2

I'm trying to extract date (20200222) from filename while copyinto command in snowflake.

File Name

s3://hadoop/2020002/dbo.DOSSIER_TRANSPORT_20200222.csv.gz

snowflake query

SELECT regexp_substr(metadata$filename, '/(_((\-|\+)?[0-9]+(\.[0-9]+)?).)/' as data FROM '@PRD.OPE_STG
                                                _CMD.SPX_PRD_CMD/' (file_format => 'OTS_TEST.OPA_STG_BENE.OTD_FORMAT', pattern => '.*dbo.DOSSIER_TRANSPORT.*') ;

I tried this regex but its not supporting in snowflake. Getting below error

100048 (2201B): Invalid regular expression: '/(_((-|+)?[0-9]+(.[0-9]+)?).)/', no argument for repetition operator: +
marjun
  • 696
  • 5
  • 17
  • 30

2 Answers2

3

Use

REGEXP_SUBSTR(metadata$filename, '_([0-9]+)[.]', 1, 1, 'c', 1)

This is the regex demo

The pattern matches _, then captures one or more digits in Group 1 and then matches .. Since the group_num argument is 1, the return value is Group 1 value.

Wiktor Stribiżew
  • 607,720
  • 39
  • 448
  • 563
  • Try also with `_([0-9]+)[.][A-Za-z0-9.]*$` if your strings might be trickier. – Wiktor Stribiżew Feb 24 '20 at 14:37
  • this solution works for if filename is like this fact_reserve_20190802.csv.gz. What if the file name is like this fact_reserve_20190803_001.csv.gz and I want to grep 8digits number between underscore and dot – marjun Feb 25 '20 at 09:58
  • @marjun Maybe `_([0-9]{8})[_.]`, see [this regex demo](https://regex101.com/r/72Zntp/4). – Wiktor Stribiżew Feb 25 '20 at 10:10
0

You need to use double escapes with Snowflake. If you just select your regexp string you will get:

SELECT '/(_((\-|\+)?[0-9]+(\.[0-9]+)?).)/';
=>   /(_((-|+)?[0-9]+(.[0-9]+)?).)/

which is exactly what the regexp function will get as an input argument as well.
With double escapes you get:

SELECT '/(_((\\-|\\+)?[0-9]+(\\.[0-9]+)?).)/';
=>   /(_((\-|\+)?[0-9]+(\.[0-9]+)?).)/

which is what I believe you want.

The error you get is from the regexp (-|+); the + operator needs a real argument...

Hans Henrik Eriksen
  • 2,690
  • 5
  • 12