1

Can someone help me fetch "Client name" using regexp_substr from below path in Snowflake. The client names here can have spaces or underscores

s3://student-com/student-to-employee/student1/Student_Party/20221223/Client ABC/Employee_1st_data_20221223115427_4048241.csv

s3://student-com/student-to-employee/student1/Student_Party/20221223/Client_XYZ/Employee_1st_data_20221223115427_4048241.csv

I used something like this:

select regexp_substr(METADATA$FILENAME, '(.+\/)*(.+.csv)$', 1,1,'e') as name

Sara
  • 33
  • 5

2 Answers2

1

You can use

select regexp_substr(METADATA$FILENAME, '([^/]+)/[^/]*\.csv$', 1,1,'e') as name

See the regex demo.

Details:

  • ([^/]+) - Group 1: any one or more chars other than /
  • / - a / char
  • [^/]* - zero or more chars other than /
  • \.csv - .csv
  • $ - end of string.
Wiktor Stribiżew
  • 607,720
  • 39
  • 448
  • 563
  • Appreciate your answer, however this gives Employee_1st_data as output, I want to see Client_XYZ or Client ABC as output. – Sara Dec 30 '22 at 21:42
  • @Sara I was trying to follow the original regex logic. I replaced the solution to the one that returns the strings you mentioned. – Wiktor Stribiżew Dec 30 '22 at 21:59
1

If client name is always the subfolder preceding the file name, you can use split_part

select split_part(col,'/',-2)
Radagast
  • 5,102
  • 3
  • 12
  • 27