2

I hope everyone is doing well!

I am trying to extract/clean data from a column. Below is an example of a cell row:

"Mother Day Pack - 200ml / 10 Cans 1 glass & 1 crisp"

What I need to do is to

Separate the text data before the hyphen (-) and extract only "Mother Day Pack"

Separate the numeric data between the hypen (-) and the slash (/) and extract only "200"

Separate the first 2-digit numeric data after the slash (/) and extract only "10"

My thoughts are to duplicate the same columns x3 then use a REGEXP_EXTRACT or REPLACE (). However, I never used it before, so I would really appreciate your help to build the 3 queries

Thanks

I tried this REGEXP_EXTRACT(product_name,'.*?/') but it only captured the 200 so I am looking for the rest. Thanks

Jayx
  • 21
  • 1

1 Answers1

0

Do this assuming your target text Mother Day Pack - 200ml / 10 Cans 1 glass & 1 crisp is in A1:

  • =REGEXEXTRACT(A1, "(.*?) -") -- returns: Mother Day Pack
    • explanation: scan and capture non-greedily everything up to -
  • =REGEXEXTRACT(A1, " -[^\d](\d+)") -- returns 200
    • explanation: scan over the first - and anything that is not a digit, and capture all digits after that
  • =REGEXEXTRACT(A1, " /[^\d](\d+)") -- returns 10
    • explanation: scan over the first / and anything that is not a digit, and capture all digits after that

This is Google Spreadsheet specific.

Edit 2023-06-23

Here is the Google Data Studio specific example:

  • REGEXEXTRACT(DIMENSION, '(.*?) -') -- returns: Mother Day Pack
  • REGEXEXTRACT(DIMENSION, '-[^\\d](\\d+)') -- returns 200 and capture all digits after that
  • REGEXEXTRACT(DIMENSION, ' /[^\\d](\\d+)') -- returns 10

Learn more about regex: https://twiki.org/cgi-bin/view/Codev/TWikiPresentation2018x10x14Regex

Peter Thoeny
  • 7,379
  • 1
  • 10
  • 20
  • Hello @peter, thank you very much! The 3 formulas worked. I slighlty adjusted them to work on Looker's (GDS) formatting. If anyone is interested in this, these are the codes I used: `REGEXP_EXTRACT(DIMENSION, '(.*?) -')` , `REGEXP_EXTRACT(DIMENSION, '-[^\\d](\\d+)')` , `REGEXP_EXTRACT(DIMENSION,' /[^\\d](\\d+)')` Cheers – Jayx Jun 22 '23 at 08:36
  • Thanks @Jayx. I updated the answer accordingly. – Peter Thoeny Jun 23 '23 at 18:49