How get name before first uppercase word excluding T-shirt

Question

Products have names like

Product one white Adidas
Other product black Hill sheet
Nice T-shirt blue Brower company

How to get starting part of product name before first uppercase word starting from second word and up to first uppercase word excluding word T-shirt. Result from strings above should be

Product one white
Other product black
Nice T-shirt blue

Using Bohemian answer to question

Substring before first uppecase word excluding first word

regexp_replace('Nice T-shirt blue Brower company', '(?<!^)\m[A-ZÕÄÖÜŠŽ].*', '')

returns wrong result

Nice

How to modify regex so that it returns

Nice T-shirt blue

Using Postgres 12

score 1 · Answer 1 · answered Dec 06 '22 at 21:53

1

This regex works for your test cases:

^[A-Z][a-z ]*(T-shirt)?[a-z ]*

Explanation:

^: Start of line

[A-Z]: Any capital letter

[a-z ]*: zero or more characters that are either a lowercase letter or space

(T-shirt)?: The phrase T-shirt 0 or 1 times

[a-z ]*: same again

answered Dec 06 '22 at 21:53

dc-ddfe

487
1
11

I added national letters to regex: ^[A-ZÕÄÖÜŠŽ][a-zõäöüšž ]*(T-shirt)?[a-zõäöüšž ]* . Is this ok? Words may contain also some other characters like - / and + . Should all possible characters listed separately in regex? – Andrus Dec 06 '22 at 22:15
1

Please test on your data, and make another comment if it doesn't work. It's difficult to fix issues when 1, I don't know if it's a problem in the first place, and 2, I don't have sample inputs to check. – dc-ddfe Dec 06 '22 at 22:28
regexp_replace('Shoes Attalla', '^[A-Z][a-z ]*(T-shirt)?[a-z ]*', '') returns **Attalla** but should return Shoes. It looks like this is totally wrong. – Andrus Dec 07 '22 at 22:29
1

That's because you haven't applied the `regexp_replace` correctly. The regex I've provided selects the part of the next you want to keep, not the text to be removed. To fix that, run `regexp_replace('Shoes Attalla', '^([A-Z][a-z ]*(T-shirt)?[a-z ]*).*$', '$1')`. (To briefly explain, capture the part you need to keep, and use `.*$` to consume the rest of the line. Then just replace with the first capture group) – dc-ddfe Dec 07 '22 at 23:13

score 1 · Accepted Answer · answered Dec 07 '22 at 05:29

Use a negative look ahead:

select regexp_replace('Nice T-shirt blue Brower company', '(?<!^)\m(?!T-shirt)[A-ZÕÄÖÜŠŽ].*', '')

See live demo.

(?!T-shirt) means the following characters must not be 'T-shirt'

You can add other capitalised terms to ignore:

(?!T-shirt|BB gun|BBQ tongs)

Ajax1234 · Answer 3 · 2022-12-06T22:51:27.237

0

Instead of doing a direct match, you can simply remove the part of the string (in this case from the last uppercase word to the end) that you do not need:

select regexp_replace(name, '(?<=\s)[A-Z]+[a-zA-Z\s]+$', '') from tbl

See fiddle.

edited Dec 06 '22 at 22:51

answered Dec 06 '22 at 22:04

Ajax1234

69,937
8
61
102

This returns wrong result if there are no oper uppercase words. For Nice T-shirt blue brower company it returns Nice but correct result is whole string Nice T-shirt blue brower company – Andrus Dec 06 '22 at 22:12
It returns incorrect result for Product ONE white Adidas. Correct result is **Product** but it returns **Product ONE white** – Andrus Dec 06 '22 at 22:40
@Andrus Please see my recent edit. Also, as dc-ddfe pointed out, consider posting your *all* your sample inputs. – Ajax1234 Dec 06 '22 at 22:52

How get name before first uppercase word excluding T-shirt

3 Answers3

Linked