Regex function that only affects strings with specific notation

Question

I am trying to manipulate text output from a web scrape I have done. I want a regex that, ONLY for strings that include a "-", removes the "-" and all the words after it in each string.

The pattern `\s*(.*)(?=-)` works for strings that contain a "-", but removes all text from strings that don't have this notation.

How can I write a function that only affects strings with the "-" notation?

To give context, I am mining real estate agency information, and some entries (not all) include the specific branch, which is delineated using a "-".

However, for the country-wide statistical analysis, I don't want the names of the specific branches, as this makes it more difficult to format the data effectively.

Attached is a screenshot detailing what the input/desired output should be.

If you have not specified that the dot matches any character including a newline, then the pattern `\s*(.*)(?=-)` should not match a line that does not contain a `-`. Do you want to remove everything after the first occurrence of `-` or after the last occurrence? — The fourth bird, Aug 19 '22 at 09:59
I want the "-" to remove all text from the first occurrence of this within each string. — Tristan Matthews, Aug 19 '22 at 10:11
input would take the form of (ignore the numbers): 1. Knight Frank - Mayfair branch 2. Savills - Knightsbridge branch 3. Sotheby's 4. Hamptons International. Desired output would take the form of: 1. Knight Frank 2. Savills 3. Sotheby's 4. Hamptons International — Tristan Matthews, Aug 19 '22 at 10:13
@TristanMatthews: Your English is very confusing. Try adding some samples in your post showing expected vs actual. — Pushpesh Kumar Rajwanshi, Aug 19 '22 at 10:15
input would take the form of (ignore the numbers): 1. Knight Frank - Mayfair branch 2. Savills - Knightsbridge branch 3. Sotheby's 4. Hamptons International. Desired output would take the form of: 1. Knight Frank 2. Savills 3. Sotheby's 4. Hamptons International — Tristan Matthews, Aug 19 '22 at 10:19
@TristanMatthews Can you update your question with that information in a clear format showing what the expected inputs and outputs are? Are you using the Extract function from this page? https://help.parsehub.com/hc/en-us/articles/217736078-Regular-Expressions-RegEx- — The fourth bird, Aug 19 '22 at 10:21
Try `^((?=.*-)[^-\n][^-\s]*|.+)`, see [regex demo](https://regex101.com/r/5SWN4H/1). — Wiktor Stribiżew, Aug 19 '22 at 13:31

score 0 · Answer 1 · answered Aug 19 '22 at 11:32

You can use the regex `-.*$` to match a hyphen and any text after it on the line, and replace the match with an empty string. Lines that do not contain a hyphen have nothing to match and are left unchanged.
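As a rough illustration of the replace approach, here is a minimal Python sketch. Python is an assumption here, since the thread does not say which tool or language runs the regex. The names `strip_branch` and `BRANCH_SUFFIX` are hypothetical, and the leading `\s*` is a small addition to the pattern above so that the space before the hyphen is removed as well.

```python
import re

# Sample entries taken from the comments on the question.
names = [
    "Knight Frank - Mayfair branch",
    "Savills - Knightsbridge branch",
    "Sotheby's",
    "Hamptons International",
]

# Match the first hyphen, any whitespace directly before it, and the rest
# of the string. The leading \s* is an addition to the answer's pattern.
BRANCH_SUFFIX = re.compile(r"\s*-.*$")


def strip_branch(name: str) -> str:
    """Remove the branch suffix; strings without a hyphen pass through unchanged."""
    return BRANCH_SUFFIX.sub("", name)


cleaned = [strip_branch(n) for n in names]
print(cleaned)
```

Note that this cuts at the first hyphen of any kind, so an agency name that itself contains a hyphen would be truncated too. Requiring whitespace on both sides (for example `\s+-\s+.*$`) might be a safer variant if such names occur in the data.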

Check this demo

Let me know if this works for you.

Pushpesh Kumar Rajwanshi

This did not work unfortunately - it simply removed all text from all the outputs – Tristan Matthews Aug 19 '22 at 12:30
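One possible explanation, which the thread does not confirm, is that the scraping tool keeps the text a pattern matches instead of deleting it. Under that assumption, the pattern has to describe the part to keep. The Python sketch below shows the idea. The names `KEEP_AGENCY` and `extract_agency` are hypothetical, and the pattern is an illustration, not the one from the comments.

```python
import re

# From the start of the string, take everything before the first hyphen,
# ending on a character that is neither a hyphen nor whitespace.
# When the string has no hyphen, this spans the whole string.
KEEP_AGENCY = re.compile(r"^[^-\n]*[^-\s]")


def extract_agency(name: str) -> str:
    """Return the agency name, i.e. the text before the first hyphen."""
    match = KEEP_AGENCY.search(name)
    # Assumption: if there is nothing to keep (e.g. the string is empty
    # or starts with a hyphen), return an empty string.
    return match.group(0) if match else ""


# Sample entries taken from the comments on the question.
names = [
    "Knight Frank - Mayfair branch",
    "Savills - Knightsbridge branch",
    "Sotheby's",
    "Hamptons International",
]

extracted = [extract_agency(n) for n in names]
print(extracted)
```

Because the final character class excludes whitespace, the match stops at the last visible character before the hyphen, so no separate trimming step should be needed.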
