
I am trying to clean up text output from a web scrape I have done. I want a regex that, ONLY for strings containing a "-", removes the "-" and everything after it.

The pattern `\s*(.*)(?=-)` works for those strings, but it removes all text from strings that don't contain a "-".

How can I write a pattern that only affects strings containing a "-"?

To give context, I am mining real estate agency information, and some entries (not all) include the specific branch, which is separated from the agency name by a "-".

However, for the country-wide statistical analysis, I don't want the names of the specific branches, as they make it more difficult to format the data effectively.

Attached is a screenshot detailing what the input/desired output should be.

  • If you have not specified that the dot matches any character including a newline, then the pattern `\s*(.*)(?=-)` should not match a line that does not contain a `-`. Do you want to remove everything after the first occurrence of `-` or after the last occurrence? – The fourth bird Aug 19 '22 at 09:59
  • Provide a few examples of input and expected output – trincot Aug 19 '22 at 10:11
  • I want to remove all text from the first occurrence of "-" onwards within each string. – Tristan Matthews Aug 19 '22 at 10:11
  • input would take the form of (ignore the numbers): 1. Knight Frank - Mayfair branch 2. Savills - Knightsbridge branch 3. Sotheby's 4. Hamptons International. Desired output would take the form of: 1. Knight Frank 2. Savills 3. Sotheby's 4. Hamptons International – Tristan Matthews Aug 19 '22 at 10:13
  • @TristanMatthews: Your English is very confusing. Try adding some samples in your post showing expected vs actual. – Pushpesh Kumar Rajwanshi Aug 19 '22 at 10:15
  • @TristanMatthews Can you update your question with that information in a clear format showing what the expected inputs and outputs are? Are you using the Extract function from this page? https://help.parsehub.com/hc/en-us/articles/217736078-Regular-Expressions-RegEx- – The fourth bird Aug 19 '22 at 10:21
  • Try `^((?=.*-)[^-\n][^-\s]*|.+)`, see [regex demo](https://regex101.com/r/5SWN4H/1). – Wiktor Stribiżew Aug 19 '22 at 13:31

1 Answer

You can use the regex `-.*$` to match a hyphen and any text after it up to the end of the line, and replace the match with an empty string. Lines that do not contain a hyphen do not match, so they are left unchanged.

Check this demo

Let me know if this works for you.
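As a rough illustration of the replacement described above, here is a minimal sketch in Python. The question does not say which tool or language is being used, so Python's `re` module is an assumption, and the `strip_branch` helper and `names` list are hypothetical names made up for this example (the sample strings come from the comments). The pattern also adds a leading `\s*`, which is not part of the original `-.*$`, so that the space before the hyphen is removed as well.

```python
import re

# Sample entries taken from the examples given in the comments.
names = [
    "Knight Frank - Mayfair branch",
    "Savills - Knightsbridge branch",
    "Sotheby's",
    "Hamptons International",
]

# \s*  optional whitespace before the hyphen (an addition to the original -.*$)
# -    the first hyphen in the string
# .*$  everything after it, up to the end of the string
BRANCH_SUFFIX = re.compile(r"\s*-.*$")


def strip_branch(name: str) -> str:
    """Hypothetical helper: drop the first hyphen and everything after it.

    Strings without a hyphen do not match the pattern and are returned unchanged.
    """
    return BRANCH_SUFFIX.sub("", name)


cleaned = [strip_branch(n) for n in names]
print(cleaned)
# ['Knight Frank', 'Savills', "Sotheby's", 'Hamptons International']
```

Note that this cuts at any hyphen, so an agency whose own name contains one (for example a double-barrelled name) would be truncated too. If the branch is always separated by a hyphen with spaces around it, matching on `\s+-\s+.*$` instead may be safer.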

Pushpesh Kumar Rajwanshi