
I used UiPath Studio and a regex to capture the text between two headings in an MS Word document, then replaced each TAB with "-". Now I want to remove any additional "-" characters after the first one.

RegEx used:

(?<=PostalAddress:  )([\S\s]*)(?=Invoice Address:)").Value.Replace(chr(9), "-")

where chr(9) is the ASCII code for TAB.

Initial string in MS Word

"customer name(TABTAB)customer address(TABTAB)"

Current output

"customer name--customer address--"

Desired output

"customer name - customer address"
Bohemian
worthy

2 Answers


I think this is on the right track, but I didn't explain it properly, which is why you have two capture groups. The initial regex captures everything between two points; that text includes 4 tabs, which all end up in the same capture group. I then want to convert the first TAB to a ',' and remove the other three TABs.

Text to capture - anything after PostalAddress: and before Invoice Address:

PostalAddress: Business Name TAB TAB AddressLine1, AddressLine2, AddressLine3, AddressLine4, Postcode TAB TAB Invoice Address:

Convert first TAB to ',' and remove remaining TABs

The final string should look like:

Business Name, AddressLine1, AddressLine2, AddressLine3, AddressLine4, Postcode
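The transformation described above can be sketched as follows. This is only an illustration in Python, not UiPath code: the helper name `tidy_address` is made up, and it assumes the first two consecutive TABs count as one separator that becomes ", ".

```python
import re

def tidy_address(captured: str) -> str:
    """Turn the first run of tabs into ', ' and drop all other tabs."""
    # count=1 limits the substitution to the first run of tabs only.
    with_comma = re.sub(r"\t+", ", ", captured, count=1)
    # Remove every remaining tab, then trim surrounding whitespace.
    return with_comma.replace("\t", "").strip()

# Sample text as it would look after capturing between the two headings.
sample = ("Business Name\t\tAddressLine1, AddressLine2, "
          "AddressLine3, AddressLine4, Postcode\t\t")
print(tidy_address(sample))
```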

worthy

You want to do 2 things:

  1. change the first 2 tabs to -
  2. remove the 2 tabs at the end of the example string customer address

In that case you should not replace a tab with - but with an empty string, and only for the last part.

What you might do is use a pattern with 2 capture groups for the match, and then in the replacement use the 2 capture groups without the tabs.

(customer name)\t\t(customer address)\t\t

See a regex demo.

In the replacement use $1 - $2
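A quick way to check this pattern outside UiPath is a short Python sketch. Note the assumption here: Python writes group references as `\1` and `\2`, whereas .NET (which UiPath uses) writes them as `$1` and `$2`, so only the pattern carries over unchanged.

```python
import re

# Pattern from above: two capture groups, each followed by two tabs.
pattern = r"(customer name)\t\t(customer address)\t\t"
text = "customer name\t\tcustomer address\t\t"

# Keep both groups, join them with " - ", and drop the matched tabs.
result = re.sub(pattern, r"\1 - \2", text)
print(result)  # customer name - customer address
```

In UiPath the equivalent would likely go through .NET's `Regex.Replace` with `"$1 - $2"` as the replacement string.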

Edit

For the updated question and your own added answer, you can use 2 capture groups.

\bPostalAddress:\s*(.*?)\t\t(.*?)\t.*

In the replacement use group 1, then a comma and space followed by group 2

$1, $2

See a regex demo.

The output after the replacement:

Business Name, AddressLine1, AddressLine2, AddressLine3, AddressLine4, Postcode
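The same replacement can be tried in Python as a rough check. The sample text below is assembled from the example in the question with real tab characters, which is an assumption about how the Word text arrives; Python uses `\1, \2` where .NET uses `$1, $2`.

```python
import re

# Group 1 runs lazily up to the first double tab; group 2 runs lazily up to
# the next tab. The trailing \t.* consumes the rest of the line so it is dropped.
pattern = r"\bPostalAddress:\s*(.*?)\t\t(.*?)\t.*"

text = ("PostalAddress: Business Name\t\tAddressLine1, AddressLine2, "
        "AddressLine3, AddressLine4, Postcode\t\tInvoice Address:")

result = re.sub(pattern, r"\1, \2", text)
print(result)
```

Because `.` does not match a newline by default, this sketch only covers the case where the captured text sits on a single line.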
The fourth bird