
I have a string like:

"Father’s Name : ABC NaskarDate of Birth : 18-01-1979Permanent Address: This is the address field for the personContact Numbers : 98413***28Passport Number:PAN Number: AEFXXXXXXXLanguages Known: Tamil, English"

My desired output is:

"|||Father’s Name : ABC Naskar|||Date of Birth : 18-01-1979|||Permanent Address: This is the address field for the person|||Contact Numbers : 98413***28|||Passport Number:|||PAN Number: AEFXXXXXXX|||Languages Known: Tamil, English"

That is, I want to add "|||" before specific strings such as Father’s Name, Date of Birth, etc. Thanks.

A.Naskar

2 Answers


We could not find a general pattern, but based on the string shown, it seems the ||| separator should go at the start of the string (^), wherever a lowercase letter or a digit is followed by an uppercase letter, before PAN, and between the trailing XXXX and Languages. In that case, regex lookarounds should work.

gsub("(?<=[a-z0-9])(?=[A-Z])|^|(?<=[XXX])(?=Lang)|(?=PAN)", "|||", str1, perl = TRUE)
#[1] "|||Father’s Name : ABC Naskar|||Date of Birth : 18-01-1979|||Permanent Address: This is the address field for the person|||Contact Numbers : 98413***28|||Passport Number:|||PAN Number: AEFXXXXXXX|||Languages Known: Tamil, English"

data

str1 <- "Father’s Name : ABC NaskarDate of Birth : 18-01-1979Permanent Address: This is the address field for the personContact Numbers : 98413***28Passport Number:PAN Number: AEFXXXXXXXLanguages Known: Tamil, English"
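A commented restatement of the pattern above, split into its four alternatives so each rule is visible. This is only a sketch: `alternatives`, `pattern` and `out` are illustrative names, `[XXX]` is written as plain `X` (a character class that lists X three times still matches one X), and the `\u2019` escape stands in for the curly apostrophe to avoid source-encoding surprises.

```r
# Same input as above, with the curly apostrophe written as an escape.
str1 <- paste0(
  "Father\u2019s Name : ABC NaskarDate of Birth : 18-01-1979",
  "Permanent Address: This is the address field for the person",
  "Contact Numbers : 98413***28Passport Number:PAN Number: AEFXXXXXXX",
  "Languages Known: Tamil, English"
)

alternatives <- c(
  "(?<=[a-z0-9])(?=[A-Z])", # lowercase letter or digit followed by an uppercase letter
  "^",                      # start of the string
  "(?<=X)(?=Lang)",         # between the trailing X of the PAN and "Languages"
  "(?=PAN)"                 # before "PAN", which follows ":" so rule 1 misses it
)
pattern <- paste(alternatives, collapse = "|")

# Every alternative is zero-width, so "|||" is inserted without consuming text.
# Lookbehind needs the PCRE engine, hence perl = TRUE.
out <- gsub(pattern, "|||", str1, perl = TRUE)
print(out)
```

Because the first rule keys on case transitions rather than on the field labels, it only works as long as no field value ends in a lowercase letter or digit right before an uppercase label, or in a letter other than X before Languages.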
akrun
  • But before "Languages" I am also getting "B" if the PAN number ends with "B", like: "PAN Number: AEF***35|||BLanguages Known: Tamil, English". Here the PAN number was AEF***35B. Thanks. – A.Naskar Aug 23 '16 at 07:24
  • @A.Naskar As I stated in the solution, I was not sure about the patterns. In that case you need one more pattern there: `(?=BLang)` – akrun Aug 23 '16 at 07:31
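The "B" in the comment comes from the first alternative of the pattern: `(?<=[a-z0-9])(?=[A-Z])` fires between the digit 5 and the uppercase B of AEF***35B, while nothing fires between B and Languages, since B is neither a lowercase letter, a digit, nor X. A minimal reproduction, assuming the same input with only the PAN value swapped (`str_b` and `out_b` are illustrative names):

```r
# Same input as in the answer, but with the PAN value from the comment (AEF***35B).
str_b <- paste0(
  "Father\u2019s Name : ABC NaskarDate of Birth : 18-01-1979",
  "Permanent Address: This is the address field for the person",
  "Contact Numbers : 98413***28Passport Number:PAN Number: AEF***35B",
  "Languages Known: Tamil, English"
)
pattern <- "(?<=[a-z0-9])(?=[A-Z])|^|(?<=[XXX])(?=Lang)|(?=PAN)"
out_b <- gsub(pattern, "|||", str_b, perl = TRUE)

# "5" followed by "B" is a digit-to-uppercase transition, so the separator
# lands there; "B" followed by "L" matches no alternative.
print(grepl("AEF***35|||BLanguages", out_b, fixed = TRUE))
```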
gsub("(Father’s Name)|(Date of Birth)", "|||\\1\\2", x)
[1] "|||Father’s Name : ABC Naskar|||Date of Birth : 18-01-1979Permanent Address: This is the address field for the personContact Numbers : 98413***28Passport Number:PAN Number: AEFXXXXXXXLanguages Known: Tamil, English"

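Fortunately, the regex OR, "|", distributes over the two capture groups in the substitution: in each match only one group participates, and the backreference to the other one expands to an empty string, so "\\1\\2" reproduces whichever label matched.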
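The same idea extends to every field by building the alternation from a vector of labels with `paste(..., collapse = "|")`. This is a sketch that assumes the full set of labels is known in advance; the vector below simply lists the labels from the question, and `field_labels`, `pattern`, `out`, `str_b` and `out_b` are illustrative names. Wrapping the whole alternation in a single group means the replacement only needs `\\1`.

```r
str1 <- paste0(
  "Father\u2019s Name : ABC NaskarDate of Birth : 18-01-1979",
  "Permanent Address: This is the address field for the person",
  "Contact Numbers : 98413***28Passport Number:PAN Number: AEFXXXXXXX",
  "Languages Known: Tamil, English"
)

field_labels <- c("Father\u2019s Name", "Date of Birth", "Permanent Address",
                  "Contact Numbers", "Passport Number", "PAN Number",
                  "Languages Known")

# The labels contain no regex metacharacters, so they can be joined directly
# into one capture group: (label1|label2|...).
pattern <- paste0("(", paste(field_labels, collapse = "|"), ")")
out <- gsub(pattern, "|||\\1", str1)
print(out)

# Only the labels are matched, so a PAN ending in a letter is left alone.
str_b <- sub("AEFXXXXXXX", "AEF***35B", str1, fixed = TRUE)
out_b <- gsub(pattern, "|||\\1", str_b)
print(out_b)
```

Matching the labels themselves trades generality for robustness: any new field needs its label added to the vector, but field values can contain any characters.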

IRTFM