Add a whitespace between a number and a special character in R

Question

I'm trying to use stringr or base R calls to conditionally add a whitespace wherever, in a large vector, a numeric value is immediately followed by a special character (in this case a $ sign) with no space in between. str_pad doesn't appear to allow for a reference vector.

For example, for:

$6.88$7.34

I'd like to add a whitespace after the last number and before the next dollar sign:

$6.88 $7.34

Thanks!

Try `sub("([$])", " \\1", val)` – akrun Jan 11 '19 at 16:36

akrun · Answer 1 · 2019-01-11T16:42:28.327

2

If there is only one instance per string, use sub: capture the digit and the $ in separate groups, and in the replacement put a space between the backreferences to the two captured groups.

sub("([0-9])([$])", "\\1 \\2", v1)
#[1] "$6.88 $7.34"

Or with regex lookarounds, which also covers strings with several instances because gsub replaces every match:

gsub("(?<=[0-9])(?=[$])", " ", v1, perl = TRUE)
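The difference between the two calls above shows up once a string holds more than one missing space. A minimal sketch, assuming a small made-up vector `v` (not from the original post): `sub` fixes only the first occurrence per element, while `gsub` fixes all of them, with either the capture-group or the lookaround pattern.

```r
# Hypothetical test vector: one gap, two gaps, and an already-correct string
v <- c("$6.88$7.34", "$6.88$7.34$8.34", "$6.88 $7.34")

# sub() stops after the first match in each element
first_only <- sub("([0-9])([$])", "\\1 \\2", v)

# gsub() with the same capture groups replaces every match
all_capture <- gsub("([0-9])([$])", "\\1 \\2", v)

# gsub() with zero-width lookarounds: nothing is consumed,
# so the replacement is just the space itself
all_lookaround <- gsub("(?<=[0-9])(?=[$])", " ", v, perl = TRUE)

print(first_only)
print(all_capture)
print(identical(all_capture, all_lookaround))
```

The lookaround form needs `perl = TRUE` because R's default regex engine does not support look-behind.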

data

v1 <- "$6.88$7.34"

edited Jan 11 '19 at 16:42

answered Jan 11 '19 at 16:37

akrun


1

Thanks! That's super helpful! – js80 Jan 11 '19 at 16:47
Hi, I'm running into an edge case where I need to add a space to the left of the number that ends in "%". Here is an example: $6.57-10.59%. For this one I can add a space with gsub("(?=[-])", " ", test9, perl = TRUE). BUT if the % value is positive (it can have one or two digits before the decimal point), that call won't add the whitespace. Any idea how to accomplish this? – js80 Jan 12 '19 at 21:33
Here's an example of the situation: $6.5910.57% or $6.599.40%. Unfortunately I can't always count to the right of the $ or the left of the % to add the space. Maybe I could try adding a space two places to the right of a decimal point? – js80 Jan 12 '19 at 21:34
@js80 Is it always two digits to the right of the decimal point? If it can be a single digit too, then it becomes ambiguous where the next number begins – akrun Jan 13 '19 at 09:23
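The idea floated in these comments could look like the sketch below. It rests entirely on the assumption akrun asks about: every dollar amount has exactly two digits after the decimal point. The vector `x` and the pattern are illustrative, not from the thread; if prices can have one decimal digit, this approach cannot tell where the next number starts.

```r
# Hypothetical inputs based on the examples in the comments
x <- c("$6.57-10.59%", "$6.5910.57%", "$6.599.40%", "$6.88 $7.34")

# Capture "$", the integer part, the dot, and exactly two decimals;
# the look-ahead requires a digit or a minus sign to follow directly,
# so already-spaced strings are left alone
pattern <- "(\\$[0-9]+\\.[0-9]{2})(?=[-0-9])"

# Re-insert the captured price followed by a space
fixed <- gsub(pattern, "\\1 ", x, perl = TRUE)
print(fixed)
```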

Chabo · Accepted Answer · 2019-01-11T18:22:28.167

1

This will work on a character vector:

mystring <- '$6.88$7.34 $8.34$4.31'

gsub("(?<=\\d)\\$", " $", mystring, perl = TRUE)

[1] "$6.88 $7.34 $8.34 $4.31"

Places where there is already a space before the $ are left as they are, because the look-behind only matches a $ that immediately follows a digit.
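A quick way to check that behaviour is to run the replacement twice and compare. This is a minimal sketch with a made-up vector `mixed`; a second pass should change nothing, since every `$` is then preceded by a space rather than a digit.

```r
# Hypothetical vector mixing fused and already-spaced prices
mixed <- c("$6.88$7.34", "$6.88 $7.34", "$6.88$7.34 $8.34$4.31")

# First pass inserts the missing spaces
once <- gsub("(?<=\\d)\\$", " $", mixed, perl = TRUE)

# Second pass finds no "$" directly after a digit, so nothing changes
twice <- gsub("(?<=\\d)\\$", " $", once, perl = TRUE)

print(once)
print(identical(once, twice))
```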

Regarding the question asked in the comments:

mystring2<-as.vector('Regular_Distribution_Type† Income Only" "Distribution_Rate 5.34%" "Distribution_Amount $0.0295" "Distribution_Frequency Monthly')

gsub("(?<=[[:alpha:]])\\s(?=[[:alpha:]]+)", "_", mystring2, perl = TRUE)

[1] "Regular_Distribution_Type<U+2020> Income_Only\" \"Distribution_Rate 5.34%\" \"Distribution_Amount $0.0295\" \"Distribution_Frequency_Monthly"

Note that the \ appears because of the nested quotes in the string and should not make a difference. Also, <U+2020> appears because of how the special character † is encoded.

Explanation of regex:

(?<=[[:alpha:]]) This first part is a positive look-behind, introduced by ?<=. It looks immediately behind whatever we are trying to match to make sure that what we define in the look-behind is there. In this case we are looking for [[:alpha:]], which matches an alphabetic character.

We then match a whitespace character with \s. In an R string the backslash itself has to be escaped, so it is written \\s. This is the part we are actually trying to match.

Finally we use (?=[[:alpha:]]+), a positive look-ahead introduced by ?=, which checks that our match is followed by a letter.

The logic is to find a whitespace character between two letters and match only that character, which gsub then replaces with a _.
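The three parts can be seen at work on separate elements. The sketch below uses a hypothetical vector `fields` modelled on the strings in the comments, and drops the `+` from the look-ahead, since a look-ahead only needs one letter to succeed.

```r
# Hypothetical fields modelled on the strings from the comments
fields <- c("Income Only",
            "Distribution_Rate 5.34%",
            "Distribution_Frequency Monthly")

# letter behind, one whitespace character matched, letter ahead
pattern <- "(?<=[[:alpha:]])\\s(?=[[:alpha:]])"

# Position of the matched whitespace in each element (-1 means no match);
# the second element has a digit after the space, so the look-ahead fails
positions <- as.integer(regexpr(pattern, fields, perl = TRUE))

# Only the matched whitespace is replaced; the letters are not consumed
joined <- gsub(pattern, "_", fields, perl = TRUE)

print(positions)
print(joined)
```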

See all the regex here

edited Jan 11 '19 at 18:22

answered Jan 11 '19 at 16:36

Chabo


I've tried that, but the goal is to fix only the instances where there's no space between a numeric character and a '$'. The vector is long and there are instances where there isn't a formatting issue between a '$' and the numeric price, so I think the solution needs to differentiate for the instance I mentioned. Thanks for your help! – js80 Jan 11 '19 at 16:40
A related question for cleaning and reformatting this vector: how can I concatenate elements 2 and 3 only, joining them with a "_", without impacting the structure of the rest of the vector? I.e. "Income" and "Only" are treated as two separate elements and should be joined while keeping the rest the same. Thanks!!! – js80 Jan 11 '19 at 17:17
The vector will always have two words, just not necessarily those two, so I can't just do a simple str_replace_all. – js80 Jan 11 '19 at 17:18
@js80 would you mind explaining where words fit into the vector? As of now the only example I have is `vector<-'$6.88$7.34'` – Chabo Jan 11 '19 at 17:22
I'll show you: "Regular_Distribution_Type† Income Only" "Distribution_Rate 5.34%" "Distribution_Amount $0.0295" "Distribution_Frequency Monthly" At the very end of my script I unlist the elements but that 2nd and 3rd element with the space should be one not two regardless of the words used in each field. Thanks! – js80 Jan 11 '19 at 17:25
This is a separate vector that I piece together with the original. – js80 Jan 11 '19 at 17:34
1

You got it! Thanks! – js80 Jan 11 '19 at 17:36
Sorry for the basic question but what does the syntax: (?<=[[:alpha:]])\\s(?=[[:alpha:]]+) mean for a novice? I know the \\s but that's about it. Appreciate your help! – js80 Jan 11 '19 at 17:39
