
I have a simple data frame as follows:

Date <- seq(as.Date("2013/1/1"), by = "day", length.out = 12)

test <- data.frame(Date)

test$Value <- c("1,4","2,3","3,6","< 1,4","2,3","3,6","1,4","2,3","3,6","< 1,4","2,3","3,6")

I need to go through each of the rows and remove the "<" sign if detected. Then I need to multiply the remaining number by 5.

I tried gsub(), but it only lets me replace a character with another character or a space; it doesn't let me perform a calculation. I guess I also need to change the decimal separator from "," to "." to be able to use those numbers as numerics.
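Both hunches are right: gsub() and sub() only rewrite strings, and as.numeric() does not accept "," as a decimal mark. A quick base R check, a sketch on two made-up sample strings rather than the full data frame:

```r
x <- c("1,4", "2,3")

# as.numeric() expects "." as the decimal mark, so "1,4" cannot be parsed
bad <- suppressWarnings(as.numeric(x))
print(bad)   # [1] NA NA

# Swap the comma for a dot first (fixed = TRUE treats "," literally), then convert
good <- as.numeric(sub(",", ".", x, fixed = TRUE))
print(good)  # [1] 1.4 2.3

# Arithmetic only becomes possible after the conversion
print(good * 5)
```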

How can I solve this in R?

Brian Tompsett - 汤莱恩
Matt
  • Check out `str_replace` from the `stringr` package. You'd then have to convert the column to numeric. – Ben G Jun 26 '18 at 11:14

2 Answers


One approach using sub would be to match the following pattern:

(?:<\s*)?(\d+),(\d+)

(?:<\s*)?   match a < followed by any amount of whitespace, the
            entire quantity either zero or one time
(\d+)       match and capture one or more digits before the comma
,           match the comma separator
(\d+)       match and capture one or more digits after the comma

This pattern matches every entry in your Value column. We then replace each match with a dot-separated decimal string built from the two capture groups (the whole and fractional parts), which as.numeric() can parse.
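To see the pattern and the \\1.\\2 replacement in isolation, here is a small sketch on a few made-up strings. I pass perl = TRUE so that the non-capturing group (?:...) is handled by PCRE; that flag is my addition and is not part of the answer's code.

```r
samples <- c("1,4", "< 1,4", "<3,6", "12,05")

# (?:<\s*)? optionally consumes a leading "<" plus any spaces;
# the two (\d+) groups capture the digits before and after the comma
pattern <- "(?:<\\s*)?(\\d+),(\\d+)"

# Rebuild each match as "<whole>.<fraction>", which as.numeric() can parse
converted <- sub(pattern, "\\1.\\2", samples, perl = TRUE)
print(converted)  # gives "1.4", "1.4", "3.6", "12.05"

print(as.numeric(converted))
```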

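Then we form a mask with grepl() marking the entries that contained <. The mask is logical (TRUE/FALSE), but R treats it as 1/0 in arithmetic, so entries with < are assigned a 1. It has to be computed before Value is overwritten, because the < signs are gone after the substitution.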
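The mask trick works because TRUE/FALSE become 1/0 in arithmetic, so x + 4 * mask * x equals 5 * x where the mask is TRUE and stays x elsewhere. A sketch with made-up values, comparing it with the indexed assignment suggested in the comments and with ifelse():

```r
x    <- c(1.4, 2.3, 1.4)
mask <- c(FALSE, FALSE, TRUE)  # TRUE where the original string contained "<"

# Logical values act as 0/1, so only flagged elements gain 4 extra copies of themselves
via_mask <- x + 4 * mask * x

# Equivalent: overwrite only the flagged elements
via_index <- x
via_index[mask] <- via_index[mask] * 5

# Equivalent: vectorised if/else
via_ifelse <- ifelse(mask, x * 5, x)

print(via_mask)
# all.equal() tolerates tiny floating-point differences between the approaches
stopifnot(isTRUE(all.equal(via_mask, via_index)),
          isTRUE(all.equal(via_mask, via_ifelse)))
```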

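mask <- grepl("<", test$Value)  # TRUE where the entry contains "<"; compute before overwriting Value
test$Value <- as.numeric(sub("(?:<\\s*)?(\\d+),(\\d+)", "\\1.\\2", test$Value))  # "< 1,4" -> 1.4
test$Value <- test$Value + (4*mask*test$Value)  # adds 4 extra units only where mask is TRUE, i.e. x5
test$Value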

[1] 1.4 2.3 3.6 7.0 2.3 3.6 1.4 2.3 3.6 7.0 2.3 3.6


Note: only the entries that contained < are multiplied by 5; all other values are simply converted to numbers.

Tim Biegeleisen
  • I need to multiply by 5 only those numbers where there is a "<" sign. – Matt Jun 26 '18 at 11:28
  • @Matt I fixed it. Your original question was not clear about this requirement. – Tim Biegeleisen Jun 26 '18 at 11:28
  • That did the job. However, where is the multiplication by 5 in your example? Just for understanding what you did. – Matt Jun 26 '18 at 11:36
  • If we start out with a single unit of `Value`, to get five times that unit, we only need to add 4 more units. – Tim Biegeleisen Jun 26 '18 at 11:37
  • What if I need to divide those specific values by 2 instead of multiplying by 5? – Matt Jun 26 '18 at 11:44
  • @Tim why not simply `test$Value[mask] <- test$Value[mask] * 5` – Andre Elrico Jun 26 '18 at 11:45
  • You may open a new question in this case :-) – Tim Biegeleisen Jun 26 '18 at 11:45
  • @Matt if you do it like 2 comments above you can simply replace `* 5` with `/ 2` – Andre Elrico Jun 26 '18 at 11:46
  • @TimBiegeleisen Where do I get more information on the expressions you used in test$Value <- as.numeric(sub("(?:<\\s*)?(\\d+),(\\d+)", "\\1.\\2", test$Value))? There is no detailed info when using ?sub() in R. – Matt Jun 26 '18 at 11:53
  • @Matt I would recommend just reviewing any introductory regex tutorial. I can't fully explain the pattern in a comment. – Tim Biegeleisen Jun 26 '18 at 11:56
  • @Matt the `stringr` cheat sheet is a really good reference for this: http://edrub.in/CheatSheets/cheatSheetStringr.pdf – Ben G Jun 26 '18 at 13:16
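Following up on the divide-by-2 question in the comments, here is a sketch of a small helper that applies any transformation only to the entries flagged with <. The name rescale_flagged() and its arguments are hypothetical, made up for this example. It combines the answer's regex (with perl = TRUE, my addition) and the indexed assignment suggested by Andre Elrico.

```r
# Hypothetical helper: parse comma-decimal strings and apply `f`
# only to the entries that originally contained "<"
rescale_flagged <- function(values, f) {
  flagged <- grepl("<", values, fixed = TRUE)    # remember the flags before cleaning
  nums <- as.numeric(sub("(?:<\\s*)?(\\d+),(\\d+)", "\\1.\\2",
                         values, perl = TRUE))   # "< 1,4" -> 1.4
  nums[flagged] <- f(nums[flagged])              # transform flagged entries only
  nums
}

vals   <- c("1,4", "2,3", "< 1,4", "3,6")
times5 <- rescale_flagged(vals, function(v) v * 5)
half   <- rescale_flagged(vals, function(v) v / 2)
print(times5)
print(half)
```

Passing a function rather than a fixed factor keeps the parsing and the arithmetic separate, so switching between "multiply by 5" and "divide by 2" only changes the second argument.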

Here's a solution using the tidyverse:

library(tidyverse) #load necessary packages

data <- tibble(value = c("2,3", "< 2,5", "3,5")) %>%
  mutate(value_modified = str_replace(value, ",", "\\."),  # replace the comma with a period
         value_modified = str_extract(value_modified, "[[:digit:]]+\\.[[:digit:]]+"), # extract the number (one or more digits on each side of the period)
         value_modified = as.numeric(value_modified), # convert to numeric
         value_modified = if_else(str_detect(value, "<"), value_modified * 5, value_modified)) # multiply by five if < symbol is in the original data

I find solutions using tidyverse to be easier to follow.
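For comparison, the same four steps (replace the comma, extract the number, convert, multiply when < was present) can be written in base R without extra packages. This is a sketch run on the question's test data; using regmatches() with regexpr() for the extraction step is my substitution for str_extract().

```r
Date <- seq(as.Date("2013/1/1"), by = "day", length.out = 12)
test <- data.frame(Date)
test$Value <- c("1,4","2,3","3,6","< 1,4","2,3","3,6",
                "1,4","2,3","3,6","< 1,4","2,3","3,6")

# 1. replace the comma with a period (fixed = TRUE: no regex needed)
dotted <- sub(",", ".", test$Value, fixed = TRUE)
# 2. extract the numeric part, dropping any "<" and spaces
#    (assumes every entry matches; regmatches() silently drops non-matches)
extracted <- regmatches(dotted, regexpr("[0-9]+\\.[0-9]+", dotted))
# 3. convert to numeric
nums <- as.numeric(extracted)
# 4. multiply by five where the original value contained "<"
test$value_modified <- ifelse(grepl("<", test$Value, fixed = TRUE), nums * 5, nums)
print(test)
```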

Ben G