Is there code to create a column containing only the clock speed? The Cpu column, shown in the image, includes too much unnecessary information for me. I only want the "GHz" number (e.g. 2.3, 1.8 and 2.5).

[Image: screenshot of the data showing the Cpu column]

jpsmith
    Please use code, not graphics. Surely you don't want each and every helper out there to have to do this for themselves?! – John Garland May 17 '22 at 19:56

3 Answers

You can do something like this:

library(dplyr)    # provides %>% and mutate()
library(stringr)  # provides str_extract()

data %>%
  mutate(speed = as.numeric(str_extract(Cpu, "\\d*[.]?\\d+(?=GHz$)")))
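
To see what the $ anchor in this pattern does, here is a minimal base R sketch. The sample strings (including the one with trailing text) are made up for illustration and are not taken from the question's data. perl = TRUE is needed because base R's default regex engine does not support lookahead.

```r
# Hypothetical sample values; only the first resembles the question's data
cpu <- c("Intel Core i5 2.3GHz",
         "AMD A9 3GHz",
         "Intel Core i5 2.3GHz (boost)")

# Same pattern as above: optional digits, optional dot, digits,
# followed by "GHz" at the very end of the string
pattern <- "\\d*[.]?\\d+(?=GHz$)"

# perl = TRUE switches on PCRE, which supports the (?=...) lookahead
matches <- grepl(pattern, cpu, perl = TRUE)
matches
# [1]  TRUE  TRUE FALSE
```

With stringr, str_extract() returns NA for strings like the third one, so dropping the $ may be worth considering if some values carry text after "GHz".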
langtang

A slightly easier regex is this:

library(dplyr)
library(stringr)
df %>%
  mutate(CPU_new = str_extract(Cpu, "[0-9.]+(?=GHz)"))

Without dplyr (this still uses str_extract() from stringr):

df$CPU_new <- str_extract(df$Cpu, "[0-9.]+(?=GHz)")

How this works:

  • [0-9.]+: character class allowing digits and the period, occurring one or more times
  • (?=GHz): positive lookahead asserting that the match to be extracted must be followed by the literal string GHz
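
For a version that needs no packages at all, the same pattern can probably be run with base R's regexpr() and regmatches(), as sketched below. The sample strings are assumptions modelled on the values mentioned in the question, and the variable names are only illustrative.

```r
# Hypothetical sample values modelled on the question's Cpu column
cpu <- c("Intel Core i5 2.3GHz",
         "Intel Core i5 1.8GHz",
         "Intel Core i7 7500U 2.5GHz")

# perl = TRUE enables the (?=GHz) lookahead in base R
m <- regexpr("[0-9.]+(?=GHz)", cpu, perl = TRUE)

# regmatches() drops non-matching elements, so pre-fill with NA
# to keep the result aligned with the input rows
speed_chr <- rep(NA_character_, length(cpu))
speed_chr[m != -1] <- regmatches(cpu, m)

# The extracted values are character strings; convert them for arithmetic
speed <- as.numeric(speed_chr)
speed
# [1] 2.3 1.8 2.5
```

Note that "7500" in the third string is skipped because it is not directly followed by "GHz".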
Chris Ruehlemann

I think the other answer is better, but an alternative to a more complicated regex is to locate "GHz" with the stringr package and extract the three characters immediately before it. This assumes the speed is always written with exactly three characters (e.g. 2.3):

Data:

df <- data.frame(ScreenResolution = paste("Test",LETTERS[1:3]),
                 Cpu = c("Intel Core i5 2.3GHz","Intel Core i5 1.8GHz",
                         "Intel Core i5 72000U 2.3GHz"),
                 Ram = "8GB")

Code:

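library(stringr)
# str_locate() returns a matrix; take the whole "start" column so each row uses its own position
ghz_start <- str_locate(df$Cpu, "GHz")[, "start"]
# The three characters just before "GHz" hold the speed, e.g. "2.3"
df$Cpu_new <- str_sub(df$Cpu, ghz_start - 3, ghz_start - 1)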

Output:

#   ScreenResolution                         Cpu Ram Cpu_new
# 1           Test A        Intel Core i5 2.3GHz 8GB     2.3
# 2           Test B        Intel Core i5 1.8GHz 8GB     1.8
# 3           Test C Intel Core i5 72000U 2.3GHz 8GB     2.3

If you want a numeric column, wrap the call: as.numeric(str_sub(...)).
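
The position-based idea, including the numeric conversion, can also be sketched in base R. This is only an illustration under the same assumption that the speed always occupies exactly three characters before "GHz"; the vector name cpu is hypothetical and holds the sample values from the Data section.

```r
# Sample values copied from the Data section above
cpu <- c("Intel Core i5 2.3GHz",
         "Intel Core i5 1.8GHz",
         "Intel Core i5 72000U 2.3GHz")

# Start position of the literal text "GHz" in each string
ghz_start <- as.integer(regexpr("GHz", cpu, fixed = TRUE))

# substr() is vectorised over start and stop, so each string
# is cut at its own position; as.numeric() converts "2.3" to 2.3
speed <- as.numeric(substr(cpu, ghz_start - 3, ghz_start - 1))
speed
# [1] 2.3 1.8 2.3
```

A value such as "10.5GHz" would break the three-character assumption, which is one reason the regex answers are more robust.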

jpsmith