Problem

SparkR's regexp_replace should follow Java regex rules, but I am having a hard time matching certain symbols.

Reprex

In this reprex I manage to identify "<", "-" and "/" but not ">" or "+".

# Load packages
library(tidyverse)
library(sparklyr)
library(SparkR)

# Create data
df <- data.frame(test = c("<5", ">5", "3(a)", "a-a", "b+b", "c/c", "d  d", "3..3"))

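# Connect to Spark (assumption: a local master; the original post does not show how sc was created)
sc <- spark_connect(master = "local")

# Transfer data to Spark memory
df <- copy_to(sc, df, "df", overwrite = TRUE)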

# Modify data
df1 <- df %>%
  dplyr::mutate(
    test = regexp_replace(test, "[<]", "_"),
    test = regexp_replace(test, "[>]", "_"),
    test = regexp_replace(test, "[-]", "_"),
    test = regexp_replace(test, "[+]", "_"),
    test = regexp_replace(test, "[/]", "_"))


# Collect and print results
df2 <- df1 %>% as.data.frame()
df2

Solution

# Load packages
library(tidyverse)
library(sparklyr)
library(SparkR)

# Create data
df <- data.frame(test = c("<5", ">5", "3(a)", "a-a", "b+b", "c/c", "d  d", "3..3"))

# Transfer data to Spark memory
df <- copy_to(sc, df, "df", overwrite = TRUE)

# Modify data
df1 <- df %>%
  dplyr::mutate(
    test = regexp_replace(test, "[<>+/-]", "_"))


# Collect and print results
df2 <- df1 %>% as.data.frame()
df2
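
As a quick local check of the character class above, here is a minimal sketch that uses base R's gsub() instead of Spark. This is an assumption for illustration only: gsub() with perl = TRUE runs PCRE, not Java's regex engine, but a simple bracket expression like this one should behave the same way in both.

```r
# Same sample values as in the reprex above
x <- c("<5", ">5", "3(a)", "a-a", "b+b", "c/c", "d  d", "3..3")

# Inside [...], the characters <, >, + and / are literal.
# The trailing - is literal too, because it cannot form a range
# when it is the last character of the class.
cleaned <- gsub("[<>+/-]", "_", x, perl = TRUE)
print(cleaned)
```

Only the five listed symbols are replaced; "3(a)", "d  d" and "3..3" pass through unchanged.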
obruzzi

1 Answer

I'm not sure how SparkR works, but you should be able to do something like this:

df1 <- df %>%
  dplyr::mutate(
    test = regexp_replace(test, "[<>+/-]", "_"))

In the case of the / you might have to do:

    test = regexp_replace(test, "[<>+\\/-]", "_"))
Federico Piazza
  • Thanks @Federico Piazza for the contribution. This worked, thanks a lot! – obruzzi Jun 14 '20 at 15:34
  • You never need to escape `/` in string literal patterns. `/` is not special in any regex flavor. – Wiktor Stribiżew Jun 14 '20 at 15:38
  • @WiktorStribiżew it depends on the parsers as well not only regex engines. Simple example... check this [regex101.com](https://regex101.com/r/jlbUEn/1) – Federico Piazza Jun 14 '20 at 15:47
  • @FedericoPiazza I did not say you should not escape it in regex literals where the regex delimiter char is a slash. I said that "in string literal patterns" you do not have to do this. Simple example at regex101.com - https://regex101.com/r/jlbUEn/2. More examples: [JS](https://regex101.com/r/jlbUEn/3), [Python](https://regex101.com/r/jlbUEn/4), [Go](https://regex101.com/r/jlbUEn/5)... **A slash is not a special regex metacharacter**. – Wiktor Stribiżew Jun 14 '20 at 17:30
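
The point about the slash in string literal patterns can be checked locally with a small sketch. It assumes base R's gsub() with perl = TRUE (PCRE) as a stand-in for Spark's Java regex engine, so it illustrates the behaviour rather than proving it for regexp_replace itself.

```r
# Hypothetical sample values, chosen to cover /, + and -
x <- c("c/c", "b+b", "a-a")

# Pattern with a plain slash inside the character class
unescaped <- gsub("[<>+/-]", "_", x, perl = TRUE)

# Pattern with an escaped slash; "\\/" in the R string is \/ in the regex
escaped <- gsub("[<>+\\/-]", "_", x, perl = TRUE)

# Both patterns match the same characters
print(identical(unescaped, escaped))
```

Escaping the slash is harmless here, but it changes nothing, since no delimiter surrounds a pattern passed as a string.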