2

Working on a Spark platform, using R and RStudio Server, I want to filter my tbl where a given column (string) meets the condition of being numeric. Hence, the column contains both numeric/integer values ('1234') and string values ('x1234') and I want to filter out the latter.

Is this possible using dyplr, for example with the filter operator/function?

I tried stuff like:

tbl2 <- tbl %>% filter(!is.numeric(col1))

By the way: the data is stored on other servers, I don't want to load the data locally. Hence I'm not able to use objects like dataframes. I'm bounded to libraries like dyplr.

Help is highly appreciated!

Dendrobates
  • 3,294
  • 6
  • 37
  • 56

3 Answers3

9

I donot think this works: tbl2 <- tbl %>% filter(!is.numeric(col1)). In a tbl_df or df, each column has only one class. So in your case, the col1 containing "123" and "x123" should be of the class "character".

One possible solution is to convert the col1 as numeric and to test if the conversion succeeds.

tbl2 <- tbl %>% filter(!is.na(as.numeric(col1)))

Test:

library(data.frame)
df <- data.frame(a=1:3,b=2:4)
df[2,2] <- "x123"
class(df$b)
# "character"
df %>% filter(!is.na(as.numeric(b)))
#  a b
#1 1 2
#2 3 4

NOTE: there is a Warning message.

Huanfa Chen
  • 577
  • 6
  • 15
2

Two approaches are possible: your can use regular expressions to identify strings that could be converted to numbers, e.g., grepl("^[0-9]$", c("1", "1.x", "x.1", "5.5"), perl = T) (see Regex for numbers only).

Another way is just to force conversion of strings into numbers as in as.numeric and filter out the NAs, e.g., filter(!is.na(as.numeric(c("1", "1.x", "x.1", "5.5"))))

edit ups, just a minute too late ;-)

Community
  • 1
  • 1
Drey
  • 3,314
  • 2
  • 21
  • 26
0

This will get you just the numeric rows:

tbl2 <- tbl %>% 
filter(!str_detect(col1,"^\\s*[0-9]*\\s*$"))

This will get you just non numeric rows:

tbl2 <- tbl %>% 
filter(str_detect(col1,"^\\s*[0-9]*\\s*$"))
ah bon
  • 9,293
  • 12
  • 65
  • 148
natalie
  • 204
  • 1
  • 4