I'm working with data that looks like the following: (image: dataFrame)

The command I'm running is:

library(magrittr)

#subsetting the data for MAC-OS & sorting by event-timestamp.
macDF <- eventsDF %>% 
  SparkR::select("device", "event_timestamp") %>%
  SparkR::filter("device = macOS") %>%
  SparkR::arrange("event_timestamp")

display(macDF)

And the error I get is:

Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'x' in selecting a method for function 'arrange': unable to find an inherited method for function ‘filter’ for signature ‘"character", "missing"’

Any help would be appreciated. Thanks!

Riyaz Ali

1 Answer

I couldn't precisely replicate your error, but I created an example eventsDF dataframe in R, converted it to a Spark dataframe, and updated a bit of your code.

Here's an example in the style you started with. Note the call to SparkR::expr, which lets you provide a SQL expression for Spark to put in the WHERE clause it is building. Because the expression is parsed as SQL, macOS must be quoted as a string literal; left unquoted, it would be read as a column name:

library(magrittr)

eventsDF = data.frame(device=c("macOS","redhat","macOS"),event_timestamp=strptime(c('2022-01-13 12:19','2021-11-14 08:02','2021-12-01 21:33'),format="%Y-%m-%d %H:%M")) %>%
            SparkR::as.DataFrame()

macDF <- eventsDF %>% 
  SparkR::select(eventsDF$device, eventsDF$event_timestamp) %>%
  SparkR::filter(SparkR::expr("device='macOS'")) %>%
  SparkR::arrange('event_timestamp') %>%
  display()

How I might do it:

library(dplyr)   # used here for the %>% pipe
library(SparkR)  # attached last, so its select/filter/arrange mask dplyr's

eventsDF = data.frame(device=c("macOS","redhat","macOS"),event_timestamp=strptime(c('2022-01-13 12:19','2021-11-14 08:02','2021-12-01 21:33'),format="%Y-%m-%d %H:%M")) %>%
            as.DataFrame()

macDF <- eventsDF %>% 
  select(c('device','event_timestamp')) %>%
  filter(eventsDF$device=='macOS') %>%
  arrange('event_timestamp') %>%
  display()

Results: screenshot of filtered, sorted, eventsDF
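
The two versions above differ mainly in how the condition reaches filter. As a rough sketch, here are three ways of writing the same condition side by side. It assumes a Spark session can be started locally; the sparkR.session() call is not part of the original code (Databricks notebooks already provide a session), and the variable names are only illustrative.

```r
# Assumption: a local Spark installation is available. On Databricks a
# session already exists, so this call just attaches to it.
SparkR::sparkR.session()

# Small stand-in for eventsDF with the two columns used in the question.
eventsDF <- SparkR::as.DataFrame(data.frame(
  device = c("macOS", "redhat", "macOS"),
  event_timestamp = as.POSIXct(
    c("2022-01-13 12:19", "2021-11-14 08:02", "2021-12-01 21:33"),
    format = "%Y-%m-%d %H:%M", tz = "UTC"
  )
))

# 1. SQL string: the literal is quoted inside the string.
bySqlString <- SparkR::filter(eventsDF, "device = 'macOS'")

# 2. expr(): the same SQL text, parsed into a Column first.
byExpr <- SparkR::filter(eventsDF, SparkR::expr("device = 'macOS'"))

# 3. Column comparison: an R-level expression, so "macOS" is an ordinary R string.
byColumn <- SparkR::filter(eventsDF, eventsDF$device == "macOS")

# Row counts for the three variants; they should all agree.
counts <- c(SparkR::count(bySqlString), SparkR::count(byExpr), SparkR::count(byColumn))
print(counts)
```

In forms 1 and 2 the quoting happens inside the SQL text, which is where an unquoted macOS goes wrong; form 3 avoids SQL parsing altogether.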

    Thank you @Nathan!! When I posted this, I was just getting started with Spark. I simply **forgot to put macOS in quotes.** Using **`filter(" device=='macOS' ")`** resolved it for me. I greatly appreciate you responding to this post. Have a good one! :) – Riyaz Ali Jan 16 '22 at 07:36