Questions tagged [sparklyr]

sparklyr is an alternative R interface for Apache Spark

sparklyr provides an alternative to the SparkR interface for Apache Spark, built on top of dplyr.

784 questions
0
votes
0 answers

Remove words with a maximum length of 2 in Spark

I want to remove (or replace with a non-blank value) all words of a length less than 2 in sparklyr. My attempt is below, but it doesn't work: Tab8b <- tab8 %>% ft_sql_transformer( sql="select * , Regexp_replace(VAR,…
Camel
  • 11
  • 5
0
votes
1 answer

Creating a pie chart using the genderizeR package in sparklyr, R

Hi, I am trying to create a pie chart in R using the genderizeR package. I am referring to the code below from the site https://www.r-bloggers.com/the-gender-of-big-data/: library(rvest) library(stringr) library(dplyr) library(genderizeR) library(ggplot2) …
RJ_Programmer
  • 31
  • 1
  • 6
0
votes
2 answers

SparklyR removing a tbl from Spark Context

Similar to: SparklyR removing a Table from Spark Context, but different because: The above question asks how to remove a "table" from Spark, here created by the copy_to function. If the spark_read_csv() function is used instead, it appears that there…
DaveRGP
  • 1,430
  • 15
  • 34
0
votes
0 answers

Connecting to Spark from R using username password

We have a requirement wherein we plan to use sparklyr to execute model code written in R on Spark. The Spark cluster we use is a Kerberized cluster. We are able to connect to this cluster and execute our code using a keytab. The challenge we…
0
votes
0 answers

Looking for a way to access files on a Windows AWS server from RStudio

I have installed RStudio on my local laptop and am trying to access files located on an AWS server (Windows). I do not want to use the FTP protocol. What are other possible ways to remotely access files located on a remote server? How to use SCP/SSH…
SC_kumar
  • 21
  • 5
0
votes
1 answer

dplyr to replace all values that match a specific string

Is there a dplyr equivalent that does this? I'm after a 'replace all' that replaces values matching the string xxx with NA: is.na(df) <- df == "xxx". I want to execute a sparklyr command using the pipe operator from R on a Spark dataframe, tbl(sc, "df") %>%, and sticking the…
Choc_waffles
  • 518
  • 1
  • 4
  • 15
0
votes
0 answers

Rsparkling memory issue

I'm running out of memory when I try to fit a random forest model on my dataset (5888 bytes) using the rsparkling random forest function with the following: h2o.randomForest(x = x, y = y, training_frame =…
mike
  • 35
  • 6
0
votes
0 answers

sparklyr help: spark_read_csv returns an error

I have a 3 GB CSV file called accelerometer.csv on my computer. I wanted to read it into Spark using R and the sparklyr package just as an experiment before importing seriously big data (180 GB). I used this code here: spark_c <- spark_connect(master =…
user7426583
0
votes
0 answers

Error in connecting with Spark using spark_connect command in 'sparklyr': (R-3.4.0)

I have Spark 1.6.2 installed on my system. I am also using R (3.4.0) with rstudio-server 1.0.143 on a CentOS 6.9 machine. Whenever I run the command sc <- spark_connect(master = "local"), it shows an error message stating that: Error in…
0
votes
0 answers

Sparklyr: how to improve reading speed for JSON files?

I am trying to load about 40 large JSON files (150 - 200GB each on average) into Spark using sparklyr. Some of the files would fit entirely in the RAM of a cluster, some of them would be too big. Unfortunately, the command…
ℕʘʘḆḽḘ
  • 18,566
  • 34
  • 128
  • 235
0
votes
0 answers

Just learning sparklyr - copy_to() error

I know this is a very simple question, and I assume it has been asked before but I have been unable to find it. I would like to learn sparklyr. However, I wrote devtools::install_github("rstudio/sparklyr") install.packages(c("nycflights13",…
madhatter5
  • 129
  • 2
  • 15
0
votes
1 answer

How to export sparklyr (Spark ML) models to PMML?

I know that Spark ML pipelines can be exported to PMML using the JPMML-SparkML library. I am just struggling to find out how I could do it from R using sparklyr. I am aware of an open GitHub issue, where two ideas were raised: using the Scala API,…
michalrudko
  • 1,432
  • 2
  • 16
  • 30
0
votes
1 answer

Hive: how to convert millisecond timestamps?

I am trying to use the Hive UDFs (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DateFunctions) from sparklyr to properly read in some timestamps. Unfortunately, I have not been able to correctly parse the…
ℕʘʘḆḽḘ
  • 18,566
  • 34
  • 128
  • 235
0
votes
3 answers

Is it possible to connect to MongoDB from sparklyr?

I can connect to MongoDB from SparkR (I am using RStudio, Spark 2.x.x, Mongo connector v2.0) as described here: https://docs.mongodb.com/spark-connector/current/r-api/. I would like to do the same using sparklyr. Is that possible? I could not find any…
Amit Arora
  • 169
  • 3
  • 15
0
votes
0 answers

Shiny and Spark: where to run spark_connect?

Following my post "How to free Spark resources?", does it matter where you place the (sparklyr) spark_connect call in server.R: within or outside the shinyServer(function(input, output, session)?
guzu92
  • 737
  • 1
  • 12
  • 28