I'm trying to connect to IBM's Spark as a Service running on Bluemix from RStudio running on my desktop machine.
I have copied the config.yml from the automatically configured RStudio environment running on IBM's Data Science Experience:
default:
…
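A minimal sketch of how such a connection is usually made, assuming the copied config.yml sits in the RStudio working directory and exposes the service settings under its "default" profile; the master value below is a placeholder, since the real value comes from the truncated config.yml above:

library(sparklyr)
# spark_config() reads config.yml from the working directory by default
config <- spark_config(file = "config.yml")
# placeholder master; the actual value is defined by the service's config
sc <- spark_connect(master = "yarn-client", config = config)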
I have a Spark dataframe tbl_pred with the following factor column:
Value
13,3
11
5,3
I'd like to convert those 'strings' to numeric values. I can use the as.numeric function, but this doesn't work because my decimal separator is a comma.
tbl_pred…
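A minimal sketch of the usual workaround: rewrite the decimal comma with Spark's regexp_replace (Hive SQL functions like this can be used directly inside mutate on Spark tables), then cast, which sparklyr translates to a SQL CAST:

library(dplyr)
tbl_pred <- tbl_pred %>%
  mutate(Value = as.numeric(regexp_replace(Value, ",", ".")))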
I am working with sparklyr and am having trouble changing column classes and then using dplyr to aggregate the data. This is my current code:
.libPaths(c(.libPaths(), '/usr/lib/spark/R/lib'))
Sys.setenv(SPARK_HOME =…
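A minimal sketch of the cast-then-aggregate pattern, assuming a Spark table tbl with a character column amount and a grouping column grp (both hypothetical names):

library(sparklyr)
library(dplyr)
tbl %>%
  mutate(amount = as.numeric(amount)) %>%  # translated to CAST(... AS DOUBLE)
  group_by(grp) %>%
  summarise(total = sum(amount))           # aggregation runs in Spark, not in R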
I need to fit GLMs on data that doesn't fit into my computer's memory. Usually to get around this issue, I would sample data, fit the model and then test on a different sample that would sit out of memory. This has been R's major limitation for me…
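One way around that limit is to fit the GLM in Spark itself, so the data never has to fit into R's memory. A minimal sketch using sparklyr's ml_generalized_linear_regression() (formula interface available in recent sparklyr versions), with a hypothetical file path and column names:

library(sparklyr)
sc  <- spark_connect(master = "local")
# memory = FALSE avoids caching the full table in cluster memory
dat <- spark_read_csv(sc, "dat", "path/to/big.csv", memory = FALSE)  # placeholder path
fit <- ml_generalized_linear_regression(dat, y ~ x1 + x2, family = "binomial")
summary(fit)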
I am trying to connect to Spark using the sparklyr package in R, and I am getting the following error:
library(sparklyr)
library(dplyr)
config <- spark_config()
config[["sparklyr.shell.conf"]] <-…
I'd like to be able to use the Java methods on a SparkR SparkDataFrame to write data to Cassandra.
Using the sparklyr extensions for example, I can do something like this:
sparklyr::invoke(sparklyr::spark_dataframe(spark_tbl), "write") %>%…
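A minimal sketch of how that sparklyr invoke chain might continue, assuming the spark-cassandra-connector is on the classpath and using hypothetical keyspace/table names:

sparklyr::spark_dataframe(spark_tbl) %>%
  sparklyr::invoke("write") %>%                                  # DataFrameWriter
  sparklyr::invoke("format", "org.apache.spark.sql.cassandra") %>%
  sparklyr::invoke("option", "keyspace", "my_keyspace") %>%      # placeholder
  sparklyr::invoke("option", "table", "my_table") %>%            # placeholder
  sparklyr::invoke("save")

The rough SparkR analogue is the unexported SparkR:::callJMethod(), applied to the @sdf slot of a SparkDataFrame.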
I have the following columns in my dataframe:
c1_sum | c2_sum | d | c1 | c2
The columns c# and c#_sum are dynamic. I'm trying to do something like this for all c#:
mutate(c#_weight = (d * c#) / c#_sum)
The final result would be:
c1_sum | c2_sum |…
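A minimal sketch of building those mutate expressions programmatically with tidy evaluation, assuming the columns follow the c1/c1_sum pattern shown above:

library(dplyr)
library(purrr)
library(rlang)
idx   <- 1:2   # however many c# columns exist
exprs <- set_names(
  map(idx, ~ expr((d * !!sym(paste0("c", .x))) / !!sym(paste0("c", .x, "_sum")))),
  paste0("c", idx, "_weight")
)
df <- df %>% mutate(!!!exprs)   # splices c1_weight, c2_weight, ... in one call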
I'm looking to run a GraphX example on my Windows machine using spark-shell from a sparklyr install of Hadoop/Spark. I am able to launch the shell from the install directory first:
start…
I'm trying to load a SQL table into R through SparkR. I have the following code:
Sys.setenv(SPARK_HOME = "C:/Users/hms/Desktop/spark-2.0.1-bin-hadoop2.7/spark-2.0.1-bin-hadoop2.7",
HADOOP_HOME =…
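A minimal sketch of reading a table over JDBC with SparkR 2.0's read.jdbc(), assuming a placeholder connection URL and that the matching JDBC driver jar has been added to the session (e.g. via spark.jars):

library(SparkR)
sparkR.session(sparkHome = Sys.getenv("SPARK_HOME"))
df <- read.jdbc(url = "jdbc:sqlserver://host:1433;databaseName=mydb",  # placeholder URL
                tableName = "dbo.mytable",                             # placeholder table
                user = "user", password = "pass")
head(df)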
I am using sparklyr and it seems to be working well. However, some of my existing code will not run.
When I use
complete.cases
I get
Error: org.apache.spark.sql.AnalysisException: undefined function COMPLETE.CASES
I get the same…
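complete.cases() is a plain R function with no Spark SQL translation, which is why Spark reports an undefined function. A minimal sketch of the usual workaround on a Spark table, with hypothetical column names:

library(dplyr)
tbl_clean <- tbl %>% filter(!is.na(col1) & !is.na(col2))  # col1/col2 are placeholders

Recent sparklyr versions also provide an na.omit() method for Spark tables (worth checking against your installed version), which drops any row containing a NULL.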
I tried the following command in my local RStudio session to connect to Spark:
sc <- spark_connect(master = "spark://x.x.x.x:7077",
                   spark_home = "/home/hduser/spark-2.0.0-bin-hadoop2.7",
                   version = "2.0.0", config = list())
But I am getting the following…
I'm trying to connect R to Spark following the sparklyr tutorial from RStudio: http://spark.rstudio.com/
But somehow I'm getting a weird error message, shown below. Does anyone know how to solve this?
I have tried to add the C:\Windows\system32 path…
Hello, I am just getting started with sparklyr and I am getting an error when trying to use dplyr to wrangle some data.
library(sparklyr)
sc <- spark_connect(master = "local")
spark_read_csv(sc, "df2_tbl",
"C:/Users/...csv")
spark_read_csv(sc,…
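A minimal sketch of the read-then-wrangle pattern that usually works, with a hypothetical file path and column name; note that spark_read_csv() returns the table reference that the dplyr verbs should be applied to:

library(sparklyr)
library(dplyr)
sc  <- spark_connect(master = "local")
df2 <- spark_read_csv(sc, "df2_tbl", "C:/Users/me/df2.csv")  # placeholder path
df2 %>%
  group_by(some_col) %>%   # some_col is a placeholder
  summarise(n = n())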
Currently, I am using a foreach loop from the doParallel library to run function calls in parallel across multiple cores of the same machine, which looks something like this:
out_results <- foreach(i = 1:length(some_list)) %dopar%
{
…
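A minimal sketch of the Spark-side analogue using spark_apply() (available in newer sparklyr versions), which ships an R closure to the executors and runs it once per partition; the input data frame and the squaring body below are stand-ins for some_list and the real loop body:

library(sparklyr)
library(dplyr)
sc  <- spark_connect(master = "local")
dat <- copy_to(sc, data.frame(i = 1:100), "dat")
out <- spark_apply(dat, function(df) {
  # runs inside the executors; must take and return a data.frame
  data.frame(i = df$i, result = df$i^2)
})
collect(out)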