Questions tagged [sparkr]

SparkR is an R package that provides a light-weight frontend to use Apache Spark from R.


SparkR exposes the Spark API through the RDD class and allows users to interactively run jobs from the R shell on a cluster.

SparkR exposes the RDD API of Spark as distributed lists in R.
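A minimal session sketch using the current DataFrame-based API (Spark 2.x); the data set and filter are purely illustrative:

```r
library(SparkR)
sparkR.session(master = "local[*]")

# distribute a built-in R data set as a SparkDataFrame
df <- as.DataFrame(faithful)

# run a Spark operation interactively from the R shell and bring results back
head(filter(df, df$waiting < 50))

sparkR.session.stop()
```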

796 questions
-1
votes
1 answer

Regex issue in regexp_replace

Problem: SparkR's regexp_replace should follow Java regex rules, but I am having a hard time matching certain symbols. Reprex: in this reprex I manage to match "<", "-" and "/" but not ">" or "+". # Load…
obruzzi
  • 456
  • 1
  • 4
  • 12
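A sketch for the question above. In Java regex, "+" is a quantifier and must be escaped, while ">" and "<" are literal characters; the toy data is illustrative:

```r
library(SparkR)
sparkR.session()

df <- createDataFrame(data.frame(txt = c("a+b", "a>b", "a<b")))

# "\\+" in R becomes the Java regex \+, i.e. a literal plus sign
df <- withColumn(df, "no_plus", regexp_replace(df$txt, "\\+", " plus "))

# ">" is not a regex metacharacter, so it can be matched as-is
df <- withColumn(df, "no_gt", regexp_replace(df$txt, ">", " gt "))

head(df)
```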
-1
votes
1 answer

How to replace a value with NA in a Spark data frame (the problem is parallelization)

Task: let df be a Spark data frame. We want to replace a value n in df with NA. In R I would simply write df[df == n] <- NA. Problems / questions (as I am new to Spark, any comment is welcome): what is the equivalent of NA in SparkR? I found…
Christian
  • 15
  • 1
  • 4
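A sketch for the question above. SparkR represents R's NA as SQL NULL, and when() without otherwise() yields NULL where the condition is false, so the replacement runs in parallel on the executors; the value 999 and the column names are placeholders:

```r
library(SparkR)
sparkR.session()

n  <- 999
df <- createDataFrame(data.frame(x = c(1, 999, 3), y = c(999, 5, 6)))

# keep the original value where x != n, and NULL (NA) where x == n
df$x <- when(df$x != n, df$x)
df$y <- when(df$y != n, df$y)

head(df)   # NULLs come back to R as NA
```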
-1
votes
2 answers

How to convert a character to a Date in R?

I have an Excel file where the dates are in the format below: 01-Jan-2020, 03-Jun-2015. I need to convert these to Date. I have tried many conversion techniques and I get NA every time.
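A base-R sketch for the format shown above; %b matches abbreviated month names and is locale-dependent:

```r
dates <- c("01-Jan-2020", "03-Jun-2015")

# force an English-style locale so "Jan"/"Jun" are recognised, then parse
Sys.setlocale("LC_TIME", "C")
as.Date(dates, format = "%d-%b-%Y")
#> [1] "2020-01-01" "2015-06-03"
```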
-1
votes
1 answer

foreach function in SparkR

I am relatively new to SparkR, and I am planning to turn a for loop into a foreach loop in SparkR (R 3.3.3 & Spark 2.2.0). I have searched Stack Overflow; the only relevant thread is SparkR foreach loop, but it only gives the workaround by…
windsound
  • 706
  • 4
  • 9
  • 31
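A sketch of the closest SparkR analogue: spark.lapply() (Spark >= 2.0) applies a function to each element of a list on the cluster and collects the results, much like foreach/%dopar%; the work inside the function is illustrative:

```r
library(SparkR)
sparkR.session()

some_list <- 1:10

out_results <- spark.lapply(some_list, function(i) {
  # runs on the executors; only objects captured in the closure are available
  i^2
})

str(out_results)   # an ordinary R list on the driver
```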
-1
votes
1 answer

Utilize several multicore Linux servers for computation in R

I have four 32-core Linux servers (CentOS 7) that I would like to utilize for parallelized computation in R. So far I have only been using the doMC package and registerDoMC(cores=32) to utilize the multicore capabilities of a single server. I would…
lui.lui
  • 81
  • 1
  • 7
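One SparkR-flavoured sketch, assuming a Spark standalone master is already running on one of the four servers; the hostname, port, and resource numbers are placeholders, not known values:

```r
library(SparkR)

sparkR.session(
  master = "spark://master-host:7077",   # placeholder master URL
  sparkConfig = list(
    spark.executor.cores  = "32",        # one executor per 32-core box
    spark.executor.memory = "16g"
  )
)

# work is then spread over all four machines
res <- spark.lapply(1:128, function(i) sum(rnorm(1e6)))
```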
-1
votes
2 answers

Cannot launch sparkR

I have installed Spark 2.0 and tried the sparkR command, but the command produces the error message below. The others are OK (spark-shell, pyspark, ...). Please help... [Error message] Dongkils-MacBook:spark-2.0.0-bin-hadoop2.7 dongkillee$ sparkR env: R: No such…
-1
votes
2 answers

Include the Postgres JDBC driver in SparkR

I use these preliminaries to be able to connect to a PostgreSQL database. They don't work, and I can't find any suggestions for the correct notation. .libPaths(c(.libPaths(), '/opt/spark-1.6.1-bin-hadoop2.6/R/lib')) Sys.setenv(SPARK_HOME =…
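A sketch for Spark 2.x (the Spark 1.6 setup in the question used sparkR.init/sparkRSQL.init instead); the jar path, connection URL, and credentials are all placeholders:

```r
library(SparkR)

# put the PostgreSQL JDBC jar on the driver classpath
sparkR.session(sparkConfig = list(
  spark.driver.extraClassPath = "/path/to/postgresql-42.2.5.jar"
))

df <- read.jdbc(
  url       = "jdbc:postgresql://dbhost:5432/mydb",
  tableName = "public.mytable",
  user      = "dbuser",
  password  = "secret",
  driver    = "org.postgresql.Driver"
)
```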
-1
votes
1 answer

Is it feasible to run logistic regression on a laptop with 4 GB RAM?

I am trying to perform a logistic regression in R on my data. I have created all the model variables and have them in place in a table on my Redshift database. Let's refer to this database as 'Database A' and the table as 'Table A'. Problem…
Ajay Kumar
  • 71
  • 1
  • 3
  • 11
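A sketch of the out-of-core route: spark.glm() with a binomial family fits a logistic regression without loading the whole table into the laptop's RAM. The file name, column names, and formula below are placeholders, not the asker's actual schema:

```r
library(SparkR)
sparkR.session()

df <- read.df("table_a_export.csv", source = "csv",
              header = "true", inferSchema = "true")

# logistic regression = generalized linear model with a binomial family
model <- spark.glm(df, label ~ feature1 + feature2, family = "binomial")
summary(model)
```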
-1
votes
1 answer

How to write a CSV file in Apache Spark using SparkR?

I am able to load data successfully using the following commands: sc = sparkR.init(master = 'local', sparkPackages = 'com.databricks:spark-csv_2.11:1.4.0') sqlContext <- sparkRSQL.init(sc) ss <- read.df(sqlContext,…
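A sketch that mirrors the asker's spark-csv read with a write, assuming ss is the SparkDataFrame loaded above; the output path is a placeholder:

```r
# write the SparkDataFrame back out through the same spark-csv package
write.df(ss, path = "/output/path", source = "com.databricks.spark.csv",
         mode = "overwrite", header = "true")

# Spark >= 2.0 equivalent with the built-in source:
# write.df(ss, path = "/output/path", source = "csv", mode = "overwrite", header = "true")
```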
-1
votes
1 answer

SparkR data frame add column with constant value

I want to do this in a SparkR data frame: I'm adding a column with the string "a" to the data frame, i.e. df$new_col <- "a". I can't find a way to do this in SparkR.
Erick Díaz
  • 98
  • 2
  • 15
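A sketch of the usual SparkR idiom: wrap the constant in lit() so it becomes a literal Column (the sample data frame is illustrative):

```r
library(SparkR)
sparkR.session()

df <- createDataFrame(mtcars)          # any SparkDataFrame

df$new_col <- lit("a")                 # constant string column
# or, without modifying df in place:
df2 <- withColumn(df, "new_col", lit("a"))

head(select(df, "mpg", "new_col"))
```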
-1
votes
1 answer

Save function values in a file in SparkR

I have some calculated values and I want to save them in SparkR. If I save them as a CSV file with write.csv(data, file="/.../data.csv", row.names=FALSE), it takes a very long time for some reason. Is there a better way to do this?
Ole Petersen
  • 670
  • 9
  • 21
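A sketch of the parallel alternative, assuming data is already a SparkDataFrame: write.df() lets each executor write its own partition instead of funnelling everything through write.csv() on the driver. The paths keep the question's placeholder form:

```r
library(SparkR)

# writes one CSV part-file per partition, in parallel
write.df(data, path = "/.../data_csv", source = "csv",
         mode = "overwrite", header = "true")

# if a single output file is required (slower, one task):
write.df(coalesce(data, 1L), path = "/.../data_single", source = "csv",
         mode = "overwrite", header = "true")
```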
-2
votes
1 answer

Running parallel function calls with sparklyr

Currently, I am using a foreach loop from the doParallel library to run function calls in parallel across multiple cores of the same machine, which looks something like this: out_results = foreach(i = 1:length(some_list)) %dopar% { …
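A sparklyr sketch under assumptions similar to the question's setup: spark_apply() runs an R function on the executors and collects a data frame back. The list contents and per-element work are placeholders:

```r
library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "local")

some_list <- as.list(1:100)

# ship the element indices to Spark as a small data frame
idx_tbl <- sdf_copy_to(sc, data.frame(i = seq_along(some_list)),
                       name = "idx", overwrite = TRUE)

# the function receives each partition as an R data.frame on the executors
out_tbl <- spark_apply(idx_tbl, function(df) {
  df$result <- df$i^2      # placeholder for the real per-element computation
  df
})

out_results <- collect(out_tbl)
```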
-2
votes
1 answer

Does SparkR no longer support RDD-related APIs after version 1.6.1?

In https://issues.apache.org/jira/browse/SPARK-23213, see the developer's comment: "To clarify, we don't support RDD in R. Anything you access via SparkR::: is not supported; that includes unionRDD. Check the Spark doc, not…
Tony
  • 11
  • 4
-2
votes
1 answer

How can I use the map function on a data frame in SparkR?

I have a data frame as shown below:

  ozone  particullate_matter  carbon_monoxide  sulfure_dioxide  nitrogen_dioxide
1   101                   94               49               44                87
2   106                   97               48               47                 …
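A sketch using dapplyCollect(), the SparkR-supported way to map an R function over a SparkDataFrame. The numbers below are placeholders reusing the question's column names, and the derived column is illustrative:

```r
library(SparkR)
sparkR.session()

local_df <- data.frame(ozone = c(101, 106),
                       particullate_matter = c(94, 97),
                       carbon_monoxide = c(49, 48),
                       sulfure_dioxide = c(44, 47),
                       nitrogen_dioxide = c(87, 90))
sdf <- createDataFrame(local_df)

# the function receives each partition as an ordinary R data.frame
result <- dapplyCollect(sdf, function(part) {
  part$mean_pollution <- rowMeans(part)
  part
})
result
```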
-2
votes
2 answers

R - data munging and scalable code

Hi, in the last few days I have had a small/big problem. I have a transaction dataset with 1 million rows and two columns (client ID and product ID), and I want to transform this into a binary matrix. I used the reshape and spread functions, but in both cases I used…
Kardu
  • 865
  • 3
  • 13
  • 24
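A plain-R sketch that avoids the dense reshape/spread blow-up by building a sparse 0/1 client-by-product matrix; the toy transactions are illustrative:

```r
library(Matrix)

tx <- data.frame(client_id  = c(1, 1, 2, 3),
                 product_id = c(10, 20, 10, 30))

ci <- factor(tx$client_id)
pr <- factor(tx$product_id)

# one row per client, one column per product; duplicates are summed
m <- sparseMatrix(i = as.integer(ci),
                  j = as.integer(pr),
                  x = 1,
                  dimnames = list(levels(ci), levels(pr)))

m <- (m > 0) * 1   # collapse repeated client/product pairs to a single 1
```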