I have some data in a database, and I want to work with it in Spark, using sparklyr.
I can use a DBI-based package to import the data from the database into R:
dbconn <- dbConnect()
data_in_r <- dbReadTable(dbconn, "a table")…
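A minimal sketch of the usual route, assuming the table fits in R memory: read it with DBI, then push the local data frame into Spark with sparklyr's copy_to(). The driver, database path, and table name below are placeholders.

```r
library(DBI)
library(sparklyr)

sc <- spark_connect(master = "local")

# read the table into an ordinary R data frame via DBI
dbconn    <- dbConnect(RSQLite::SQLite(), "my.db")   # driver and path are placeholders
data_in_r <- dbReadTable(dbconn, "a_table")

# copy the local data frame into Spark as a table sparklyr can query
data_in_spark <- copy_to(sc, data_in_r, name = "a_table", overwrite = TRUE)
```

Note that this pulls all rows through R first; for tables too large for that, a direct JDBC read into Spark (e.g. spark_read_jdbc()) may be the better fit.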
I have tried several tutorials on setting up Spark and Hadoop in a Windows environment, especially alongside R. This one resulted in this error by the time I hit figure 9:
This tutorial from RStudio is giving me issues as well. When I get to…
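For a purely local setup, a hedged alternative is to skip the manual Spark/Hadoop installation and let sparklyr manage a local distribution itself (the version number below is illustrative):

```r
library(sparklyr)

# let sparklyr download and manage a local Spark build
spark_install(version = "2.4.3")                       # version is illustrative
sc <- spark_connect(master = "local", version = "2.4.3")

spark_installed_versions()                             # confirm what is available locally
```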
I would like to skip (drop) the first two lines of a text file:
To the best of my knowledge this is not possible with the sparklyr method spark_read_csv. Is there a workaround for this simple problem?
I know of the existence of sparklyr…
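As far as I know, spark_read_csv() has no skip argument, so one hedged workaround is to read the file as plain lines, number them, and drop the first two before parsing. The file path is a placeholder, and the sketch assumes line order is preserved on read:

```r
library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "local")

lines <- spark_read_text(sc, name = "raw_lines", path = "path/to/file.txt")

body <- lines %>%
  sdf_with_sequential_id(id = "row_id") %>%   # stable 1-based line numbers
  filter(row_id > 2) %>%                      # drop the first two lines
  select(-row_id)
```

The result still has a single text column, so the remaining lines would then need to be split into fields (or written back out and re-read as CSV).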
I'm trying to use sparklyr to read a csv file into R. I can read the .csv into R just fine using read.csv(), but when I try to use spark_read_csv() it breaks down.
accidents <- spark_read_csv(sc, name = 'accidents', path =…
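When spark_read_csv() fails on a file that read.csv() handles, being explicit about the delimiter, quoting, and schema inference is often the first thing to try; a hedged sketch (path and option values are placeholders):

```r
library(sparklyr)

sc <- spark_connect(master = "local")

accidents <- spark_read_csv(
  sc,
  name         = "accidents",
  path         = "path/to/accidents.csv",
  header       = TRUE,
  delimiter    = ",",
  quote        = "\"",
  infer_schema = TRUE,
  memory       = FALSE   # skip caching while debugging the read
)
```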
Databricks has a built-in display() function (see documentation here) which allows users to display an R or SparkR dataframe in a clean, human-readable way, where the user can scroll to see all the columns and sort on them.…
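For context, a minimal sketch of what that looks like in a Databricks R notebook (the SparkR conversion and the sample data are illustrative):

```r
# In a Databricks R notebook cell
library(SparkR)

df <- as.DataFrame(faithful)   # turn a local R data frame into a SparkR DataFrame
display(df)                    # Databricks built-in: scrollable, sortable table view
```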
I'm attempting to manipulate a Spark table via sparklyr with a dplyr mutate command to construct a large number of variables, and each time this fails with an error message about the generated Java code exceeding a 64 KB limit.
The mutate command is coded…
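One workaround often suggested for that codegen limit is to break the single huge mutate() into several smaller ones and materialise intermediate results, so each generated Java method stays small. A hedged sketch with hypothetical table and column names:

```r
library(sparklyr)
library(dplyr)

# tbl is a hypothetical Spark table; the new variables stand in for the real ones
step1 <- tbl %>%
  mutate(var_a = x + y,
         var_b = x * y) %>%
  compute("step1")              # force evaluation, keeping the remaining plan small

step2 <- step1 %>%
  mutate(var_c = var_a / var_b) %>%
  compute("step2")
```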
I have recently started working on Databricks and I have been trying to find a way to perform a merge statement on a Delta table using an R API (preferably sparklyr). The ultimate purpose is to somehow impose a 'duplicate' constraint as…
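Since sparklyr does not, to my knowledge, expose MERGE directly, one possible route is to register the new rows as a temporary view and issue the Delta SQL through the underlying SparkSession. A hedged sketch with hypothetical table and column names, assuming a Databricks runtime:

```r
library(sparklyr)

sc <- spark_connect(method = "databricks")   # assumes a Databricks cluster

# updates_sdf is a hypothetical Spark DataFrame holding the incoming rows
sdf_register(updates_sdf, "updates")

merge_sql <- "
  MERGE INTO target_table AS t
  USING updates AS u
  ON t.id = u.id
  WHEN MATCHED THEN UPDATE SET *
  WHEN NOT MATCHED THEN INSERT *
"

invoke(spark_session(sc), "sql", merge_sql)
```

The ON clause is where a de-duplication ('duplicate constraint') rule would go: rows matching an existing key are updated rather than inserted again.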
I'm having difficulty connecting to and retrieving data from a Kafka instance. Using Python's kafka-python module, I can connect (using the same connection parameters), see the topic, and retrieve data, so the network is viable, there is no…
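For reference, a minimal sparklyr structured-streaming read from Kafka looks roughly like this; the broker address and topic are placeholders, and the sketch assumes sparklyr can pull in the Spark-Kafka connector via the packages argument:

```r
library(sparklyr)

# 'packages = "kafka"' asks sparklyr to fetch the Spark-Kafka connector (assumption)
sc <- spark_connect(master = "local", version = "2.4.3", packages = "kafka")

stream <- stream_read_kafka(
  sc,
  options = list(
    kafka.bootstrap.servers = "broker:9092",   # placeholder broker address
    subscribe               = "my_topic",      # placeholder topic
    startingOffsets         = "earliest"
  )
)
```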
How can I select all columns after a designated column using R (ideally dplyr only, but non-dplyr solutions are welcome)? For example, in the dataframe mtcars, I want to grab all columns after vs, which would be am, gear, carb. But I want a function…
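A small dplyr-only sketch; select_after() is a hypothetical helper name, and it assumes the named column exists and is not the last one:

```r
library(dplyr)

# keep every column that comes after `col`
select_after <- function(df, col) {
  idx <- match(col, names(df))
  select(df, all_of(names(df)[seq(idx + 1, ncol(df))]))
}

select_after(mtcars, "vs")   # returns am, gear, carb
```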
I am having the following issue while connecting with sparklyr.
sc <- spark_connect(master = "local")
* Using Spark: 2.4.3
Error in spark_connect_gateway(gatewayAddress, gatewayPort, sessionId, :
Gateway in localhost:8880 did not respond.
Try…
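When a gateway error like this comes up, a common first check is whether a compatible local Spark build is actually installed and whether connecting against an explicit version helps; a hedged triage sketch (the version number is illustrative):

```r
library(sparklyr)

spark_installed_versions()    # what sparklyr thinks is installed locally
spark_available_versions()    # what it could install

# reinstall a matching build and connect against it explicitly
spark_install(version = "2.4.3")
sc <- spark_connect(master = "local", version = "2.4.3")
```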
I have another question in the word2vec universe.
I am using the sparklyr package. Within this package I call the ft_word2vec() function. I have some trouble understanding the output:
For each number of sentences/paragraphs I am providing to the…
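For reference, a minimal, hedged sketch of a typical ft_word2vec() pipeline in sparklyr; the sample data and column names are placeholders:

```r
library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "local")

sentences <- copy_to(
  sc,
  data.frame(text = c("spark is fast", "sparklyr wraps spark")),
  name = "sentences", overwrite = TRUE
)

vectors <- sentences %>%
  ft_tokenizer(input_col = "text", output_col = "tokens") %>%         # split into words
  ft_word2vec(input_col = "tokens", output_col = "embedding",
              vector_size = 3, min_count = 1)                         # one vector per row
```

Applied to a table like this, ft_word2vec() returns the input with an extra column holding the averaged word vector for each sentence/paragraph.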
My attempts with top_n() and slice_head() both failed with errors.
An issue with top_n() was reported in https://github.com/tidyverse/dplyr/issues/4467 and closed by Hadley with the comment:
This will be resolved by #4687 + tidyverse/dbplyr#394…
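A hedged workaround that does translate to Spark SQL is a window-function filter instead of top_n()/slice_head(); the table and column names below are hypothetical:

```r
library(sparklyr)
library(dplyr)

# spark_tbl is a hypothetical Spark table with group column `g` and value `x`
top3 <- spark_tbl %>%
  group_by(g) %>%
  filter(min_rank(desc(x)) <= 3) %>%   # becomes RANK() OVER (PARTITION BY g ORDER BY x DESC)
  ungroup()
```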
I am trying to replicate the tidyr::complete() function in sparklyr. I have a dataframe with some missing rows and I need to fill them in. In dplyr/tidyr I can do:
data <- tibble(
  "id" = c(1, 1, 2, 2),
  "dates" = c("2020-01-01", "2020-01-03",…
When trying to connect to Spark using sparklyr, I get the following error message:
'Error in spark_connect_gateway(gatewayAddress, gatewayPort, sessionId, :
Gateway in localhost:8880 did not respond.'
There is no other info displayed in…