More information can be found in the official documentation.
Questions tagged [spark-shell]
135 questions
0
votes
2 answers
Parsing data in Apache Spark (Scala): org.apache.spark.SparkException: Task not serializable when trying to use textinputformat.record.delimiter
Input file:
___DATE___
2018-11-16T06:39:37
Linux hortonworks 3.10.0-514.26.2.el7.x86_64 #1 SMP Fri Jun 30 05:26:04 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
06:39:37 up 100 days, 1:04, 2 users, load average: 9.01, 8.30, 8.48
06:30:01 AM all …
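The usual culprit in this setup is a Hadoop Configuration object getting captured by a task closure. A minimal sketch of the workaround, assuming records are delimited by the ___DATE___ marker (the input path is a placeholder):
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat

val conf = new Configuration(sc.hadoopConfiguration)
conf.set("textinputformat.record.delimiter", "___DATE___")

// the configuration is handed to newAPIHadoopFile directly, so no
// non-serializable object has to travel inside a task closure
val records = sc.newAPIHadoopFile("/path/to/input",
    classOf[TextInputFormat], classOf[LongWritable], classOf[Text], conf)
  .map { case (_, text) => text.toString } // copy out of the reused Text buffer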

Rohit Nimmala
- 1,459
- 10
- 28
0
votes
1 answer
Getting a file-not-found error because of an escape character
I am trying to execute the spark-shell command below in a Linux terminal through Java code.
echo spark.sparkContext.parallelize\(1 to 3,3\).map\(x =>…
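A hedged sketch of one way around the escaping: write the snippet to a file and launch spark-shell with -i, so the shell never has to parse parentheses (the /tmp path is an assumption, and the map body below is a stand-in since the original is elided):
import java.nio.file.{Files, Paths}

val snippet =
  """spark.sparkContext.parallelize(1 to 3, 3).map(x => x * 2).collect()"""
val script = Files.write(Paths.get("/tmp/snippet.scala"), snippet.getBytes("UTF-8"))
val exit = new ProcessBuilder("spark-shell", "-i", script.toString)
  .inheritIO()
  .start()
  .waitFor()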

Abinash Dash
- 43
- 1
- 6
0
votes
1 answer
Apache Spark 2.3.1 - twitter is not a member of package org.apache.spark.streaming
First of all, I have been looking around for this problem for a while now, and I can see other solutions exist for it, but nothing for Apache Spark version 2.3.1.
In short, I am trying to create an application that uses Bahir to…
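The Twitter receiver left Spark itself after the 1.6 line and now lives in Apache Bahir, so the artifact has to be supplied at launch. A sketch, where the coordinates (Scala 2.11 build, Bahir 2.3.1) are an assumption to verify against the Bahir downloads:
// start the shell with the connector on the classpath:
//   spark-shell --packages org.apache.bahir:spark-streaming-twitter_2.11:2.3.1
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.twitter.TwitterUtils

val ssc = new StreamingContext(sc, Seconds(10))
val tweets = TwitterUtils.createStream(ssc, None) // None: read twitter4j auth from system properties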

Thelin90
- 37
- 2
- 11
0
votes
1 answer
Mahout 0.13.0 spark-shell examples fail with "no jniViennaCL in java.library.path"
I'm trying to make Mahout 0.13.0 work with Spark 1.6.3.
I already have Spark 1.6.3 and Hadoop 2.7 working.
I downloaded the latest build from the homepage mahout_download,
unpacked it to /opt/mahout, and
tried to execute the example in spark-shell from the…
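For what it's worth, that error just means the JVM cannot locate the ViennaCL native library on java.library.path. A hypothetical sketch of launching with the path extended (all paths and the jar name are assumptions):
// spark-shell --driver-java-options "-Djava.library.path=/opt/mahout/lib" \
//             --jars "/opt/mahout/mahout-spark_2.10-0.13.0.jar"
// then, inside the shell, confirm what the driver actually sees:
println(System.getProperty("java.library.path"))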

Eduardo Liendo
- 131
- 7
0
votes
2 answers
SBT console vs Spark-Shell for interactive development
I'm wondering if there are any important differences between using the SBT console and spark-shell for interactively developing new code for a Spark project (notebooks are not really an option with the server firewalls).
Both can import project…
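One concrete difference worth noting: spark-shell pre-creates spark and sc, while an SBT console starts bare, so the session has to be built by hand. A minimal sketch (master and app name are arbitrary choices here):
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("sbt-console")
  .getOrCreate()
val sc = spark.sparkContext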

andrew
- 3,929
- 1
- 25
- 38
-1
votes
1 answer
Spark SQL and MongoDB query execution times on the same data don't produce expected results
This is a general question, but I am hoping someone can answer it. I am comparing query execution times between MongoDB and Spark SQL. Specifically, I have created a MongoDB collection of 1 million entries from a .csv file and ran a few queries on it…
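One caveat when timing such comparisons: Spark transformations are lazy, so a measurement is only meaningful around an action. A sketch, assuming a DataFrame df already loaded from the same .csv with an age column (both are assumptions):
import org.apache.spark.sql.functions.col

val t0 = System.nanoTime()
val matches = df.filter(col("age") > 30).count() // count() is what actually runs the query
val elapsedMs = (System.nanoTime() - t0) / 1e6
println(s"$matches rows in $elapsedMs ms")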

Antonis Pervanas
- 11
- 4
-1
votes
1 answer
How to load data with an array-type column from CSV into Spark dataframes
I have a CSV file as shown:
name,age,languages,experience
'Alice',31,['C++', 'Java'],2
'Bob',34,['Java', 'Python'],2
'Smith',35,['Ruby', 'Java'],3
'David',36,['C', 'Java', 'R']4
While loading the data, by default all the columns are loading as…
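Since CSV has no array type, and the bracketed field here also contains unquoted commas that defeat the default splitter, one option is to read lines as plain text and parse with a regex. A sketch (the file name is a placeholder):
import spark.implicits._

val pattern = """'(.*?)',(\d+),\[(.*?)\],(\d+)""".r
val parsed = spark.read.textFile("people.csv").flatMap {
  case pattern(name, age, langs, exp) =>
    Some((name, age.toInt,
      langs.split(",").map(_.trim.stripPrefix("'").stripSuffix("'")).toSeq,
      exp.toInt))
  case _ => None // the header and any malformed row are dropped
}.toDF("name", "age", "languages", "experience")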

Prakash
- 3
- 3
-1
votes
2 answers
Spark-shell: Web UI doesn't change when I execute a process
I use Spark in local mode. I run spark-shell and use a file as a data set. Everything works very well (for example, I ask spark-shell to count the number of words that begin with "a" in the file and I get the right result), but when I look at the web UI, it…
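A likely thing to check: the web UI only lists jobs, and a job only exists once an action runs; transformations alone schedule nothing. A sketch (the file name is a placeholder):
val words = sc.textFile("data.txt").filter(_.startsWith("a")) // lazy: no job yet
words.count() // the action: a job now appears in the UI (http://localhost:4040 by default)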

Fitz
- 41
- 4
-1
votes
1 answer
Array_max spark.sql.function not found
I need to use the functions array_max and array_min from the package org.apache.spark.sql.functions._, but neither function is found.
scala> import org.apache.spark.sql.functions._
import org.apache.spark.sql.functions._
scala>...…
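array_max and array_min were only added in Spark 2.4.0, so on an older shell the names simply don't resolve even though the wildcard import succeeds. Worth checking the version first; the usage sketch below assumes a 2.4+ shell:
spark.version // needs to be 2.4.0 or later

import org.apache.spark.sql.functions.{array, array_max, lit}
spark.range(1).select(array_max(array(lit(1), lit(5), lit(3))).as("max")).show()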

zekri sidi mohamed hicham
- 57
- 1
- 10
-1
votes
1 answer
How to filter out all null values from all the columns of a table in one go using Spark-shell?
I am using Spark shell 1.6. I want to perform a check to separate all the rows containing null values from the ones that don't. More precisely, I have to segregate them into 2 different tables (data and error). The problem is that I have too many columns…
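A sketch of one way to do the split without naming columns by hand: build a single no-null predicate over df.columns and filter both ways (df stands for the source table):
import org.apache.spark.sql.functions.col

val noNulls = df.columns.map(c => col(c).isNotNull).reduce(_ && _)
val data = df.filter(noNulls)    // rows with no nulls anywhere
val errors = df.filter(!noNulls) // rows with at least one null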

Harsh Somani
- 59
- 6
-1
votes
1 answer
Is there a reason why a .scala file won't run or produce output in spark-shell?
I am trying to run an application that prints "Hello World!". The script works fine locally, but every time I run it with
:load /path/to/script
output:
Loading /u/hdpdlcu/Matt/test/SparkScalaCourse/src/com/sundogsoftware/spark/test1.scala...
…
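A common cause worth ruling out: if the file wraps everything in an object with a main method, :load only compiles the object and nothing executes until main is called. A sketch, with a hypothetical object name:
object HelloWorld {
  def main(args: Array[String]): Unit = println("Hello World!")
}
// after :load finishes, invoke it explicitly:
HelloWorld.main(Array.empty)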

Matt
- 113
- 1
- 1
- 5
-1
votes
1 answer
Row vs List in spark-shell
What is the difference between a Spark Row and a Scala List?
Both provide a way to access items by index.
When should each one be used?
The only difference I can see is that a Row has an associated schema.
scala> val a=Row(1,"hi",2,"hello")
a:…
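A quick side-by-side sketch: a Row is an untyped, positional container that may carry a schema, while a List is an ordinary Scala collection (usually of a single element type):
import org.apache.spark.sql.Row

val r = Row(1, "hi", 2, "hello")
r(0)           // returns Any; the caller must know the type
r.getString(1) // typed accessor; fails at runtime on a type mismatch
val l = List(1, "hi", 2, "hello") // inferred as List[Any] in this mixed case
l(1)           // also positional, but lists are normally homogeneous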

santhosh sandy
- 11
- 3
-1
votes
1 answer
"The system can't find the path specified" when running spark-shell on Windows 10
I am trying to install Spark on my local machine. It gives the error below when running spark-shell:
The system can't find the path specified
I have updated all the environment variables (JAVA_HOME, SPARK_HOME, PATH) but am still getting the error.

Swetha
- 11
- 3
-2
votes
2 answers
Merging n rows of a dataframe containing duplicate values
I have a dataframe like below
Id linkedIn
1 [l1,l2]
2 [l5,l6,l3]
3 [l4,l5]
4 [l8,l10]
5 [l7,l9,l1]
Rows 1 & 5 have l1 in common, so those two should be merged into one row with Id=1. Similarly, rows 2 & 3 have l5 in…
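Merging rows that share any element is a connected-components problem. A minimal GraphX sketch, assuming a DataFrame df with an integer Id and an array-of-strings linkedIn column (hash collisions between link strings are ignored here):
import spark.implicits._
import org.apache.spark.graphx.{Edge, Graph}
import org.apache.spark.sql.functions.explode

// one edge per (row, link) pair; row ids stay positive while link strings
// are hashed to negative vertex ids, so the two id spaces cannot collide
val edges = df.select($"Id", explode($"linkedIn").as("link")).rdd.map { r =>
  Edge(r.getInt(0).toLong, -(r.getString(1).hashCode.toLong.abs + 1), ())
}
val components = Graph.fromEdges(edges, ()).connectedComponents().vertices

// positive vertex ids are the original rows; rows with equal component
// labels are the ones to merge
val rowToGroup = components.filter { case (vid, _) => vid > 0 }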

Haridesh Yadav
- 93
- 7
-4
votes
2 answers
Scala, Spark-shell, Groupby failing
I have Spark version 2.4.0 and Scala version 2.11.12. I can successfully load a dataframe with the following code.
val df =…
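The excerpt is cut off, but as a reference point, a groupBy-aggregate that does work on Spark 2.4.0 / Scala 2.11.12 looks like this (df and both column names are assumptions):
import org.apache.spark.sql.functions.avg

df.groupBy("category").count().show()
df.groupBy("category").agg(avg("value").as("avg_value")).show()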

user204548
- 25
- 1
- 5