Questions tagged [spark-shell]
135 questions
0 votes, 2 answers
Spark shell not starting with spark-cassandra-connector 3.1.0
I've been trying to start spark-shell with the Cassandra connector.
When I tried running it with spark-shell --packages com.datastax.spark:spark-cassandra-connector_2.12:3.1.0, or even when I compiled the connector from…

Martin Macak
- 3,507
- 2
- 30
- 54
0 votes, 1 answer
What is the difference between spark-shell and pyspark, in terms of the language we use to write code?
I wrote
a = sc.parallelize([1,2,3])
in spark-shell and got the error
error: illegal start of simple expression
a = sc.parallelize([1,2,3])
^
but when I wrote this in PySpark, it worked.
What's the difference between the…

Abhay
- 21
- 2
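The short answer to the question above: spark-shell is a Scala REPL and PySpark is a Python one, and `[1,2,3]` is a Python list literal that Scala does not accept, which is why the same line works in one and not the other. A small illustrative sketch (the `ast` check is just a way to show the line is Python syntax, not Scala):

```python
import ast

line = "a = sc.parallelize([1,2,3])"

# The line is perfectly valid Python: it parses as an assignment whose
# right-hand side is a call taking a list literal.
tree = ast.parse(line)
stmt = tree.body[0]
print(type(stmt).__name__)        # Assign
print(type(stmt.value).__name__)  # Call

# In spark-shell (Scala) the equivalent would be written as
#   val a = sc.parallelize(List(1, 2, 3))
# since Scala has no [1,2,3] list-literal syntax; square brackets are
# reserved for type parameters, hence "illegal start of simple expression".
```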
0 votes, 1 answer
In spark-shell, is it a code problem or a memory problem?
I am studying bioinformatics, and recently I used the annovar tool (table_annovar.pl).
I got data (CSV format) that included chr/Func.refGene/Gene.refGene/ExonicFunc.refGene, etc.
So I tried handling the data to count chr per variant of…

DongYoon
- 1
- 2
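The "count chr per variant" step described above is a plain group-and-count; whether it is a code or memory problem depends on details cut off in the excerpt, but the counting itself needs very little memory in either Python or Spark. A minimal sketch with hypothetical rows shaped like the annovar columns named above (the real file would be opened from disk; in spark-shell the analogous step is `df.groupBy("chr").count()`):

```python
import csv
import io
from collections import Counter

# Hypothetical rows shaped like the annovar CSV described above.
data = io.StringIO(
    "chr,Func.refGene,Gene.refGene,ExonicFunc.refGene\n"
    "chr1,exonic,GENE1,nonsynonymous SNV\n"
    "chr1,intronic,GENE2,.\n"
    "chr2,exonic,GENE3,synonymous SNV\n"
)

# Count how many variants fall on each chromosome.
counts = Counter(row["chr"] for row in csv.DictReader(data))
print(counts["chr1"], counts["chr2"])  # 2 1
```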
0 votes, 0 answers
Can't connect Spark and PyCharm. Please help me
C:\Users\nowee>spark-shell
Python The system cannot find the path specified.
C:\Users\nowee>pyspark
Python The system cannot find the path specified.
The system cannot find the path specified.
When I first ran pyspark and spark-shell from cmd, they worked, but suddenly this error popped up.
The environment settings are as…

곽수찬
- 1
0 votes, 1 answer
Spark Shell - system cannot find the path specified in Windows 10
I am trying to run Spark on Windows 10. I have placed the Spark files and winutils in the folder, and I have specified the path in the User and System variables as well. But when I run the spark-shell command, it gives me an error.
Error Message - The system…

Mihir Garg
- 29
- 3
0 votes, 1 answer
How to split a JSON array into multiple JSONs using Scala Spark
I have a JSON array in the below format
{
  "marks": [
    {
      "subject": "Maths",
      "mark": "80"
    },
    {
      "subject": "Physics",
      "mark": "70"
    },
    {
      "subject": "Chemistry",
      "mark": "60"
    }
  ]
}
I need to…

Aldrin Rodrigues
- 151
- 1
- 11
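Splitting a document like the one above into one JSON per array element is a two-step parse-then-emit; in spark-shell the idiomatic equivalent is `explode` on the array column. A plain-Python sketch of the same transformation:

```python
import json

doc = """
{
  "marks": [
    {"subject": "Maths", "mark": "80"},
    {"subject": "Physics", "mark": "70"},
    {"subject": "Chemistry", "mark": "60"}
  ]
}
"""

# Emit one standalone JSON string per element of the "marks" array.
parts = [json.dumps(m) for m in json.loads(doc)["marks"]]
for p in parts:
    print(p)
print(len(parts))  # 3
```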
0 votes, 1 answer
Decimals stored in scientific format in Hive table while loading it from Apache Spark
I am facing a problem with a Hive table where a decimal number such as 0.00000000000 is stored as 0E-11. Even though they represent the same value, 0, I do not understand why it is getting stored in scientific format. This is one of the…

Venkatesan Muniappan
- 445
- 3
- 21
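The 0E-11 above is not a different stored value: it is the default string rendering of a zero-valued decimal that carries eleven decimal places of scale, and forcing fixed-point formatting recovers the expected text. Python's `decimal` module shows the same behaviour:

```python
from decimal import Decimal

# A zero with eleven decimal places keeps its exponent, so its default
# string form is scientific notation -- the same 0E-11 seen in the table.
d = Decimal("0.00000000000")
print(d)               # 0E-11
print(format(d, "f"))  # 0.00000000000
```

On the Spark SQL side, rendering through a fixed-point formatter (e.g. `format_number`) similarly controls how the value is displayed, without changing what is stored.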
0 votes, 1 answer
Reading data from S3 with partitions of unequal columns
I have some partitioned data in S3, and each partition has a different number of columns, like below. When I read the data in pyspark and try to print the schema, I can only read the columns that are commonly present in all partitions, but not all. What is…

bunnylorr
- 201
- 1
- 10
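What the asker above wants is the union of every partition's columns rather than the intersection; for Parquet, Spark exposes this as `spark.read.option("mergeSchema", "true")`. A plain-Python sketch of the union idea, using two hypothetical partitions:

```python
# Two hypothetical partitions with unequal columns.
part_a = [{"id": 1, "name": "x"}]
part_b = [{"id": 2, "name": "y", "score": 0.5}]

# Collect the union of all columns seen in any partition -- what a
# merged schema would hold, instead of only the shared columns.
merged = set()
for part in (part_a, part_b):
    for row in part:
        merged.update(row.keys())
print(sorted(merged))  # ['id', 'name', 'score']
```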
0 votes, 1 answer
How to format CSV data by removing quotes and double-quotes around fields
I'm using a dataset, and apparently it has "double quotes" wrapped around each row. I can't see it, as it opens in Excel by default when I use my browser.
The dataset looks like this…

Joyce
- 1
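A row-wrapped-in-quotes file like the one described above parses as a single quoted field per line, with the real quotes doubled inside it; one fix is to parse each line twice. A sketch on a hypothetical two-line sample:

```python
import csv

# Hypothetical file where each whole row was wrapped in an extra pair of
# double quotes, so quotes inside fields appear doubled.
raw = '"a,b,""c d"""\n"1,2,3"\n'

rows = []
for line in raw.splitlines():
    # First parse yields one field: the original row text.
    (inner,) = next(csv.reader([line]))
    # Second parse splits that row text into its real fields.
    rows.append(next(csv.reader([inner])))

print(rows)  # [['a', 'b', 'c d'], ['1', '2', '3']]
```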
0 votes, 0 answers
Spark config, org.apache.spark.shuffle.FetchFailedException Failed to connect
I installed Hadoop 3.1.0 and Spark 2.4.7 on 4 virtual machines. In total I have 32 cores and 128G of memory. I have been running a spark-shell test
[hadoop@hadoop1 bin]$hadoop fs -mkdir -p /user/hadoop/testdata
[hadoop@hadoop1 bin]$hadoop fs -put…

davidzxc574
- 471
- 1
- 8
- 21
0 votes, 0 answers
Spark Shell command failing on local
I am trying to run the spark-shell command locally and I am getting the below error
java.net.BindException: Can't assign requested address: Service
'sparkDriver' failed after 16 retries (on a random free port)!
Consider explicitly setting the appropriate…
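This BindException typically means the driver cannot bind a port on the address Spark resolved for the machine (common with VPNs or stale hostname mappings). A common workaround, hedged as a local-use sketch, is to pin the driver to loopback before launching spark-shell:

```python
import os

# Pin the driver to loopback so the 'sparkDriver' service can bind.
# Spark reads this at JVM startup, so it must be set in the environment
# before spark-shell is launched.
os.environ["SPARK_LOCAL_IP"] = "127.0.0.1"

# The equivalent on the spark-shell command line:
#   spark-shell --conf spark.driver.bindAddress=127.0.0.1
print(os.environ["SPARK_LOCAL_IP"])  # 127.0.0.1
```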
0 votes, 1 answer
Apache Hudi example from spark-shell throws error for Spark 2.3.0
I am trying to run this example (https://hudi.apache.org/docs/quick-start-guide.html) using spark-shell. The Apache Hudi documentation says "Hudi works with Spark-2.x versions"
The environment details are:
Platform: HDP 2.6.5.0-292
Spark version:…

Joyan
- 41
- 1
- 7
0 votes, 1 answer
Failed to send RPC XXXX in spark-shell Hadoop 3.2.1 and spark 3.0.0
I am trying to run spark-shell in pseudo-distributed mode on my Windows 10 PC, which has 8 GB of RAM.
I am able to submit and run a MapReduce wordcount on YARN, but when I try to initialize a spark-shell or spark-submit any program with master as yarn…

Aldrin Machado
- 97
- 1
- 10
0 votes, 1 answer
Not able to create a dataframe out of a multi-line JSON string or JSONL string using Spark
I have been trying to form a data frame out of a JSONL string. I am able to form the data frame, but the problem is that only a single row is being read, ignoring the others.
Here are the things I tried in spark-shell
// This one is example multiline json.
val jsonEx =…

Sachin Doiphode
- 431
- 2
- 10
- 24
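The single-row symptom above usually comes from mixing up the two layouts: JSONL is one complete document per line, while a pretty-printed multi-line document is one record spanning many lines (Spark's JSON reader assumes the former unless `option("multiLine", "true")` is set). A plain-Python sketch of the distinction:

```python
import json

# JSONL: one complete document per physical line.
jsonl = '{"a": 1}\n{"a": 2}\n'
rows = [json.loads(line) for line in jsonl.splitlines()]
print(len(rows))  # 2

# A pretty-printed (multi-line) document is ONE record, not many;
# parsing it line by line would fail on the first brace alone.
multiline = '{\n  "a": 1\n}'
print(json.loads(multiline)["a"])  # 1
```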
0 votes, 1 answer
How to get the exit status of spark-shell << EOF in a bash script?
I have part of a shell script as below:
spark_data=spark-shell << EOF
spark.sql(query)
EOF
I need the exit status of the spark.sql query.
Can someone help with this?
Thanks
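One thing worth noting for the question above: in bash, `$?` after the heredoc reflects the spark-shell process's exit code, not whether `spark.sql` succeeded, so the script inside the heredoc has to exit nonzero on failure (e.g. via `System.exit(1)` in a try/catch) for the shell to see it. A plain-Python sketch of the general pattern, using a stand-in child process instead of spark-shell:

```python
import subprocess
import sys

# Stand-in for spark-shell: a child that reads its "script" from stdin
# and exits nonzero when the query fails.
child = subprocess.run(
    [sys.executable, "-c",
     "import sys; sys.exit(1 if 'bad' in sys.stdin.read() else 0)"],
    input="bad query\n",
    text=True,
)

# The caller inspects the child's exit status, just as bash does with $?.
print(child.returncode)  # 1
```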