Questions tagged [spark-shell]

More information can be found in the official documentation.

135 questions
0
votes
2 answers

Parsing data in Apache Spark/Scala: org.apache.spark.SparkException "Task not serializable" error when trying to use textinputformat.record.delimiter

Input file: ___DATE___ 2018-11-16T06:3937 Linux hortonworks 3.10.0-514.26.2.el7.x86_64 #1 SMP Fri Jun 30 05:26:04 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux 06:39:37 up 100 days, 1:04, 2 users, load average: 9.01, 8.30, 8.48 06:30:01 AM all …
Rohit Nimmala
  • 1,459
  • 10
  • 28
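A sketch of the usual fix for this combination: set the delimiter on a copy of the Hadoop configuration and convert each Text value to a String immediately, since Hadoop Writables are not serializable and keeping them inside closures is a common trigger for this exception. The path and delimiter below are assumptions.

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat

    // Hypothetical delimiter and path; `sc` already exists in spark-shell.
    val conf = new Configuration(sc.hadoopConfiguration)
    conf.set("textinputformat.record.delimiter", "___DATE___")

    val records = sc
      .newAPIHadoopFile("/path/to/input.log", classOf[TextInputFormat],
        classOf[LongWritable], classOf[Text], conf)
      .map { case (_, text) => text.toString } // copy out of the non-serializable Text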
0
votes
1 answer

Getting file not found error because of escape character

I am trying to execute the below spark-shell command in a Linux terminal through Java code. echo spark.sparkContext.parallelize\(1 to 3,3\).map\(x =>…
Abinash Dash
  • 43
  • 1
  • 6
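One way to sidestep shell escaping entirely, sketched below: write the snippet to a file and launch it with spark-shell -i, so neither the shell nor the Java launcher needs to escape parentheses. The file name and the map body are assumptions, since the original command is truncated.

    // script.scala (hypothetical contents)
    spark.sparkContext.parallelize(1 to 3, 3).map(x => x * 2).collect().foreach(println)

Launched as spark-shell -i script.scala, the code then runs exactly as written, with no escape characters involved.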
0
votes
1 answer

Apache Spark 2.3.1 - twitter is not a member of package org.apache.spark.streaming

First of all, I have been looking around for this problem for a while now, and I can see other solutions exist for it, but nothing for Apache Spark version 2.3.1. In short, I am trying to create an application that uses Bahir to…
Thelin90
  • 37
  • 2
  • 11
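The Twitter classes were moved out of Spark itself into Apache Bahir, so the artifact has to be supplied when the shell starts. A minimal sketch; the Bahir version matching Spark 2.3.1 and Scala 2.11 is an assumption.

    // spark-shell --packages org.apache.bahir:spark-streaming-twitter_2.11:2.3.1
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.twitter._ // resolves once Bahir is on the classpath

    val ssc = new StreamingContext(sc, Seconds(10))
    val tweets = TwitterUtils.createStream(ssc, None) // None: default twitter4j auth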
0
votes
1 answer

Mahout 0.13.0 spark-shell examples fail with "no jniViennaCL in java.library.path"

I'm trying to make Mahout 0.13.0 work with Spark 1.6.3; I already have Spark 1.6.3 and Hadoop 2.7 working. I downloaded the latest build from the homepage (mahout_download), unpacked it to /opt/mahout, and tried to execute the example in spark-shell from the…
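The message means the JVM cannot locate the native ViennaCL bindings. A hedged workaround sketch: put the directory holding the native library on java.library.path when launching spark-shell (the path below is an assumption), then verify from inside the shell.

    // spark-shell --driver-java-options "-Djava.library.path=/opt/mahout/lib"
    println(System.getProperty("java.library.path")) // confirm the directory is listed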
0
votes
2 answers

SBT console vs Spark-Shell for interactive development

I'm wondering if there are any important differences between using the SBT console and spark-shell for interactively developing new code for a Spark project (notebooks are not really an option with the server firewalls). Both can import project…
andrew
  • 3,929
  • 1
  • 25
  • 38
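One concrete difference worth knowing: spark-shell pre-creates the spark and sc handles, while in the SBT console you build them yourself. A minimal sketch of a local session (the app name is arbitrary):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("sbt-console") // arbitrary
      .master("local[*]")     // run in-process on all cores
      .getOrCreate()
    val sc = spark.sparkContext
    import spark.implicits._  // spark-shell pre-imports this too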
-1
votes
1 answer

Spark SQL and MongoDB query execution times on the same data don't produce expected results

This is a general question, but I am hoping someone can answer it. I am comparing query execution times between MongoDB and Spark SQL. Specifically, I have created a MongoDB collection of 1 million entries from a .csv file and run a few queries on it…
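One pitfall when timing such comparisons: Spark is lazy, so a DataFrame query only executes when an action forces it, and a fair measurement has to wrap the action. A sketch, assuming a DataFrame df loaded from the same .csv and a hypothetical predicate:

    import org.apache.spark.sql.functions.col

    val t0 = System.nanoTime()
    val n  = df.filter(col("age") > 30).count() // count() forces execution
    println(s"matched $n rows in ${(System.nanoTime() - t0) / 1e6} ms")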
-1
votes
1 answer

How to load data with an array-type column from CSV into Spark dataframes

I have a CSV file as shown:

    name,age,languages,experience
    'Alice',31,['C++', 'Java'],2
    'Bob',34,['Java', 'Python'],2
    'Smith',35,['Ruby', 'Java'],3
    'David',36,['C', 'Java', 'R'],4

While loading the data, by default all the columns are loading as…
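CSV has no array type, so Spark reads the languages field as a plain string; the usual pattern is to strip the brackets and quotes and split on commas. A sketch, assuming the bracketed field is quoted in the real file so its inner commas survive CSV parsing:

    import org.apache.spark.sql.functions.{col, regexp_replace, split}

    val raw = spark.read.option("header", "true").csv("/path/to/file.csv")
    val parsed = raw.withColumn(
      "languages",
      split(regexp_replace(col("languages"), """[\[\]' ]""", ""), ",") // -> array<string>
    )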
-1
votes
2 answers

Spark-shell: Web UI doesn't change when I execute a process

I use Spark in local mode. I run spark-shell and use a file as a data set. Everything works well (for example, I ask spark-shell to count the number of words beginning with "a" in the file and I get the right result), but when I look at the web UI, it…
Fitz
  • 41
  • 4
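A first thing to check, sketched below: every SparkContext serves its own UI, on port 4040 by default and incrementing if that port is busy, so the page being watched may belong to a different (or stale) context. On Spark 2.x the shell can report its own URL:

    println(sc.uiWebUrl) // e.g. Some(http://hostname:4040)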
-1
votes
1 answer

Array_max spark.sql.function not found

I need to use the functions array_max and array_min from the package org.apache.spark.sql.functions._, but neither function is found. scala> import org.apache.spark.sql.functions._ import org.apache.spark.sql.functions._ scala>…
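array_max and array_min were only added in Spark 2.4.0, so on an older build the wildcard import succeeds but the functions don't exist. A sketch of the usual pre-2.4 stand-in, a UDF (typed here for integer arrays as an assumption):

    import org.apache.spark.sql.functions.udf

    // Returns None (null in SQL) for empty or missing arrays.
    val arrayMax = udf((xs: Seq[Int]) => if (xs == null || xs.isEmpty) None else Some(xs.max))
    val arrayMin = udf((xs: Seq[Int]) => if (xs == null || xs.isEmpty) None else Some(xs.min))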
-1
votes
1 answer

How to filter out all null values from all the columns of a table in one go using Spark-shell?

I am using Spark shell 1.6. I want to perform a check to separate all the rows containing null values from the ones that don't. More precisely, I have to segregate them into 2 different tables (data and error). The problem is that I have too many columns…
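A pattern that scales to any number of columns, sketched under the assumption the table is a DataFrame named df: fold a single predicate over df.columns instead of testing each column by hand.

    import org.apache.spark.sql.functions.col

    val anyNull = df.columns.map(col(_).isNull).reduce(_ || _)
    val error = df.filter(anyNull)  // rows with at least one null
    val data  = df.filter(!anyNull) // fully populated rows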
-1
votes
1 answer

Is there a reason why a .scala file won't run/produce output in spark-shell?

I am trying to run an application that prints "Hello World!". The script works fine locally, but every time I run it with :load /path/to/script the output is: Loading /u/hdpdlcu/Matt/test/SparkScalaCourse/src/com/sundogsoftware/spark/test1.scala... …
Matt
  • 113
  • 1
  • 1
  • 5
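A frequent cause here: :load only compiles what is in the file, so if the script wraps its printing in an object with a main method, nothing runs until that method is invoked. A sketch with a hypothetical object name:

    :load /path/to/test1.scala
    // definitions are compiled now, but main has not run yet:
    HelloWorld.main(Array.empty)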
-1
votes
1 answer

Row vs List in spark-shell

What is the difference between a Spark Row and a Scala List? Both provide a way to access items by index. When should you use which one? The only difference I can see is that a Row has a schema. scala> val a=Row(1,"hi",2,"hello") a:…
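The practical difference, sketched: a Row is untyped positional storage for a DataFrame record, so plain indexing yields Any and types come from accessors, while a List[T] carries its element type statically.

    import org.apache.spark.sql.Row

    val r = Row(1, "hi", 2, "hello")
    val x: Any    = r(1)           // Any: the schema is not in the type
    val s: String = r.getString(1) // typed accessor, checked at runtime
    val xs = List("hi", "hello")
    val t: String = xs(1)          // statically typed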
-1
votes
1 answer

"The system can't find the path specified" when running spark-shell on Windows 10

I am trying to install Spark on my local machine. It gives the below error when running spark-shell: "The system can't find the path specified". I have updated all environment variables like JAVA_HOME, SPARK_HOME, and PATH, but I am still getting the error.
Swetha
  • 11
  • 3
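A quick diagnostic sketch, not a fix: from any JVM REPL (plain scala works even when spark-shell does not), confirm the variables the launcher scripts rely on are actually visible to the process.

    Seq("JAVA_HOME", "SPARK_HOME").foreach { v =>
      println(s"$v = ${sys.env.getOrElse(v, "<not set>")}")
    }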
-2
votes
2 answers

Merging n rows of a dataframe containing duplicate values

I have a dataframe like below:

    Id  linkedIn
    1   [l1,l2]
    2   [l5,l6,l3]
    3   [l4,l5]
    4   [l8,l10]
    5   [l7,l9,l1]

Rows 1 & 5 have l1 in common, so those two should be merged into one row with Id=1. Similarly, rows 2 & 3 have l5 in…
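Because the merge rule is transitive, this is a connected-components problem. A GraphX sketch, assuming numeric ids and the data available as rows: RDD[(Long, Seq[String])]:

    import org.apache.spark.graphx.{Edge, Graph}

    // Link every pair of ids that shares a linkedIn value (a star per value suffices).
    val edges = rows
      .flatMap { case (id, links) => links.map(link => (link, id)) }
      .groupByKey()
      .flatMap { case (_, ids) =>
        val hub = ids.head
        ids.tail.map(other => Edge(hub, other, ()))
      }

    // Each id ends up paired with the smallest id in its group.
    val groups = Graph.fromEdges(edges, ()).connectedComponents().vertices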
-4
votes
2 answers

Scala, Spark-shell, Groupby failing

I have Spark version 2.4.0 and Scala version 2.11.12. I can successfully load a dataframe with the following code. val df =…
user204548
  • 25
  • 1
  • 5