Questions tagged [apache-spark-2.0]
Use for questions specific to Apache Spark 2.0. For general questions related to Apache Spark, use the tag [apache-spark].
464 questions
0
votes
0 answers
Table not found error after submitting a Spark script containing Spark SQL with Hive support enabled
I want to run a simple Spark script which has some Spark SQL queries, basically HiveQL. The corresponding tables are saved in the spark-warehouse folder.
from pyspark.sql import SparkSession
from pyspark.sql import…

Kalyan
- 1,880
- 11
- 35
- 62
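A minimal sketch of the usual fix, written in Scala to match the other snippets on this page and assuming the script builds its own session; the table name and warehouse path are hypothetical:
import org.apache.spark.sql.SparkSession

object HiveQueryExample {
  def main(args: Array[String]): Unit = {
    // Hive support must be enabled on the session that runs the HiveQL queries;
    // spark.sql.warehouse.dir should point at the same spark-warehouse folder
    // that holds the saved tables (the path here is an assumption).
    val spark = SparkSession.builder()
      .appName("HiveQueryExample")
      .config("spark.sql.warehouse.dir", "spark-warehouse")
      .enableHiveSupport()
      .getOrCreate()

    // "Table not found" usually means the session was built without
    // enableHiveSupport() or against a different metastore/warehouse location.
    spark.sql("SELECT * FROM my_table").show()

    spark.stop()
  }
}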
0
votes
3 answers
Spark doesn't read columns with null values in first row
Below is the content in my csv file :
A1,B1,C1
A2,B2,C2,D1
A3,B3,C3,D2,E1
A4,B4,C4,D3
A5,B5,C5,,E2
So, there are 5 columns but only 3 values in the first row.
I read it using the following command:
val csvDF : DataFrame =…

Sorabh Kumar
- 176
- 1
- 4
- 14
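One possible workaround, sketched under the assumption that the file has no header and at most five columns: supply an explicit schema so Spark does not size the rows from the first record it samples. Column names and the path are made up.
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

val spark = SparkSession.builder().appName("RaggedCsv").master("local[*]").getOrCreate()

// Without an explicit schema, values in columns that are missing from the
// first record can be dropped. A fixed five-column schema keeps them, and
// short rows are padded with nulls in PERMISSIVE mode.
val schema = StructType(
  Seq("c1", "c2", "c3", "c4", "c5").map(StructField(_, StringType, nullable = true))
)

val csvDF: DataFrame = spark.read
  .schema(schema)
  .option("mode", "PERMISSIVE")
  .csv("data/input.csv")

csvDF.show()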
0
votes
2 answers
How to write data into a Hive table?
I use Spark 2.0.2.
While learning the concept of writing a dataset to a Hive table, I understood that we do it in two ways:
using sparkSession.sql("your sql query")
dataframe.write.mode(SaveMode.<mode>).insertInto("tableName")
Could anyone…

Metadata
- 2,127
- 9
- 56
- 127
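A short sketch of both approaches mentioned in the question, assuming a session with Hive support and an existing target table; table and column names are hypothetical.
import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder().appName("HiveWrite").enableHiveSupport().getOrCreate()
import spark.implicits._

val df = Seq((1, "alice"), (2, "bob")).toDF("id", "name")

// Way 1: plain SQL through the session, via a temporary view.
df.createOrReplaceTempView("staging")
spark.sql("INSERT INTO TABLE target_table SELECT id, name FROM staging")

// Way 2: the DataFrameWriter API. insertInto appends by column position into
// an existing table; saveAsTable would create the table for the chosen mode.
df.write.mode(SaveMode.Append).insertInto("target_table")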
0
votes
1 answer
How to generate an incremental sequence with Java in Apache Spark 2.x
How to generate an incremental sequence with Java in an Apache Spark 2.x DataFrame or temp table?
In other words, what is the equivalent of the monotonically_increasing_id() function in the Apache Spark SQL Java API?

Yugerten
- 878
- 1
- 11
- 30
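The same function is exposed to Java through org.apache.spark.sql.functions, so the Java call would be functions.monotonically_increasing_id(); below is a sketch in Scala (matching the rest of this page) with made-up data. The generated IDs are increasing and unique but not consecutive across partitions.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.monotonically_increasing_id

val spark = SparkSession.builder().appName("IncreasingId").master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq("a", "b", "c").toDF("value")

// withColumn + monotonically_increasing_id() is the usual DataFrame/temp-table
// equivalent of a sequence column; from Java the identical static function is
// imported from org.apache.spark.sql.functions.
val withId = df.withColumn("seq_id", monotonically_increasing_id())
withId.show()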
0
votes
2 answers
How to sort RDD entries using two features simultaneously?
I have a Spark RDD whose entries I want to sort in an organized manner. Let's say the entry is a tuple with 3 elements (name,phonenumber,timestamp). I want to sort the entries first depending on the value of phonenumber and then depending on the…

Mnemosyne
- 1,162
- 4
- 13
- 45
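A sketch of one way to do this, assuming a spark-shell session where spark is predefined: sortBy with a tuple key orders lexicographically, i.e. by phone number first and timestamp second. The sample records are invented.
// (name, phoneNumber, timestamp) records; sortBy with a composite key sorts by
// phone number first and, within equal phone numbers, by timestamp.
val rdd = spark.sparkContext.parallelize(Seq(
  ("alice", "555-0102", 300L),
  ("bob",   "555-0101", 200L),
  ("carol", "555-0101", 100L)
))

val sorted = rdd.sortBy { case (_, phoneNumber, timestamp) => (phoneNumber, timestamp) }
sorted.collect().foreach(println)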
0
votes
1 answer
GraphFrames package in Spark 2.0
I have Spark 2.0 with Scala 2.11.8 and I am trying to include the GraphFrames package.
I typed the following in the Scala shell, but I still got an error message:
scala> import…

user2507238
- 51
- 3
- 8
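For reference, a sketch that assumes the shell was started with the graphframes package on the classpath, e.g. spark-shell --packages graphframes:graphframes:0.5.0-spark2.1-s_2.11 (adjust the coordinate to your Spark and Scala versions); the bare import fails if the package was never supplied. Vertex and edge data are made up.
import org.graphframes.GraphFrame

val vertices = spark.createDataFrame(Seq(
  ("1", "alice"),
  ("2", "bob")
)).toDF("id", "name")                    // GraphFrames expects an "id" column

val edges = spark.createDataFrame(Seq(
  ("1", "2", "follows")
)).toDF("src", "dst", "relationship")    // and "src"/"dst" columns

val graph = GraphFrame(vertices, edges)
graph.inDegrees.show()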
0
votes
2 answers
How to set ignoreNulls flag for first function in agg with map of columns and aggregate functions?
I have a list of around 20-25 columns from a conf file and have to aggregate the first non-null value for each. I tried to build the column list and agg expressions by reading the conf file.
I was able to use the first function but couldn't find how to specify…

Shiva Achari
- 955
- 1
- 9
- 18
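A sketch of a possible workaround, assuming a spark-shell session with spark available: the string-based agg(Map(...)) form cannot carry ignoreNulls, but building Column expressions with first(col, ignoreNulls = true) can, and the column list read from the conf file maps straight onto it. The data, key, and column names are hypothetical.
import org.apache.spark.sql.functions.first
import spark.implicits._

// Groups where the first row holds nulls, to show the effect of ignoreNulls.
val df = Seq(
  ("k1", None,       Some("b1")),
  ("k1", Some("a1"), None),
  ("k2", Some("a2"), Some("b2"))
).toDF("key", "colA", "colB")

val aggColumns = Seq("colA", "colB")     // e.g. read from the conf file
val aggExprs = aggColumns.map(c => first(c, ignoreNulls = true).as(c))

df.groupBy("key").agg(aggExprs.head, aggExprs.tail: _*).show()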
0
votes
1 answer
Spark - Scala: Parsing and extracting documents containing both text and images - .doc, .docx files
I have a few files (.doc, .docx) which contain both images and text. I would like to parse these files and extract the contents, with or without image details.
Currently I am using Apache Tika, which refuses to parse such files. It works perfectly…
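For comparison, a minimal Tika sketch (Scala, assuming tika-core and tika-parsers are on the classpath and a hypothetical file path): AutoDetectParser extracts the text layer of .doc/.docx files, and embedded images are only surfaced if an EmbeddedDocumentExtractor is registered in the ParseContext.
import java.io.FileInputStream
import org.apache.tika.metadata.Metadata
import org.apache.tika.parser.{AutoDetectParser, ParseContext}
import org.apache.tika.sax.BodyContentHandler

val handler = new BodyContentHandler(-1)   // -1 lifts the default write limit
val metadata = new Metadata()
val parser = new AutoDetectParser()
val stream = new FileInputStream("docs/sample.docx")   // hypothetical path

try {
  // Parses the document and collects its text; image details require an
  // EmbeddedDocumentExtractor added to the ParseContext.
  parser.parse(stream, handler, metadata, new ParseContext())
  println(handler.toString)
} finally {
  stream.close()
}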
0
votes
0 answers
Zeppelin: run more than one paragraph
I am trying to use Zeppelin to plot a realtime graph. For that I'm building a Structured Streaming DataFrame with spark-highcharts (Spark 2.1.0, Zeppelin 0.7), following this example:…

Ibrahim Mousa
- 61
- 4
0
votes
2 answers
Need a function in Spark which will check whether all elements match a given predicate
I need a function on an RDD, let's say 'isAllMatched', which takes a predicate as an argument to match against. However, I don't want to scan all elements: if the predicate fails for any element, it should return false. I also want this function to execute…

aks
- 1,019
- 1
- 9
- 17
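One way to get the short-circuit behaviour described above, sketched for spark-shell: "all elements match" is the same as "no element violates the predicate", and isEmpty() is built on take(1), so it can stop before scanning everything once a counterexample turns up. The helper name comes from the question; the data is made up.
import org.apache.spark.rdd.RDD

def isAllMatched[T](rdd: RDD[T])(predicate: T => Boolean): Boolean =
  rdd.filter(x => !predicate(x)).isEmpty()   // take(1)-based, so it can stop early

val numbers = spark.sparkContext.parallelize(1 to 1000000)
println(isAllMatched(numbers)(_ > 0))        // true
println(isAllMatched(numbers)(_ % 2 == 0))   // false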
0
votes
0 answers
rdd.saveAsTextFile("path"), it return without error but unable to find the output file
I am executing below statement:
item_final_view_cassandra_df.
map({case Row(item_id: Long, account_id: Long, ssin_id: String,
gu_id: String, modified_id: Long ) =>
(item_id, account_id, ssin_id, gu_id, modified_id)}) …

Nithin Gangadharan
- 527
- 4
- 9
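A sketch of the usual explanation, for spark-shell and with hypothetical paths: on a cluster a relative path resolves against the default filesystem (often HDFS) rather than the directory spark-submit was run from, and the result is a directory of part-* files rather than a single file.
val rdd = spark.sparkContext.parallelize(Seq("a", "b", "c"))

// Qualify the URI so the destination is unambiguous.
rdd.saveAsTextFile("hdfs:///user/me/item_output")    // lands on the cluster's HDFS
// rdd.saveAsTextFile("file:///tmp/item_output")     // local filesystem; in cluster mode, check the worker nodes

// Look for a directory named item_output containing _SUCCESS and part-* files.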
0
votes
1 answer
How to display the output of a Spark Java application in a UI
I have a Spark Java application for log mining. Currently I read the output from Spark output files and display it in an Excel sheet, but I want a better UI. Can somebody help me code a better UI for an easier and better way to analyze the…

Menaga
- 105
- 1
- 3
- 10
0
votes
0 answers
Spark 2.0: Exception on self joining temporary tables
I have faced an interesting problem while using Spark 2.0. Here is my situation:
create a temporary view V1 using SQL
create a temporary view V2 using a self join of V1
select
  a.*,
  b.bcol3
from
  (
    select
      col1,
      col2,
      …

Luniam
- 463
- 7
- 21
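A compressed sketch of the pattern in the question, for spark-shell and with invented data: self joins of the same view can trip up Spark 2.0's analyzer because both sides resolve to the same attribute IDs, and aliasing the two sides while qualifying every selected column through the aliases is the usual way around it.
import spark.implicits._

val base = Seq((1, "x", 10), (1, "y", 20), (2, "z", 30)).toDF("col1", "col2", "bcol3")
base.createOrReplaceTempView("V1")

// Alias both sides of the self join and qualify each column through its alias.
val v2 = spark.sql(
  """
    |SELECT a.col1, a.col2, b.bcol3
    |FROM V1 a
    |JOIN V1 b ON a.col1 = b.col1
  """.stripMargin)

v2.createOrReplaceTempView("V2")
spark.table("V2").show()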
0
votes
1 answer
How to run spark-jobs outside the bin folder of spark-2.1.1-bin-hadoop2.7
I have an existing Spark job that connects to a Kafka server, gets the data, and then stores it into Cassandra tables. Currently this job runs on the server from inside spark-2.1.1-bin-hadoop2.7/bin, but whenever I am…

Sat
- 3,520
- 9
- 39
- 66
0
votes
1 answer
Apache Spark GraphX - Java implementation
As per the Spark documentation, it seems GraphX does not have a Java API available yet.
Is my assumption correct? If yes, can somebody provide an example where the GraphX library is called from Java code?

Sourav Gulati
- 1,359
- 9
- 18
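As far as the documentation goes, GraphX is a Scala-first library without a dedicated Java wrapper, so from Java the usual options are Scala interop against the same classes or the DataFrame-based GraphFrames package. A minimal GraphX sketch (Scala, for spark-shell, with a made-up graph):
import org.apache.spark.graphx.{Edge, Graph}

val vertices = spark.sparkContext.parallelize(Seq(
  (1L, "alice"),
  (2L, "bob")
))
val edges = spark.sparkContext.parallelize(Seq(Edge(1L, 2L, "follows")))

// Graph[VD, ED] is built from an RDD of (VertexId, VD) and an RDD of Edge[ED].
val graph = Graph(vertices, edges)
graph.inDegrees.collect().foreach(println)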