Questions tagged [apache-spark-2.0]
Use for questions specific to Apache Spark 2.0. For general questions related to Apache Spark, use the tag [apache-spark].
464 questions
0
votes
0 answers
Table not found error after submitting a Spark script containing Spark SQL with Hive support enabled
I want to run a simple Spark script which has some Spark SQL queries, basically HiveQL. The corresponding tables are saved in the spark-warehouse folder.
from pyspark.sql import SparkSession
from pyspark.sql import…

Kalyan
- 1,880
- 11
- 35
- 62
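A minimal sketch of the usual fix, written in Scala to match the other snippets on this page and assuming the script builds its own session; the table name and warehouse path are hypothetical:
import org.apache.spark.sql.SparkSession

object HiveQueryExample {
  def main(args: Array[String]): Unit = {
    // Hive support must be enabled on the session that runs the HiveQL queries;
    // spark.sql.warehouse.dir should point at the same spark-warehouse folder
    // that holds the saved tables (the path here is an assumption).
    val spark = SparkSession.builder()
      .appName("HiveQueryExample")
      .config("spark.sql.warehouse.dir", "spark-warehouse")
      .enableHiveSupport()
      .getOrCreate()

    // "Table not found" usually means the session was built without
    // enableHiveSupport() or against a different metastore/warehouse location.
    spark.sql("SELECT * FROM my_table").show()

    spark.stop()
  }
}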
0
votes
3 answers
Spark doesn't read columns with null values in first row
Below is the content in my csv file :
A1,B1,C1
A2,B2,C2,D1
A3,B3,C3,D2,E1
A4,B4,C4,D3
A5,B5,C5,,E2
So, there are 5 columns but only 3 values in the first row.
I read it using the following command:
val csvDF : DataFrame =…

Sorabh Kumar
- 176
- 1
- 4
- 14
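One possible workaround, sketched under the assumption that the file has no header and at most five columns: supply an explicit schema so Spark does not size the rows from the first record it samples. Column names and the path are made up.
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

val spark = SparkSession.builder().appName("RaggedCsv").master("local[*]").getOrCreate()

// Without an explicit schema, values in columns that are missing from the
// first record can be dropped. A fixed five-column schema keeps them, and
// short rows are padded with nulls in PERMISSIVE mode.
val schema = StructType(
  Seq("c1", "c2", "c3", "c4", "c5").map(StructField(_, StringType, nullable = true))
)

val csvDF: DataFrame = spark.read
  .schema(schema)
  .option("mode", "PERMISSIVE")
  .csv("data/input.csv")

csvDF.show()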
0
votes
2 answers
How to write data into a Hive table?
I use Spark 2.0.2.
While learning the concept of writing a dataset to a Hive table, I understood that we do it in two ways:
using sparkSession.sql("your sql query")
dataframe.write.mode(SaveMode.<mode>).insertInto("tableName")
Could anyone…

Metadata
- 2,127
- 9
- 56
- 127
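A short sketch of both approaches mentioned in the question, assuming a session with Hive support and an existing target table; table and column names are hypothetical.
import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder().appName("HiveWrite").enableHiveSupport().getOrCreate()
import spark.implicits._

val df = Seq((1, "alice"), (2, "bob")).toDF("id", "name")

// Way 1: plain SQL through the session, via a temporary view.
df.createOrReplaceTempView("staging")
spark.sql("INSERT INTO TABLE target_table SELECT id, name FROM staging")

// Way 2: the DataFrameWriter API. insertInto appends by column position into
// an existing table; saveAsTable would create the table for the chosen mode.
df.write.mode(SaveMode.Append).insertInto("target_table")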
0
votes
1 answer
How to generate an incremental sequence with Java in Apache Spark 2.x
How to generate an incremental sequence with Java in an Apache Spark 2.x DataFrame or temp table?
In other words, what is the equivalent of the monotonically_increasing_id() function in the Apache Spark SQL Java API?

Yugerten
- 878
- 1
- 11
- 30
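The same function is exposed to Java through org.apache.spark.sql.functions, so the Java call would be functions.monotonically_increasing_id(); below is a sketch in Scala (matching the rest of this page) with made-up data. The generated IDs are increasing and unique but not consecutive across partitions.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.monotonically_increasing_id

val spark = SparkSession.builder().appName("IncreasingId").master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq("a", "b", "c").toDF("value")

// withColumn + monotonically_increasing_id() is the usual DataFrame/temp-table
// equivalent of a sequence column; from Java the identical static function is
// imported from org.apache.spark.sql.functions.
val withId = df.withColumn("seq_id", monotonically_increasing_id())
withId.show()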
0
votes
2 answers
How to sort RDD entries using two features simultaneously?
I have a Spark RDD whose entries I want to sort in an organized manner. Let's say the entry is a tuple with 3 elements (name,phonenumber,timestamp). I want to sort the entries first depending on the value of phonenumber and then depending on the…

Mnemosyne
- 1,162
- 4
- 13
- 45
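A sketch of one way to do this, assuming a spark-shell session where spark is predefined: sortBy with a tuple key orders lexicographically, i.e. by phone number first and timestamp second. The sample records are invented.
// (name, phoneNumber, timestamp) records; sortBy with a composite key sorts by
// phone number first and, within equal phone numbers, by timestamp.
val rdd = spark.sparkContext.parallelize(Seq(
  ("alice", "555-0102", 300L),
  ("bob",   "555-0101", 200L),
  ("carol", "555-0101", 100L)
))

val sorted = rdd.sortBy { case (_, phoneNumber, timestamp) => (phoneNumber, timestamp) }
sorted.collect().foreach(println)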
0
votes
1 answer
GraphFrames package in Spark 2.0
I have Spark 2.0 with Scala 2.11.8 and I am trying to include the GraphFrames package.
I typed the following in the Scala shell, but I still got an error message:
scala> import…

user2507238
- 51
- 3
- 8
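For reference, a sketch that assumes the shell was started with the graphframes package on the classpath, e.g. spark-shell --packages graphframes:graphframes:0.5.0-spark2.1-s_2.11 (adjust the coordinate to your Spark and Scala versions); the bare import fails if the package was never supplied. Vertex and edge data are made up.
import org.graphframes.GraphFrame

val vertices = spark.createDataFrame(Seq(
  ("1", "alice"),
  ("2", "bob")
)).toDF("id", "name")                    // GraphFrames expects an "id" column

val edges = spark.createDataFrame(Seq(
  ("1", "2", "follows")
)).toDF("src", "dst", "relationship")    // and "src"/"dst" columns

val graph = GraphFrame(vertices, edges)
graph.inDegrees.show()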
0
votes
2 answers
How to set ignoreNulls flag for first function in agg with map of columns and aggregate functions?
I have a list of around 20-25 columns from a conf file and have to aggregate the first non-null value for each. I tried to build the column list and agg expressions by reading the conf file.
I was able to use the first function but couldn't find how to specify…

Shiva Achari
- 955
- 1
- 9
- 18
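A sketch of a possible workaround, assuming a spark-shell session with spark available: the string-based agg(Map(...)) form cannot carry ignoreNulls, but building Column expressions with first(col, ignoreNulls = true) can, and the column list read from the conf file maps straight onto it. The data, key, and column names are hypothetical.
import org.apache.spark.sql.functions.first
import spark.implicits._

// Groups where the first row holds nulls, to show the effect of ignoreNulls.
val df = Seq(
  ("k1", None,       Some("b1")),
  ("k1", Some("a1"), None),
  ("k2", Some("a2"), Some("b2"))
).toDF("key", "colA", "colB")

val aggColumns = Seq("colA", "colB")     // e.g. read from the conf file
val aggExprs = aggColumns.map(c => first(c, ignoreNulls = true).as(c))

df.groupBy("key").agg(aggExprs.head, aggExprs.tail: _*).show()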
0
votes
1 answer
Spark - Scala: Parsing and extracting documents containing both text and images - .doc, .docx files
I have a few files (.doc, .docx) which contain both images and text. I would like to parse these files and extract the contents, with or without image details.
Currently I am using Apache Tika, which refuses to parse such files. It works perfectly…
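For comparison, a minimal Tika sketch (Scala, assuming tika-core and tika-parsers are on the classpath and a hypothetical file path): AutoDetectParser extracts the text layer of .doc/.docx files, and embedded images are only surfaced if an EmbeddedDocumentExtractor is registered in the ParseContext.
import java.io.FileInputStream
import org.apache.tika.metadata.Metadata
import org.apache.tika.parser.{AutoDetectParser, ParseContext}
import org.apache.tika.sax.BodyContentHandler

val handler = new BodyContentHandler(-1)   // -1 lifts the default write limit
val metadata = new Metadata()
val parser = new AutoDetectParser()
val stream = new FileInputStream("docs/sample.docx")   // hypothetical path

try {
  // Parses the document and collects its text; image details require an
  // EmbeddedDocumentExtractor added to the ParseContext.
  parser.parse(stream, handler, metadata, new ParseContext())
  println(handler.toString)
} finally {
  stream.close()
}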
0
votes
0 answers
Zeppelin: run more than one paragraph
I am trying to use Zeppelin to plot a realtime graph. For that I'm building a Structured Streaming DataFrame with spark-highcharts (Spark 2.1.0, Zeppelin 0.7), following this example:…

Ibrahim Mousa
- 61
- 4
0
votes
2 answers
Need a function in Spark which will check whether all elements match a given predicate
I need a function on an RDD, let's say 'isAllMatched', which takes a predicate as an argument to match against. However, I don't want to scan all elements: if the predicate fails for any element, it should return false. I also want this function to execute…

aks
- 1,019
- 1
- 9
- 17
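One way to get the short-circuit behaviour described above, sketched for spark-shell: "all elements match" is the same as "no element violates the predicate", and isEmpty() is built on take(1), so it can stop before scanning everything once a counterexample turns up. The helper name comes from the question; the data is made up.
import org.apache.spark.rdd.RDD

def isAllMatched[T](rdd: RDD[T])(predicate: T => Boolean): Boolean =
  rdd.filter(x => !predicate(x)).isEmpty()   // take(1)-based, so it can stop early

val numbers = spark.sparkContext.parallelize(1 to 1000000)
println(isAllMatched(numbers)(_ > 0))        // true
println(isAllMatched(numbers)(_ % 2 == 0))   // false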
0
votes
0 answers
rdd.saveAsTextFile("path"), it return without error but unable to find the output file
I am executing below statement:
item_final_view_cassandra_df.
map({case Row(item_id: Long, account_id: Long, ssin_id: String,
gu_id: String, modified_id: Long ) =>
(item_id, account_id, ssin_id, gu_id, modified_id)}) …

Nithin Gangadharan
- 527
- 4
- 9
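A sketch of the usual explanation, for spark-shell and with hypothetical paths: on a cluster a relative path resolves against the default filesystem (often HDFS) rather than the directory spark-submit was run from, and the result is a directory of part-* files rather than a single file.
val rdd = spark.sparkContext.parallelize(Seq("a", "b", "c"))

// Qualify the URI so the destination is unambiguous.
rdd.saveAsTextFile("hdfs:///user/me/item_output")    // lands on the cluster's HDFS
// rdd.saveAsTextFile("file:///tmp/item_output")     // local filesystem; in cluster mode, check the worker nodes

// Look for a directory named item_output containing _SUCCESS and part-* files.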
0
votes
1 answer
How to display the output of a Spark Java application in a UI
I have a Spark Java application for log mining. Currently I read the output from Spark output files and display it in an Excel sheet, but I want a better UI. Can somebody help me code a better UI for an easier and better way to analyze the…

Menaga
- 105
- 1
- 3
- 10
0
votes
0 answers
Spark 2.0: Exception on self joining temporary tables
I have faced an interesting problem while using Spark 2.0. Here is my situation:
create a temporary view V1 using SQL
create a temporary view V2 using a self join of V1
select
  a.*,
  b.bcol3
from
  (
    select
      col1,
      col2,
      …

Luniam
- 463
- 7
- 21
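A compressed sketch of the pattern in the question, for spark-shell and with invented data: self joins of the same view can trip up Spark 2.0's analyzer because both sides resolve to the same attribute IDs, and aliasing the two sides while qualifying every selected column through the aliases is the usual way around it.
import spark.implicits._

val base = Seq((1, "x", 10), (1, "y", 20), (2, "z", 30)).toDF("col1", "col2", "bcol3")
base.createOrReplaceTempView("V1")

// Alias both sides of the self join and qualify each column through its alias.
val v2 = spark.sql(
  """
    |SELECT a.col1, a.col2, b.bcol3
    |FROM V1 a
    |JOIN V1 b ON a.col1 = b.col1
  """.stripMargin)

v2.createOrReplaceTempView("V2")
spark.table("V2").show()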
0
votes
1 answer
How to run spark-jobs outside the bin folder of spark-2.1.1-bin-hadoop2.7
I have an existing Spark job that connects to a Kafka server, gets the data, and then stores it into Cassandra tables. Currently this job runs on the server from inside spark-2.1.1-bin-hadoop2.7/bin, but whenever I am…

Sat
- 3,520
- 9
- 39
- 66
0
votes
1 answer
Apache Spark GraphX - Java implementation
As per the Spark documentation, it seems GraphX does not have a Java API available yet.
Is my assumption correct? If yes, can somebody provide an example where the GraphX library is called from Java code?

Sourav Gulati
- 1,359
- 9
- 18
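As far as the documentation goes, GraphX is a Scala-first library without a dedicated Java wrapper, so from Java the usual options are Scala interop against the same classes or the DataFrame-based GraphFrames package. A minimal GraphX sketch (Scala, for spark-shell, with a made-up graph):
import org.apache.spark.graphx.{Edge, Graph}

val vertices = spark.sparkContext.parallelize(Seq(
  (1L, "alice"),
  (2L, "bob")
))
val edges = spark.sparkContext.parallelize(Seq(Edge(1L, 2L, "follows")))

// Graph[VD, ED] is built from an RDD of (VertexId, VD) and an RDD of Edge[ED].
val graph = Graph(vertices, edges)
graph.inDegrees.collect().foreach(println)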