Questions tagged [apache-spark-2.3]

39 questions
0
votes
0 answers

Use correlated subquery in PySpark SQL

Tab1 columns [F, S, E]:
  F1 S1 R
  F1 S2 R2
  F1 S3 R1
  F2 S1 R2
  F2 S4 R4
  F1 S4 R
Tab2 columns [F, S]:
  F1 S1
  F1 S3
  F2 S1
  F2 S4
Take rows from Tab1 only if the F->S relation is present in Tab2. Result columns [F, S, E]:
  F1 S1 R
  F1 S3 R
  F2 S4 R4
I have the query…
0
votes
2 answers

Read specific file from multiple .gz file in Spark

I am trying to read a file with a specific name that exists inside multiple .gz files within a folder. For example:
  D:/sample_datasets/gzfiles
  |-my_file_1.tar.gz
    |-my_file_1.tar
      |-file1.csv
      |-file2.csv
      |-file3.csv
…
0
votes
1 answer

create new column in pyspark dataframe using existing columns

I am trying to work with PySpark DataFrames and I would like to know how I can create and populate a new column using existing columns. Let's say I have a DataFrame that looks like this:
  +-----+---+---+
  |   _1| _2| _3|
  +-----+---+---+
  |x1-y1|  3|…
0
votes
1 answer

Repartitioning a PySpark DataFrame fails: how to avoid the initial partition size?

I'm trying to tune Spark's performance by partitioning a Spark DataFrame. Here is the code:
  file_path1 = spark.read.parquet(*paths[:15])
  df = file_path1.select(columns) \
      .where((func.col("organization") == organization))
  df…
SarahData
0
votes
1 answer

Casting string like "[1, 2, 3]" to array

Pretty straightforward. I have an array-like column encoded as a string (varchar) and want to cast it to array (so I can then explode it and manipulate the elements in "long" format). The two most natural approaches don't seem to work: -- just…
MichaelChirico
0
votes
1 answer

Unable to query/select data that was inserted through Spark SQL

I am trying to insert data into a Hive managed table that has a partition. SHOW CREATE TABLE output included for reference. …
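A frequent cause of partitioned inserts through Spark SQL not being visible afterwards is Hive's dynamic-partition settings. If that is the case here, the session usually needs something like the following configuration sketch (the table and column names in the INSERT are hypothetical, and a live Hive metastore is assumed):

```python
from pyspark.sql import SparkSession

# Hive support plus dynamic-partition settings; requires a Hive metastore.
spark = (SparkSession.builder
         .enableHiveSupport()
         .config("hive.exec.dynamic.partition", "true")
         .config("hive.exec.dynamic.partition.mode", "nonstrict")
         .getOrCreate())

# Hypothetical partitioned insert; `dt` is the partition column.
spark.sql("INSERT INTO TABLE db.tbl PARTITION (dt) SELECT id, name, dt FROM staging")
```

If the files land on disk but queries still return nothing, refreshing the metastore's view of the partitions (e.g. `MSCK REPAIR TABLE db.tbl`) is the other usual suspect.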
0
votes
1 answer

How to build Zeppelin 0.8.0 with Spark 2.3.2 built in

I want to build Zeppelin 0.8.0 with Spark 2.3.2 built in, and run it against the same version of Spark running non-locally, without setting SPARK_HOME, so that I do not need a Spark installation on the Zeppelin node. I have tried the build…
AlphaWolf
  • 319
  • 1
  • 3
  • 12
0
votes
1 answer

Sharing data across executors in Apache spark

My Spark project (written in Java) needs to access SELECT query results from different tables across executors. One solution to this problem is: create a tempView; select the required columns; using forEach, convert the DataFrame to a Map; pass that map as…
-1
votes
1 answer

Spark shuffle disk spill increase when upgrading versions

When upgrading from Spark 2.3 to Spark 2.4.3, I saw a 20-30% increase in the amount of shuffle disk spill one of my stages generated. The same code is being executed in both environments, and all configurations are identical between them.