Questions tagged [apache-spark-1.6]

Use for questions specific to Apache Spark 1.6. For general questions related to Apache Spark, use the tag [apache-spark].

111 questions
0
votes
1 answer

How to add double quotes to the string?

I have a JSON-like string like this: {cid: {ABCD[1]_TYPE, [text]: alphabets, time: 1/12/2010, author: xyz, best_chapter: 10.5} and I need to add double quotes around every string to make it look like real JSON: {"cid": {"ABCD[1]_TYPE", "[text]":…
xin
  • 135
  • 11
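For the JSON-quoting question above, a minimal Scala sketch of one possible approach, assuming a simplified, flat input; the two regexes only quote bare keys and simple scalar values and do not cover every edge case in the string shown in the excerpt:

object QuoteJsonLike {
  def main(args: Array[String]): Unit = {
    // Hypothetical, simplified input; the real string in the question is messier.
    val raw = "{cid: abc, time: 1/12/2010, author: xyz, best_chapter: 10.5}"

    val quoted = raw
      // wrap bare keys (text between '{' or ',' and ':') in double quotes
      .replaceAll("""([{,]\s*)([^":,{}]+?)\s*:""", "$1\"$2\":")
      // wrap bare scalar values (anything after ':' that is not already quoted or nested)
      .replaceAll(""":\s*([^"{\[][^,}]*)""", ": \"$1\"")

    println(quoted) // {"cid": "abc", "time": "1/12/2010", "author": "xyz", "best_chapter": "10.5"}
  }
}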
0
votes
0 answers

Saving Spark output to CSV in Spark 1.6

Spark 1.6, Scala. How do I save output to a CSV file in Spark 1.6? I did something like this: myCleanData.write.mode(SaveMode.Append).csv(path="file:///filepath") but it throws an error: cannot resolve symbol csv. I even tried it like this. For the dependency …
Sophie Dinka
  • 73
  • 1
  • 8
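Spark 1.6 has no built-in csv method on DataFrameWriter (it arrived in 2.0), which is why the symbol cannot be resolved. A minimal sketch using the external spark-csv package, assuming myCleanData is the DataFrame from the question; the package coordinates and path are illustrative:

// Launch with the package on the classpath, e.g.:
//   spark-shell --packages com.databricks:spark-csv_2.10:1.5.0
import org.apache.spark.sql.SaveMode

myCleanData.write
  .format("com.databricks.spark.csv")   // external CSV data source for Spark 1.x
  .option("header", "true")
  .mode(SaveMode.Append)
  .save("file:///filepath")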
0
votes
1 answer

UDF in Spark 1.6: reassignment to val error

I am using Spark 1.6. The UDF below is used to clean address data: sqlContext.udf.register("cleanaddress", (AD1:String,AD2: String, AD3:String)=>Boolean = _.matches("^[a-zA-Z0-9]*$")) UDF name: cleanaddress. The three input parameters come from…
Sophie Dinka
  • 73
  • 1
  • 8
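The registration in the excerpt mixes a lambda with a function-literal shorthand and will not compile. A sketch of one way to register a three-argument UDF in Spark 1.6; the alphanumeric pattern is taken from the question, the null check is an added assumption:

// Returns true only when all three address parts are non-null and alphanumeric.
sqlContext.udf.register("cleanaddress",
  (ad1: String, ad2: String, ad3: String) =>
    Seq(ad1, ad2, ad3).forall(s => s != null && s.matches("^[a-zA-Z0-9]*$")))

// Callable from SQL afterwards, e.g.:
// sqlContext.sql("SELECT cleanaddress(AD1, AD2, AD3) FROM some_table")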
0
votes
1 answer

How to list all databases using HiveContext in PySpark 1.6

I am trying to list all the databases using HiveContext in Spark 1.6, but it's giving me just the default database. from pyspark import SparkContext from pyspark.sql import SQLContext sc = SparkContext.getOrCreate() from pyspark.sql import…
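A plain SQLContext only sees its in-memory catalog, so it reports just the default database; listing the metastore databases needs a HiveContext. A Scala sketch of the idea (the same SHOW DATABASES statement works from PySpark's HiveContext):

import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)
// Queries the Hive metastore rather than the in-memory catalog,
// so every database is listed, not just "default".
hiveContext.sql("SHOW DATABASES").show()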
0
votes
1 answer

Iterating over a grouped dataset in Spark 1.6

In an ordered dataset, I want to aggregate data until a condition is met, but grouped by a certain key. To give my question some context, I have simplified my problem to the problem statement below: in Spark I need to aggregate strings, grouped by key…
Havnar
  • 2,558
  • 7
  • 33
  • 62
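The stop condition is not shown in the excerpt, so this is only a hypothetical RDD sketch: concatenating strings per key with a fold that stops appending once an arbitrary length limit is reached. Note that groupByKey does not preserve a global ordering of values.

// Hypothetical data and stop condition (length limit of 5 characters).
val rdd = sc.parallelize(Seq(("k1", "aa"), ("k1", "bb"), ("k1", "cc"), ("k2", "dd")))

val aggregated = rdd
  .groupByKey()
  .mapValues { values =>
    values.foldLeft("") { (acc, s) =>
      if (acc.length >= 5) acc   // condition met: stop appending further values
      else acc + s
    }
  }

aggregated.collect().foreach(println)   // e.g. (k1,aabbcc), (k2,dd)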
0
votes
0 answers

Spark program twice as slow in Spark 2.2 as in Spark 1.6

We're migrating our Scala Spark programs from 1.6.3 to 2.2.0. The program in question has four parts: let's call them sections A, B, C and D. Section A parses the input (parquet files) and then caches the DF and creates a table. Then sections B, C…
0
votes
1 answer

cast method results in null values in Java Spark

I have a simple use case of performing a join on two dataframes; I am using Spark 1.6.3. The issue is that when I try to cast the string type to integer type using the cast method, the resulting column is all null values. I have already tried all…
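cast returns null for any value it cannot parse, so stray whitespace or non-numeric characters in the string column are the usual culprit. A Scala sketch of the same idea (the question uses Java; the dataframes, join key and column names here are hypothetical):

import org.apache.spark.sql.functions._

val joined = df1.join(df2, Seq("id"))          // hypothetical join key
val casted = joined.withColumn("amount_int",
  trim(col("amount")).cast("int"))             // trim first; truly non-numeric strings still become null
casted.select("amount", "amount_int").show()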
0
votes
1 answer

How to split the input data into several files based on a date field in pyspark?

I have a hive table with a date field in it.
+----------+------+-----+
|data_field| col1| col2|
+----------+------+-----+
|10/01/2018| 125| abc|
|10/02/2018| 124| def|
|10/03/2018| 127| ghi|
|10/04/2018| 127| klm|
|10/05/2018| …
Bob
  • 335
  • 1
  • 4
  • 16
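partitionBy on the DataFrameWriter writes one directory per distinct value of a column, which is the usual way to split output by a date field in Spark 1.6. A Scala sketch; the table name and output path are assumptions:

import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)
val df = hiveContext.table("my_hive_table")   // hypothetical Hive table

// One output directory per distinct data_field value
// (special characters in the values are escaped in the directory names).
df.write
  .mode("overwrite")
  .partitionBy("data_field")
  .parquet("/output/path")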
0
votes
1 answer

How to drop duplicates considering only a subset of columns?

I use Spark 1.6 and am doing an inner join on two dataframes as follows: val filtergroup = metric .join(filtercndtns, Seq("aggrgn_filter_group_id"), inner) .distinct() But I keep getting duplicate values in the aggrgn_filter_group_id column. Can you…
Naveen Yadav
  • 11
  • 2
  • 8
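distinct() removes only rows that are identical across all columns; deduplicating on a subset of columns is what dropDuplicates is for. A sketch based on the code in the question (the join type argument is dropped because inner is the default in 1.6):

val filtergroup = metric
  .join(filtercndtns, Seq("aggrgn_filter_group_id"))     // inner join is the default
  // keep one row per aggrgn_filter_group_id instead of one per fully distinct row
  .dropDuplicates(Seq("aggrgn_filter_group_id"))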
0
votes
0 answers

How to define partitions for a DataFrame in PySpark?

Suppose I read a Parquet file as a DataFrame in PySpark; how can I specify how many partitions it should have? I read the Parquet file like this: df = sqlContext.read.format('parquet').load('/path/to/file') How can I specify the number of partitions…
Ani Menon
  • 27,209
  • 16
  • 105
  • 126
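The partition count of a freshly read Parquet DataFrame is driven by the file splits; to force a specific number you repartition afterwards. A Scala sketch (16 is an arbitrary example):

val df = sqlContext.read.format("parquet").load("/path/to/file")

// repartition shuffles to exactly n partitions; coalesce(n) merges existing
// partitions without a full shuffle when you only need to reduce the count.
val df16 = df.repartition(16)

println(df16.rdd.partitions.length)   // 16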
0
votes
1 answer

PySpark: handling exceptions and raising RuntimeError in a PySpark dataframe

I have a dataframe in which I'm trying to create a new column based on the values of an existing column: dfg = dfg.withColumn("min_time", F.when(dfg['list'].isin(["A","B"]),dfg['b_time']) .when(dfg['list']=="C",dfg['b_time'] +2) …
Mia21
  • 119
  • 2
  • 10
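A Scala sketch of the same when-chain; rows that match none of the branches silently end up as null in min_time, which is usually what needs detecting before deciding to raise an error. The column names come from the excerpt; the final filter is an added illustration:

import org.apache.spark.sql.functions._

// Chained conditions; unmatched rows get null in min_time.
val result = dfg.withColumn("min_time",
  when(col("list").isin("A", "B"), col("b_time"))
    .when(col("list") === "C", col("b_time") + 2))

result.filter(col("min_time").isNull).show()   // inspect rows that matched no branch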
0
votes
2 answers

Calculate median and average using a Hadoop Spark 1.6 dataframe; Failed to start database 'metastore_db'

spark-shell --packages com.databricks:spark-csv_2.11:1.2.0
1. Using SQLContext:
import org.apache.spark.sql.SQLContext
val sqlctx = new SQLContext(sc)
import sqlctx._
val df =…
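The "Failed to start database 'metastore_db'" error usually means another session is holding the embedded Derby metastore lock. Beyond that, avg is a built-in aggregate and an approximate median is available through Hive's percentile_approx when a HiveContext is used. A sketch; the file path and column name are assumptions:

// spark-shell --packages com.databricks:spark-csv_2.11:1.2.0
// (only one session at a time can open the embedded Derby metastore_db)
import org.apache.spark.sql.hive.HiveContext

val hc = new HiveContext(sc)
val df = hc.read.format("com.databricks.spark.csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load("/path/to/data.csv")

df.registerTempTable("t")
// avg is built in; percentile_approx is a Hive UDAF, hence the HiveContext
hc.sql("SELECT avg(col1) AS mean, percentile_approx(col1, 0.5) AS median FROM t").show()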
0
votes
0 answers

Apache Toree 0.1.x - NoSuchMethodError: org.apache.spark.repl.SparkIMain.classServerUri()

I have created a Scala kernel for my Jupyter notebook using Spark 1.6 on CDH 5.12. I am using Apache Toree 0.1.x. I have installed the Python package toree 0.1.0 (https://pypi.python.org/pypi/toree/0.1.0), and the kernel was installed with the…
0
votes
1 answer

Spark 1.6 Streaming consumer reading in Kafka offsets stuck at createDirectStream

I am trying to read the Spark Streaming offsets into my consumer but I cannot seem to do it correctly. Here is my code: val dfoffset = hiveContext.sql(s"select * from $db") dfoffset.show() val dfoffsetArray = dfoffset.collect() println("printing…
javadev
  • 277
  • 3
  • 19
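To resume from stored offsets, the direct stream needs a Map[TopicAndPartition, Long] plus a message handler. A sketch of building that map from the collected rows, assuming a hypothetical (topic, partition, offset) column layout in the Hive table and that ssc is the existing StreamingContext:

import kafka.common.TopicAndPartition
import kafka.message.MessageAndMetadata
import kafka.serializer.StringDecoder
import org.apache.spark.streaming.kafka.KafkaUtils

// Assumed row layout: topic (string), partition (int), offset (long).
val fromOffsets: Map[TopicAndPartition, Long] = dfoffsetArray.map { row =>
  TopicAndPartition(row.getString(0), row.getInt(1)) -> row.getLong(2)
}.toMap

val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")   // hypothetical broker list

val stream = KafkaUtils.createDirectStream[
  String, String, StringDecoder, StringDecoder, (String, String)](
  ssc, kafkaParams, fromOffsets,
  (mmd: MessageAndMetadata[String, String]) => (mmd.key, mmd.message))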
0
votes
2 answers

Error accessing Spark Thrift Server

Spark version: 1.6.3. I am running the Spark Thrift Server as a proxy, but it does not stay up as long as I expected; it always stops under high load. This is the error I get when I access it.