Questions tagged [apache-spark-3.0]

27 questions
0 votes, 0 answers

Migrating from Spark 2.4 to Spark 3. What's Spark 2.4's SharedSQLContext equivalent in Spark 3?

I'm fairly new to Java/Scala. I'm unable to find SharedSQLContext in the Spark 3 repo. How do we generally find a class's equivalent in newer versions? I couldn't find any documentation on this. Thank you! Sample existing class: class…
sojim2 • 1,245
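
One general technique for the question above: search the Spark repository's history for the old name — git's "pickaxe" usually surfaces the commit that moved or renamed the class. This assumes a local clone of apache/spark; SharedSparkSession as the likely successor is an educated guess here, not something the question confirms:

```shell
# Assumes a local clone of https://github.com/apache/spark.
# `git log -S` (the pickaxe) lists commits that added or removed the string,
# which usually points straight at the rename/refactor commit:
git log --oneline --all -S 'SharedSQLContext' -- '*.scala' | head -n 5

# Then grep the current tree for the candidate replacement
# (SharedSparkSession is an assumption, not confirmed by the question):
git grep -l 'SharedSparkSession' sql/core/src/test
```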
0 votes, 0 answers

Failing to read an HBase table with Java Spark 3.2.3

I have a Java Spark application in which I need to read all the row keys from an HBase table. Until now I worked with Spark 2.4.7, and we migrated to Spark 3.2.3. I used newAPIHadoopRDD, but HBase is returning an empty result after the Spark…
Oded • 336
0 votes, 1 answer

Configure Spark 3 thrift server with Apache Ranger

I am trying to configure Spark 3.3.0 Thrift Server with Apache Ranger, but I cannot find any resources or information for this setup. Any suggestions on how to implement this? Thanks very much! I already have an STS (kerberos jdbc) turned on and…
0 votes, 0 answers

What is the alternative to com.stratio.receiver.spark-rabbitmq for Spark 3?

I have a Spark Streaming application and want to upgrade it from Spark 2 to Spark 3. It consumes messages from RabbitMQ using com.stratio.receiver.spark-rabbitmq version 0.5.1, but this library is not available for Spark 3. Are there any alternatives…
ZMI • 1
0 votes, 1 answer

Apache Spark: asc not working as expected

I have the following code: df.orderBy(expr("COUNTRY_NAME").desc, expr("count").asc).show() I expect the count column to be arranged in ascending order for a given COUNTRY_NAME, but I see something like this: the last value of 12 is not as per the…
Mandroid • 6,200
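
The ordering the question expects (primary key descending, secondary key ascending) can be sketched outside Spark with a plain Java Comparator chain; the Row class and the data here are hypothetical, not taken from the question:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SortDemo {
    // Hypothetical stand-in for a (COUNTRY_NAME, count) row.
    static final class Row {
        final String country;
        final long count;
        Row(String country, long count) { this.country = country; this.count = count; }
    }

    public static void main(String[] args) {
        List<Row> rows = new ArrayList<>(List.of(
                new Row("US", 12), new Row("US", 3),
                new Row("IN", 25), new Row("IN", 7)));
        // Mirrors df.orderBy(col("COUNTRY_NAME").desc, col("count").asc):
        // reversed() applies only to the country comparator; the count
        // tiebreaker added afterwards stays ascending.
        rows.sort(Comparator.comparing((Row r) -> r.country).reversed()
                            .thenComparingLong(r -> r.count));
        for (Row r : rows) {
            System.out.println(r.country + " " + r.count); // US 3, US 12, IN 7, IN 25
        }
    }
}
```

If the secondary key still looks wrong in Spark, one thing worth checking is whether "count" is being parsed as the aggregate function rather than the column; comparing `expr("count")` against `col("count")` can rule that out.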
0 votes, 0 answers

Spark 3.0: java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO while writing to a table

I am trying to write a dataframe to a table: spark.sql("CREATE DATABASE IF NOT EXISTS my_db") spark.catalog.setCurrentDatabase("my_db") dataFrame.write .format("csv") .mode(SaveMode.Overwrite) .bucketBy(5, "NAME", "DEPT") …
Mandroid • 6,200
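
This particular UnsatisfiedLinkError (typically NativeIO$Windows.access0) is most often reported on Windows when winutils.exe and hadoop.dll matching Spark's bundled Hadoop version are missing. A commonly suggested remedy is sketched below; the C:\hadoop path is a placeholder, not taken from the question:

```shell
REM Windows cmd. C:\hadoop is a placeholder directory that must contain
REM bin\winutils.exe and bin\hadoop.dll built for the matching Hadoop version.
set HADOOP_HOME=C:\hadoop
set PATH=%HADOOP_HOME%\bin;%PATH%
```

hadoop.dll also needs to be loadable by the JVM, so it is sometimes copied to C:\Windows\System32 as well; that step is environment-dependent.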
0 votes, 0 answers

Cannot import commons-dbutils in sbt

I tried adding the commons-dbutils dependency to my project using sbt by adding the line below to the build.sbt file: libraryDependencies += "commons-dbutils" % "commons-dbutils" % "1.6" I didn't get any errors either. Looking at the dependency tree…
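
For what it's worth, the coordinates above do match Apache Commons DbUtils (group and artifact are both commons-dbutils). One hedged check is to force a re-resolve after editing build.sbt and inspect the tree; dependencyTree ships with sbt 1.4+, while earlier versions need the sbt-dependency-graph plugin:

```shell
# Re-load the build definition, re-resolve, and print the dependency tree.
# "Compile / dependencyTree" assumes sbt 1.4 or newer.
sbt reload update "Compile / dependencyTree"
```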
0 votes, 0 answers

Spark REST API to list running and stopped queries

I am exploring the Spark REST API for Structured Streaming. I have looked at all the exposed REST APIs available at the link below: https://spark.apache.org/docs/latest/monitoring.html However, I could not figure out how to get the list of "Active Streaming…
Monu • 2,092
0 votes, 1 answer

Spark can't connect to DB with built-in connection providers

I'm trying to connect to Postgres following this document, and the document mentions built-in connection providers. Can anyone help me resolve this, please? "There are built-in connection providers for the following databases: DB2, MariaDB, MS…
MasterLuV • 396
0 votes, 0 answers

Apache Phoenix - Count query returns more than 100k rows, but SELECT query does not return any rows

Using Apache Spark 3, I manipulated some CSV data, stored in a dataframe, with the intention of sending it to HBase. The data is successfully sent using JavaHBaseContext's bulkPut() method. However, in Apache Phoenix, using a plain SELECT query, I…
0 votes, 1 answer

ServiceConfigurationError running spark 3.2

I am trying to update code written for Spark 2.4 and doing some tests with Spark 3.2. I am able to create a Spark session: spark = ( SparkSession.builder .config('spark.jars.packages',…
DatGuy • 377
0 votes, 0 answers

Why does Apache Spark perform some checks and raise exceptions at job runtime that were never thrown during unit tests?

There was a bug in my Scala code formatting the date of a timestamp, which was then concatenated as a String to some non-timestamp column of the Spark Streaming job: concat(date_format(col("timestamp"),"yyyy-MM-DD'T'HH:mm:ss.SSS'Z'") So, during the…
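
The pattern in the excerpt above contains a classic pitfall that plain java.time reproduces: lowercase 'dd' is day-of-month, while uppercase 'DD' is day-of-year. Spark 3's date_format uses the same pattern-letter meanings (and validates patterns more strictly than Spark 2, which is a plausible reason such bugs only surface at job runtime):

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class PatternDemo {
    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2023, 2, 10); // Feb 10 is day 41 of the year
        // 'dd' = day-of-month; 'DD' = day-of-year, so the pattern from the
        // question silently produces wrong "dates" for most of the year.
        System.out.println(date.format(DateTimeFormatter.ofPattern("yyyy-MM-dd"))); // 2023-02-10
        System.out.println(date.format(DateTimeFormatter.ofPattern("yyyy-MM-DD"))); // 2023-02-41
    }
}
```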