
When I try to read a ClickHouse table with PySpark, a column of array type raises 'Unsupported ARRAY TYPE'. I then tried to register the ClickHouseDialect to work around it, but got py4j.protocol.Py4JError: org.apache.spark.sql.jdbc.ClickHouseDialect._get_object_id does not exist in the JVM. Here is my code; the Spark version is 2.4.

from pyspark.sql import SparkSession
from py4j.java_gateway import java_import

spark = SparkSession.builder.master('yarn').appName('appName').enableHiveSupport().getOrCreate()
scgw = spark.sparkContext._gateway
java_import(scgw.jvm, "org.apache.spark.sql.jdbc.JdbcDialects")
java_import(scgw.jvm, "org.apache.spark.sql.jdbc.ClickHouseDialect")
# this call raises the Py4JError quoted above
scgw.jvm.org.apache.spark.sql.jdbc.JdbcDialects.registerDialect(scgw.jvm.org.apache.spark.sql.jdbc.ClickHouseDialect)

properties = {'driver': 'com.github.housepower.jdbc.ClickHouseDriver',
              'user': 'user',
              'password': 'passd',
              'isolationLevel': 'NONE'}

test_df = spark.read.jdbc(url='jdbc:clickhouse://10.88.15.51:9000',
                          table='source.latest_condition',
                          properties=properties)

Hoping to get some ideas from you, thanks.

I am using clickhouse-native-jdbc-shaded-2.4.3.jar and clickhouse-integration-spark_2.11-2.4.3.jar.
