I've been following the 5 min how to for setting up an htap databse with tidb_tispark and everything works until I get to the section Launch TiSpark. My first issue occurs when executing the line:
docker-compose exec tispark-master /opt/spark-2.1.1-bin-hadoop2.7/bin/spark-shell
But I got around that by modifying the spark version to the version I found inside the container:
docker-compose exec tispark-master /opt/spark-2.3.3-bin-hadoop2.7/bin/spark-shell
My second issue occurs when executing the three line block:
import org.apache.spark.sql.TiContext
val ti = new TiContext(spark)
ti.tidbMapDatabase("TPCH_001")
When I run the last statement I get the following output
scala> ti.tidbMapDatabase("TPCH_001")
2019-07-11 16:14:32 WARN General:96 - Plugin (Bundle) "org.datanucleus" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/opt/spark/jars/datanucleus-core-3.2.10.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/opt/spark-2.3.3-bin-hadoop2.7/jars/datanucleus-core-3.2.10.jar."
2019-07-11 16:14:32 WARN General:96 - Plugin (Bundle) "org.datanucleus.api.jdo" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/opt/spark/jars/datanucleus-api-jdo-3.2.6.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/opt/spark-2.3.3-bin-hadoop2.7/jars/datanucleus-api-jdo-3.2.6.jar."
2019-07-11 16:14:32 WARN General:96 - Plugin (Bundle) "org.datanucleus.store.rdbms" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/opt/spark/jars/datanucleus-rdbms-3.2.9.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/opt/spark-2.3.3-bin-hadoop2.7/jars/datanucleus-rdbms-3.2.9.jar."
2019-07-11 16:14:36 WARN ObjectStore:568 - Failed to get database global_temp, returning NoSuchObjectException
This doesn't prevent me from running the query:
spark.sql("select * from nation").show(30);
But when I follow the further steps of the tutorial to modify the db from MySQL, the changes are not reflected immediately in Spark. Furthermore, at some point in the future (I believe > 5 minutes later), the row that was modified stops showing up in Spark SQL queries.
I'm rather new to this kind of setup and don't really know how to debug this issue. Searches for the warnings I received weren't illuminating.
I don't know if it's helpful but when I connect MySQL this is the server version I get:
Server version: 5.7.25-TiDB-v3.0.0-rc.1-309-g8c20289c7 MySQL Community Server (Apache License 2.0)