
Formerly, CouchDB was supported via the Cloudant connector:

https://github.com/cloudant-labs/spark-cloudant

But the project states that it is no longer active and has moved to Apache Bahir:

http://bahir.apache.org/docs/spark/2.1.1/spark-sql-cloudant/

So I've installed the JAR in a Scala notebook using the following command:

%AddJar http://central.maven.org/maven2/org/apache/bahir/spark-sql-cloudant_2.11/2.1.1/spark-sql-cloudant_2.11-2.1.1.jar
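An alternative to `%AddJar` that stays inside the Python notebook is to have Spark resolve the connector from Maven when the session starts. The sketch below is an assumption, not something from the original question: it sets the `PYSPARK_SUBMIT_ARGS` environment variable with `--packages`, and the helper names `bahir_cloudant_coordinate` and `pyspark_submit_args` are hypothetical. It only takes effect if it runs before the first SparkContext is created in the kernel, so in managed notebooks where Spark is already running it may have no effect.

```python
import os
from typing import List


def bahir_cloudant_coordinate(scala_version: str = "2.11", version: str = "2.1.1") -> str:
    """Return the Maven coordinate (group:artifact:version) of the Bahir Cloudant connector.

    Hypothetical helper; the defaults match the JAR URL used with %AddJar.
    """
    return f"org.apache.bahir:spark-sql-cloudant_{scala_version}:{version}"


def pyspark_submit_args(packages: List[str]) -> str:
    """Build a PYSPARK_SUBMIT_ARGS value that asks spark-submit to fetch the packages.

    The trailing 'pyspark-shell' token is needed when PySpark is launched this way.
    """
    return "--packages " + ",".join(packages) + " pyspark-shell"


# Must be set before the first SparkSession/SparkContext is created in this process.
os.environ["PYSPARK_SUBMIT_ARGS"] = pyspark_submit_args([bahir_cloudant_coordinate()])
print(os.environ["PYSPARK_SUBMIT_ARGS"])
```

Passing the same coordinate to `SparkSession.builder.config("spark.jars.packages", ...)` is an equivalent route with the same "before the session exists" caveat. The second answer also pins `com.typesafe.play:play-json_2.11:2.5.9`; that coordinate could be appended to the list in the same way.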

Then, from a Python notebook, after restarting the kernel, I use the following code to test:

from pyspark.sql import SparkSession

spark = SparkSession\
    .builder\
    .appName("Cloudant Spark SQL Example in Python using dataframes")\
    .config("cloudant.host","0495289b-1beb-4e6d-888e-315f36925447-bluemix.cloudant.com")\
    .config("cloudant.username", "0495289b-1beb-4e6d-888e-315f36925447-bluemix")\
    .config("cloudant.password","xxx")\
    .config("jsonstore.rdd.partitions", 8)\
    .getOrCreate()

# 1. Load a DataFrame from the Cloudant database "openspace"
df = spark.read.load("openspace", "org.apache.bahir.cloudant")
df.cache()
df.printSchema()
df.show()

But I get:

java.lang.ClassNotFoundException: org.apache.bahir.cloudant.DefaultSource

(gist of full log)

Romeo Kienzler

2 Answers


There is one workaround. It should run in all sorts of Jupyter notebook environments and is not exclusive to IBM Data Science Experience:

!pip install --upgrade pixiedust

import pixiedust

pixiedust.installPackage("cloudant-labs:spark-cloudant:2.0.0-s_2.11")

This is, of course, a workaround; I will post the official answer once it is available.

EDIT:

Don't forget to restart the Jupyter kernel afterwards.

EDIT 24.12.18: I created a YouTube video on this that does not need the workaround (see the comments). I will update this post as well at a later stage.

Romeo Kienzler
  • Any updates on a real solution for Spark/CouchDB connectors? – Grant Rostig Dec 22 '18 at 22:48
  • Please see https://www.youtube.com/watch?v=dCawUGv7qgs; this workaround is no longer needed there. I will update the post as well at a later stage. – Romeo Kienzler Dec 24 '18 at 06:24
  • For me the connection is not working; I get an "Error retrieving server response at https://null/" error. Any idea on this? – ss301 Jun 07 '21 at 13:06

Here is another workaround. It has been tested and works in DSX Python notebooks:

import pixiedust

# Use play-json version 2.5.9. Latest version is not supported at this time.
pixiedust.installPackage("com.typesafe.play:play-json_2.11:2.5.9")
# Get the latest sql-cloudant library
pixiedust.installPackage("org.apache.bahir:spark-sql-cloudant_2.11:0")

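from pyspark.sql import SparkSession

# Placeholders: replace with your own Cloudant credentials
host = "ACCOUNT.cloudant.com"
username = "USERNAME"
password = "PASSWORD"

spark = SparkSession\
  .builder\
  .appName("Cloudant Spark SQL Example in Python using dataframes")\
  .config("cloudant.host", host)\
  .config("cloudant.username", username)\
  .config("cloudant.password", password)\
  .getOrCreate()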

df = spark.read.load(format="org.apache.bahir.cloudant", database="MY-DB")
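The snippet above passes `host`, `username` and `password` to the builder as plain Python variables. To avoid pasting a password into the notebook, they could instead be read from environment variables. The sketch below is an assumption, not part of the original answer: the variable names `CLOUDANT_HOST`, `CLOUDANT_USERNAME` and `CLOUDANT_PASSWORD` and the helpers `cloudant_options` and `apply_options` are hypothetical. It only builds the `cloudant.*` options and applies them with the builder's standard `.config(key, value)` call; failing fast on a missing value keeps an empty host from reaching the connector.

```python
import os
from typing import Dict, Mapping, Optional

# Hypothetical environment-variable names mapped to the connector's config keys.
ENV_TO_SPARK_KEY = {
    "CLOUDANT_HOST": "cloudant.host",
    "CLOUDANT_USERNAME": "cloudant.username",
    "CLOUDANT_PASSWORD": "cloudant.password",
}


def cloudant_options(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Map Cloudant credentials from the environment to Spark config keys.

    Raises KeyError naming every missing or empty variable instead of
    passing an empty value on to the connector.
    """
    env = os.environ if env is None else env
    missing = [name for name in ENV_TO_SPARK_KEY if not env.get(name)]
    if missing:
        raise KeyError("missing Cloudant settings: " + ", ".join(missing))
    return {key: env[name] for name, key in ENV_TO_SPARK_KEY.items()}


def apply_options(builder, options: Mapping[str, str]):
    """Call builder.config(key, value) for each option and return the builder."""
    for key, value in options.items():
        builder = builder.config(key, value)
    return builder


# Usage inside a PySpark notebook (requires pyspark and the connector):
# from pyspark.sql import SparkSession
# builder = SparkSession.builder.appName("Cloudant example")
# spark = apply_options(builder, cloudant_options()).getOrCreate()
# df = spark.read.load(format="org.apache.bahir.cloudant", database="MY-DB")
```

Keeping the option mapping separate from the builder makes it easy to check the credentials without starting Spark.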
elaver