
Is there any way to connect PySpark (Spark's Python API) to DynamoDB? For MongoDB and Cassandra there are connectors that interface with PySpark. It would seem possible if DynamoDB could act as a Hadoop input/output format (roughly as sketched below).

https://github.com/mongodb/mongo-hadoop/blob/master/spark/src/main/python/README.rst
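
For illustration, here is roughly what I am imagining, assuming a Hadoop InputFormat for DynamoDB were available on the Spark classpath. The class names and configuration keys below are taken from the aws-labs emr-dynamodb-connector project and are my assumption, not something I have verified:

    from pyspark import SparkContext

    sc = SparkContext(appName="dynamo-read")

    # Connector configuration; table name, region, and endpoint are placeholders.
    conf = {
        "dynamodb.servicename": "dynamodb",
        "dynamodb.input.tableName": "my-table",
        "dynamodb.endpoint": "https://dynamodb.us-east-1.amazonaws.com",
        "dynamodb.regionid": "us-east-1",
    }

    # Records come back as (Text, DynamoDBItemWritable) pairs.
    rows = sc.hadoopRDD(
        inputFormatClass="org.apache.hadoop.dynamodb.read.DynamoDBInputFormat",
        keyClass="org.apache.hadoop.io.Text",
        valueClass="org.apache.hadoop.dynamodb.DynamoDBItemWritable",
        conf=conf,
    )
    print(rows.count())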

Any leads would be greatly appreciated.

rabz100
  • have you looked at boto? – maxymoo May 10 '16 at 02:18
  • Yes I have. Boto connects to DynamoDB but does not interface with PySpark. – rabz100 May 10 '16 at 02:22
  • what do you mean "does not interface" ... what have you tried? you should be able to run any Python code in Spark (a sketch of this pattern follows the thread) – maxymoo May 10 '16 at 03:18
  • @maxymoo What OP means is that you cannot pass a boto object to Spark; operations can only take simple base classes or pure-Python functions with no third-party dependencies from imported libraries. – alfredox Jan 15 '18 at 16:53
  • @alfredox you can pass libraries using the --py-files argument, see https://stackoverflow.com/questions/29495435/easiest-way-to-install-python-dependencies-on-spark-executor-nodes – maxymoo Jan 16 '18 at 21:38
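
Putting the thread together, a rough sketch of the boto-based workaround maxymoo describes: build the client inside each partition so that no unpicklable boto object is shipped from the driver. The table name, region, and rdd variable below are placeholders:

    import boto3

    def write_partition(rows):
        # Build the client on the executor; boto3 objects cannot be pickled,
        # so nothing boto-related crosses the driver/executor boundary.
        table = boto3.resource("dynamodb", region_name="us-east-1").Table("my-table")
        with table.batch_writer() as batch:
            for row in rows:  # each row is assumed to be a dict
                batch.put_item(Item=row)

    # rdd is assumed to hold dicts matching the table's key schema
    rdd.foreachPartition(write_partition)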

0 Answers