
Is there a way to read data from IBM GPFS (General Parallel File System) in Apache Spark?

My intention is to use something like this

sc.textFile("gpfs://...")

instead of

sc.textFile("hdfs://...")

The intended environment is the Hortonworks Data Platform. I've read some articles about deploying the IBM Spectrum Scale file system that say you can configure a connector to GPFS on HDP, which gives you the ability to read from and write to GPFS (similar to what MapR-FS offers for its file system). Has anyone done this?

Thanks

dumitru
  • You can use GPFS as a local file system with `file:///`, or you can pass the mount point directly without `hdfs://` or `gpfs://`; on our platform we use, for example, `BINS/FILESOURCE` directly – Moustafa Mahmoud Nov 19 '17 at 14:22
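
The approach in the comment above can be sketched as follows. This is a minimal sketch, not a tested configuration: it assumes GPFS is POSIX-mounted at the same path (here the illustrative `/gpfs/data`) on the driver and every executor node, so Spark can read it through the `file:///` scheme without any HDFS connector:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object GpfsReadSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("gpfs-read"))

    // file:/// tells Spark to bypass HDFS and read from the local
    // file system. This only works if the identical GPFS mount is
    // visible on all executors; otherwise tasks fail with
    // FileNotFoundException on the nodes lacking the mount.
    val lines = sc.textFile("file:///gpfs/data/input.txt")

    println(lines.count())
    sc.stop()
  }
}
```

Note that with a plain local-file mount you lose HDFS-style data locality: Spark treats the path as equally remote from every node and schedules tasks without locality preferences, which is what a dedicated connector (such as the Spectrum Scale HDFS Transparency connector mentioned in the question) is meant to address.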

1 Answer


@dumitru You can use the Sparkling.data library.

More details - http://datascience.ibm.com/blog/making-data-useful-with-the-sparkling-data-library-2/

user3294904