
Is there a way to read data from IBM GPFS (General Parallel File System) in Apache Spark?

My intention is to use something like this

sc.textFile("gpfs://...")

instead of

sc.textFile("hdfs://...")

The intended environment is the Hortonworks Data Platform. I've read some articles about deploying the IBM Spectrum Scale file system that say you can configure a connector to GPFS on HDP, which gives you the ability to read from and write to GPFS (similar to what MapR-FS offers for its file system). Has anyone done this?

Thanks

dumitru
  • You can use GPFS as a local file system with `file:///`, or you can pass the mount point directly without `hdfs://` or `gpfs://`; on our platform we use, for example, `BINS/FILESOURCE` directly – Moustafa Mahmoud Nov 19 '17 at 14:22
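
The approach in the comment above can be sketched as follows. This is a minimal sketch, not a tested configuration: it assumes GPFS is POSIX-mounted at the same path (here the illustrative `/gpfs/data`) on the driver and every executor node, so Spark can read it through the `file:///` scheme without any HDFS connector:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object GpfsReadSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("gpfs-read"))

    // file:/// tells Spark to bypass HDFS and read from the local
    // file system. This only works if the identical GPFS mount is
    // visible on all executors; otherwise tasks fail with
    // FileNotFoundException on the nodes lacking the mount.
    val lines = sc.textFile("file:///gpfs/data/input.txt")

    println(lines.count())
    sc.stop()
  }
}
```

Note that with a plain local-file mount you lose HDFS-style data locality: Spark treats the path as equally remote from every node and schedules tasks without locality preferences, which is what a dedicated connector (such as the Spectrum Scale HDFS Transparency connector mentioned in the question) is meant to address.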

1 Answer


@dumitru You can use the Sparkling.data library.

More details - http://datascience.ibm.com/blog/making-data-useful-with-the-sparkling-data-library-2/

user3294904