Questions tagged [stocator]

Stocator is a high-performance connector to object storage for Apache Spark; it achieves its performance by leveraging object store semantics.

https://github.com/SparkTC/stocator

10 questions
3
votes
5 answers

No FileSystem for scheme: cos

I'm trying to connect to IBM Cloud Object Storage from IBM Data Science Experience: access_key = 'XXX' secret_key = 'XXX' bucket = 'mybucket' host = 'lon.ibmselect.objstor.com' service = 'mycos' sqlCxt = SQLContext(sc) hconf =…
Chris Snow
  • 23,813
  • 35
  • 144
  • 309
2
votes
0 answers

How to write Parquet to MinIO from Spark?

We've got some code that creates and uses a local Spark and writes Parquet files to S3. It works with both Amazon S3 and IBM Cloud Object Storage. But when I stand up a MinIO container and point the code there, it fails with an error like…
lmsurprenant
  • 1,723
  • 2
  • 14
  • 28
2
votes
1 answer

java.io.FileNotFoundException: Not found cos://mybucket.myservicename/checkpoint/offsets

I'm trying to use Spark Structured Streaming 2.3 to read data from Kafka (IBM Message Hub) and save it into IBM Cloud Object Storage on a 1.1 IBM Analytics Engine Cluster. After creating the cluster, ssh into it: $ ssh…
Chris Snow
  • 23,813
  • 35
  • 144
  • 309
2
votes
1 answer

What configuration is required to get data from object storage via Swift in Spark?

I went through the documentation, but it is still very confusing how to get data from Swift. I configured Swift on one of my Linux machines. Using the command below, I am able to get the container list: swift -A https://acc.objectstorage.softlayer.net/auth/v1.0/…
Vimal Dhaduk
  • 994
  • 2
  • 18
  • 43
1
vote
4 answers

java.lang.AbstractMethodError: com/ibm/stocator/fs/common/IStoreClient.setStocatorPath(Lcom/ibm/stocator/fs/common/StocatorPath;)V

I'm trying to access data on IBM COS from Data Science Experience based on this blog post. First, I select 1.0.8 version of stocator ... !pip install --user --upgrade pixiedust import…
Chris Snow
  • 23,813
  • 35
  • 144
  • 309
0
votes
0 answers

Is it safe to read data with boto3 from S3 if that data had been written using Stocator in pyspark?

I have an application that uses Stocator as a connector for Spark. This application writes the data to the S3 cos bucket. Now I am working on a service that's supposed to read that data from S3. According to this thread here, you cannot specify the…
0
votes
1 answer

Spark-submit with Stocator failing with Class com.ibm.stocator.fs.ObjectStoreFileSystem not found error

I am trying to run spark-submit wordcount Python on a Kubernetes cluster by pulling a text file stored in COS. For the config, I followed the Stocator README.md ./bin/spark-submit \ --master…
0
votes
2 answers

How to use stocator from IBM Jupyter notebook running pyspark?

I want to use stocator to access IBM cloud storage from a Jupyter notebook (on IBM Watson Studio) running pyspark. Can someone please tell me how to go about this? I understand that stocator is pre-installed but do you have to put in credentials or…
0
votes
2 answers

Spark write stream to IBM Cloud object storage failing with "Access KEY is empty. Please provide valid access key"

I am currently using Apache Spark 2.3.2 and creating a pipeline that reads a stream of CSV files from a file system and then writes the stream to IBM Cloud Object Storage. I am using the Stocator connector for this. The regular read and writes to IBM COS is…
0
votes
1 answer

How to configure Stocator on Amazon EMR

I am trying to configure Stocator on an Amazon EMR cluster to access data on Amazon S3. I have found resources that indicate that this should be possible, but very little detail on how to get this to work. When I start my EMR cluster I use the…
roblovelock
  • 1,971
  • 2
  • 23
  • 41