Indexing with Solr-spark and Alluxio: cannot access files in Alluxio

Question

I am indexing documents into Solr using Java. My code works perfectly when I index files stored on my computer, but when I try to index files located in Alluxio, I get the exception "No FileSystem for scheme: alluxio". I have added the Alluxio dependencies to my pom.

Here is the code:

public class SparkTestMain {

    public static void main(String[] args) {
        new SparkRead().loadDocuments("alluxio://XXX.XXX.XXX.XX:19998/**/");
    }

}

In SparkRead, I index from the file path with JavaRDD<String> documents = sc.textFile(pathToFile), where pathToFile = "alluxio://XXX.XXX.XXX.XX:19998/**/".

Here is the error:

Exception in thread "main" java.io.IOException: No FileSystem for scheme: http
 at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2579)
 at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2586)
 at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
 at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2625)
 at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2607)
 at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
 at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
 at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:256)
 at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
 at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)
 at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:200)
 ...

score 0 · Answer 1 · edited Nov 13 '18 at 06:50

Here is the explanation from the Alluxio project website of why you might see "No FileSystem for scheme: alluxio":

A: This error message is seen when your applications (e.g., MapReduce, Spark) try to access Alluxio as an HDFS-compatible file system, but the alluxio:// scheme is not recognized by the application. Please make sure your HDFS configuration file core-site.xml (in your default hadoop installation or spark/conf/ if you customize this file for Spark) has the following property:
<configuration>
  <property>
    <name>fs.alluxio.impl</name>
    <value>alluxio.hadoop.FileSystem</value>
  </property>
</configuration>
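
To make the role of that property more concrete, here is a minimal sketch of the lookup, assuming a simplified model in which a plain Map stands in for the Hadoop configuration. The class SchemeLookupSketch, its methods, and the example path are hypothetical and are not part of Hadoop, Spark, or Alluxio. The real lookup can also discover file systems registered on the classpath, which this sketch ignores.

```java
import java.io.IOException;
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of how a path's scheme is mapped to a
// file system implementation through the "fs.<scheme>.impl" property.
public class SchemeLookupSketch {

    // Builds the configuration key consulted for the scheme of the given path.
    public static String implKeyFor(String path) {
        String scheme = URI.create(path).getScheme();
        return "fs." + scheme + ".impl";
    }

    // Returns the configured implementation class name, or fails with the
    // same kind of message as in the question when no entry exists.
    public static String resolve(Map<String, String> conf, String path) throws IOException {
        String impl = conf.get(implKeyFor(path));
        if (impl == null) {
            throw new IOException("No FileSystem for scheme: " + URI.create(path).getScheme());
        }
        return impl;
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> conf = new HashMap<>();      // stand-in for core-site.xml
        String path = "alluxio://localhost:19998/docs";  // hypothetical path

        try {
            resolve(conf, path);
        } catch (IOException e) {
            // Without the property, the scheme cannot be resolved.
            System.out.println(e.getMessage());
        }

        // Equivalent of adding the property shown above to core-site.xml.
        conf.put("fs.alluxio.impl", "alluxio.hadoop.FileSystem");
        System.out.println(resolve(conf, path));
    }
}
```

In this model, the scheme named in the exception is simply whatever precedes "://" in the path that was looked up, so the message tells you which path reached the file system layer.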

In the error message you posted, I see "No FileSystem for scheme: http" rather than "No FileSystem for scheme: alluxio". Is "http" a typo?
