I am indexing documents into Solr using Java. My code works perfectly when I index files that are on my computer, but when I try to index files located in Alluxio I get the exception "No FileSystem for scheme: alluxio". I have added the Alluxio dependencies to my pom.
Here is the code:
public class SparkTestMain {
    public static void main(String[] args) {
        new SparkRead().loadDocuments(
                "alluxio://XXX.XXX.XXX.XX:19998/**/");
    }
}
In SparkRead I do the indexing from the file path: JavaRDD<String> documents = sc.textFile(pathToFile), where pathToFile = "alluxio://XXX.XXX.XXX.XX:19998/**/".
Here is the error:
Exception in thread "main" java.io.IOException: No FileSystem for scheme: alluxio
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2579)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2586)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2625)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2607)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
    at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:256)
    at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:200)
    ...
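
For context, the exception is thrown from FileSystem.getFileSystemClass, which picks the file system implementation from the scheme of the path (the part before "://") and, as far as I understand, consults the configuration key "fs.<scheme>.impl" for it. Below is a small stdlib-only sketch of that lookup key. The class name SchemeCheck, its helper methods, and the host and path in main are hypothetical, made up for illustration; it does not call Hadoop, Spark, or Alluxio.

```java
import java.net.URI;

public class SchemeCheck {
    // Returns the URI scheme (the part before "://"), or null if the path has none.
    public static String schemeOf(String path) {
        return URI.create(path).getScheme();
    }

    // Builds the name of the Hadoop configuration key looked up for a scheme.
    public static String implKeyFor(String scheme) {
        return "fs." + scheme + ".impl";
    }

    public static void main(String[] args) {
        // Hypothetical host and path, standing in for the masked address above.
        String path = "alluxio://localhost:19998/data/docs";
        String scheme = schemeOf(path);
        System.out.println(scheme + " -> " + implKeyFor(scheme));
    }
}
```

If no class is registered under that key (and none is discovered on the classpath for that scheme), I would expect exactly this kind of "No FileSystem for scheme" error, so it may be a matter of the Alluxio client not being visible to the Hadoop configuration that Spark uses.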