109

I am trying to run a simple NaiveBayesClassifier using Hadoop and I am getting this error:

Exception in thread "main" java.io.IOException: No FileSystem for scheme: file
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1375)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:95)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:180)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
    at org.apache.mahout.classifier.naivebayes.NaiveBayesModel.materialize(NaiveBayesModel.java:100)

Code :

    Configuration configuration = new Configuration();
    NaiveBayesModel model = NaiveBayesModel.materialize(new Path(modelPath), configuration);// error in this line..

modelPath is pointing to the NaiveBayes.bin file, and the configuration object prints: Configuration: core-default.xml, core-site.xml

I think it's because of the jars; any ideas?

lunr
Mahender Singh
  • Need some more info... – Tariq Jun 23 '13 at 20:31
  • 2
    Don't know myself, but a quick look on google suggests that there are some issues around jars not being referenced as you suggested. Perhaps the following links will yield an answer. https://groups.google.com/a/cloudera.org/forum/#!topic/scm-users/lyho8ptAzE0 http://grokbase.com/t/cloudera/cdh-user/134r64jm5t/no-filesystem-for-scheme-hdfs – Emile Jun 23 '13 at 20:35
  • I was adding hadoop-common-2.0.0-cdh4.3.0-sources.jar and hadoop-core-0.20.2.jar to the classpath; I removed the first one and it worked, don't know why. – Mahender Singh Jun 23 '13 at 20:41
  • 1
    Hmm..Could you please tell me about your environment? Also, please show me the complete exception message. – Tariq Jun 23 '13 at 20:49
  • What's the value of modelPath? Have you tried `file:///path/to/dir`? – Chris White Jun 24 '13 at 01:04
  • as @emile suggested, make sure you are running your jar via hadoop, not java. i.e. "just run the distributed jar with "hadoop jar", instead of trying to execute a standalone "java -jar"." – matthieu lieber Feb 24 '15 at 18:12
  • I used **hadoop jar test.jar** instead of **java -jar test.jar** – Manindar Jan 31 '17 at 12:07
  • I copied all the jars from the hadoop folder and placed them where I am running the command, and now everything is working fine – Shubham Kumar Gupta Jan 17 '22 at 16:35

20 Answers

193

This is a typical case of the maven-assembly plugin breaking things.

Why this happened to us

Different JARs (hadoop-common for LocalFileSystem, hadoop-hdfs for DistributedFileSystem) each contain a different file called org.apache.hadoop.fs.FileSystem in their META-INF/services directory. This file lists the canonical class names of the filesystem implementations they want to declare. (This is the Service Provider Interface mechanism, implemented via java.util.ServiceLoader; see org.apache.hadoop.fs.FileSystem#loadFileSystems.)

When we use maven-assembly-plugin, it merges all our JARs into one, and all the META-INF/services/org.apache.hadoop.fs.FileSystem files overwrite each other. Only one of these files remains (the last one that was added). In this case, the FileSystem list from hadoop-common overwrites the list from hadoop-hdfs, so DistributedFileSystem was no longer declared.
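
As a quick way to see the symptom (a minimal sketch, not part of the original answer), you can ask java.util.ServiceLoader which FileSystem implementations are actually visible at runtime; with a broken assembly, DistributedFileSystem is typically missing from the output:

    import java.util.ServiceLoader;
    import org.apache.hadoop.fs.FileSystem;

    public class ListFileSystems {
        public static void main(String[] args) {
            // Iterates the implementations registered via
            // META-INF/services/org.apache.hadoop.fs.FileSystem on the classpath.
            for (FileSystem fs : ServiceLoader.load(FileSystem.class)) {
                System.out.println(fs.getClass().getName());
            }
        }
    }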

How we fixed it

After loading the Hadoop configuration, but just before doing anything FileSystem-related, we call this:

    hadoopConfig.set("fs.hdfs.impl", 
        org.apache.hadoop.hdfs.DistributedFileSystem.class.getName()
    );
    hadoopConfig.set("fs.file.impl",
        org.apache.hadoop.fs.LocalFileSystem.class.getName()
    );
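
For context, here is how the workaround could look around the asker's original Mahout call (a sketch only; the ModelLoader wrapper class is hypothetical):

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.mahout.classifier.naivebayes.NaiveBayesModel;

    public class ModelLoader {
        public static NaiveBayesModel load(String modelPath) throws IOException {
            Configuration hadoopConfig = new Configuration();
            // Register the implementations explicitly, so the state of the merged
            // META-INF/services file no longer matters.
            hadoopConfig.set("fs.hdfs.impl",
                org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
            hadoopConfig.set("fs.file.impl",
                org.apache.hadoop.fs.LocalFileSystem.class.getName());
            return NaiveBayesModel.materialize(new Path(modelPath), hadoopConfig);
        }
    }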

Update: the correct fix

It has been brought to my attention by krookedking that there is a configuration-based way to make the maven-assembly plugin use a merged version of all the FileSystem service declarations; check out his answer below.

david_p
  • 13
    Here's the equivalent code required for doing the same thing in Spark: `val hadoopConfig: Configuration = spark.hadoopConfiguration hadoopConfig.set("fs.hdfs.impl", classOf[org.apache.hadoop.hdfs.DistributedFileSystem].getName) hadoopConfig.set("fs.file.impl", classOf[org.apache.hadoop.fs.LocalFileSystem].getName)` – Philip O. Mar 28 '14 at 22:25
  • 8
    Actually, I just added this maven dependency `http://mvnrepository.com/artifact/org.apache.hadoop/hadoop-hdfs/2.2.0` to maven and problem solved. – B.Mr.W. Jul 16 '14 at 19:48
  • 6
    I have tried adding hadoop-hdfs, hadoop-core, hadoop-common, hadoop-client, and also tried adding hadoopConfig.set("fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName() ); hadoopConfig.set("fs.file.impl", org.apache.hadoop.fs.LocalFileSystem.class.getName() ); but it's not working. When running from Eclipse it runs fine, but when running from the java -cp command it shows the above error – Harish Pathak Jan 08 '16 at 09:27
  • 1
    Harish, what have you seen? Same problem here but with intellij – ThommyH Mar 10 '16 at 01:53
  • Just an addition to the wonderful answer: if one is using the hadoop JARS but running the job in a non-hadoop cluster, """hadoopConfig.set("fs.hdfs.impl....."""" will not work. In which case we will fall back on managing the assembly build. e.g. in sbt we could do a mergeStrategy of concat or even filterDistinctLines – human Jan 25 '18 at 09:01
  • @david_p where should we call it? If in the driver class, then what will happen when we view the output using bin/hdfs dfs -ls /somefile? – Mandrek Jun 11 '18 at 06:08
  • Looks like your link is dead. Never used grepcode but it sounds like it was a great tool – alaskanloops Feb 13 '20 at 19:14
  • Where do you get the hadoopConfig? – markthegrea Mar 17 '20 at 19:23
75

For those using the shade plugin, following on david_p's advice, you can merge the services in the shaded jar by adding the ServicesResourceTransformer to the plugin config:

  <plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>2.3</version>
    <executions>
      <execution>
        <phase>package</phase>
        <goals>
          <goal>shade</goal>
        </goals>
        <configuration>
          <transformers>
            <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
          </transformers>
        </configuration>
      </execution>
    </executions>
  </plugin>

This will merge all the org.apache.hadoop.fs.FileSystem service entries into one file.

krookedking
11

Took me ages to figure it out with Spark 2.0.2, but here's my bit:

    val sparkBuilder = SparkSession.builder
      .appName("app_name")
      .master("local")
      // Various Params
      .getOrCreate()

    val hadoopConfig: Configuration = sparkBuilder.sparkContext.hadoopConfiguration
    hadoopConfig.set("fs.hdfs.impl", classOf[org.apache.hadoop.hdfs.DistributedFileSystem].getName)
    hadoopConfig.set("fs.file.impl", classOf[org.apache.hadoop.fs.LocalFileSystem].getName)

And the relevant parts of my build.sbt:

scalaVersion := "2.11.8"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.0.2"

I hope this can help!

Mauro Arnoldi
9

For the record, this is still happening in hadoop 2.4.0. So frustrating...

I was able to follow the instructions in this link: http://grokbase.com/t/cloudera/scm-users/1288xszz7r/no-filesystem-for-scheme-hdfs

I added the following to my core-site.xml and it worked:

<property>
   <name>fs.file.impl</name>
   <value>org.apache.hadoop.fs.LocalFileSystem</value>
   <description>The FileSystem for file: uris.</description>
</property>

<property>
   <name>fs.hdfs.impl</name>
   <value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
   <description>The FileSystem for hdfs: uris.</description>
</property>
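
As a quick sanity check (a sketch, not part of the original answer), you can confirm that the Configuration actually picks these properties up once core-site.xml is on the classpath:

    import org.apache.hadoop.conf.Configuration;

    public class CheckImplProperties {
        public static void main(String[] args) {
            // core-site.xml must be on the classpath for Configuration to load it.
            Configuration conf = new Configuration();
            System.out.println("fs.file.impl = " + conf.get("fs.file.impl"));
            System.out.println("fs.hdfs.impl = " + conf.get("fs.hdfs.impl"));
        }
    }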
Achaiah
8

Thanks david_p. In Scala:

conf.set("fs.hdfs.impl", classOf[org.apache.hadoop.hdfs.DistributedFileSystem].getName);
conf.set("fs.file.impl", classOf[org.apache.hadoop.fs.LocalFileSystem].getName);

or

<property>
 <name>fs.hdfs.impl</name>
 <value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
</property>
USB
Andy
  • 1
    Only after I read this did I realize that the *conf* here was the Hadoop Configuration: https://brucebcampbell.wordpress.com/2014/12/11/fix-hadoop-hdfs-error-java-io-ioexception-no-filesystem-for-scheme-hdfs-at-org-apache-hadoop-fs-filesystem-getfilesystemclassfilesystem-java2385/ – Sal Nov 10 '17 at 16:03
7

For Maven, just adding the Maven dependency for hadoop-hdfs (see the link below) will solve the issue.

http://mvnrepository.com/artifact/org.apache.hadoop/hadoop-hdfs/2.7.1

kwky
5

This assumes that you are using Maven and the Cloudera distribution of Hadoop. I'm using CDH 4.6, and adding these dependencies worked for me. I think you should check the versions of your Hadoop and Maven dependencies.

    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-core</artifactId>
        <version>2.0.0-mr1-cdh4.6.0</version>
    </dependency>

    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>2.0.0-cdh4.6.0</version>
    </dependency>

    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.0.0-cdh4.6.0</version>
    </dependency>

Don't forget to add the Cloudera Maven repository.

    <repository>
        <id>cloudera</id>
        <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
    </repository>
husnu
5

I use sbt assembly to package my project, and I also ran into this problem. My solution is here. Step 1: add a META-INF merge strategy to your build.sbt

case PathList("META-INF", "MANIFEST.MF") => MergeStrategy.discard
case PathList("META-INF", ps @ _*) => MergeStrategy.first

Step 2: add the hadoop-hdfs library to build.sbt

"org.apache.hadoop" % "hadoop-hdfs" % "2.4.0"

Step 3: sbt clean; sbt assembly

Hope the above information can help you.

Haimei
  • 19
    A better solution might be to merge like: `case PathList("META-INF", "services", "org.apache.hadoop.fs.FileSystem") => MergeStrategy.filterDistinctLines` This will keep all the registered filesystems – rav Mar 24 '16 at 15:35
  • Thanks @ravwojdyla, pretty neat solution. You saved my hair. For the lost souls discovering this answer for Apache Spark: add this to build.sbt when using sbt-assembly and it works correctly. – Greedy Coder Dec 26 '16 at 09:05
  • The solution provided by @ravwojdyla is the only one that worked for me. – Sergey Kovalev Sep 19 '17 at 16:37
  • 4
    The solution given by @ravwojdyla is ideal. I did a similar setup in build.sbt and used: ``` assemblyMergeStrategy in assembly := { case PathList("META-INF", "MANIFEST.MF") => MergeStrategy.discard case PathList("META-INF", "services", "org.apache.hadoop.fs.FileSystem") => MergeStrategy.concat case _ => MergeStrategy.first } ``` – human Jan 25 '18 at 03:22
  • 1
    @human nothing worked before i used your setup! Kudos! – ambushed Feb 21 '22 at 20:26
3

I faced the same problem. I found two solutions: (1) Editing the jar file manually:

Open the jar file with WinRAR (or a similar tool). Go to META-INF > services, and edit "org.apache.hadoop.fs.FileSystem" by appending:

org.apache.hadoop.fs.LocalFileSystem

(2) Changing the order of my dependencies as follows:

<dependencies>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-hdfs</artifactId>
  <version>3.2.1</version>
</dependency>

<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <version>3.2.1</version>
</dependency>

<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-mapreduce-client-core</artifactId>
  <version>3.2.1</version>
</dependency>

<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>3.2.1</version>
</dependency>

</dependencies>
Mehmet
3

If you're using the Gradle Shadow plugin, then this is the config you have to add:

shadowJar {
    mergeServiceFiles()
}
Joren Boulanger
2

I assume you built your sample using Maven.

Please check the content of the JAR you're trying to run, especially the META-INF/services directory and the file org.apache.hadoop.fs.FileSystem there. It should contain the list of filesystem implementation classes. Check that the line org.apache.hadoop.hdfs.DistributedFileSystem is present in the list for HDFS and that org.apache.hadoop.fs.LocalFileSystem is present for the local file scheme.
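
If you would rather check this programmatically than unpack the JAR (a sketch, not part of the original answer), you can list every copy of that service file visible on the classpath and print its contents:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;
    import java.util.Enumeration;

    public class CheckFileSystemServices {
        public static void main(String[] args) throws Exception {
            String resource = "META-INF/services/org.apache.hadoop.fs.FileSystem";
            Enumeration<URL> urls =
                CheckFileSystemServices.class.getClassLoader().getResources(resource);
            while (urls.hasMoreElements()) {
                URL url = urls.nextElement();
                System.out.println("== " + url);
                try (BufferedReader in = new BufferedReader(
                        new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
                    String line;
                    while ((line = in.readLine()) != null) {
                        System.out.println(line); // each declared FileSystem implementation
                    }
                }
            }
        }
    }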

If a required line is missing, you have to override the referenced resource during the build.

The other possibility is that you simply don't have hadoop-hdfs.jar in your classpath, but this is unlikely: usually, if you have the correct hadoop-client dependency, that is not the problem.

Roman Nikitchenko
  • Hi Roman, I have the same issue and META-INF/services/org.apache.hadoop.fs.FileSystem does not have the hdfs line. I have 2.0.0-mr1-cdh4.4.0 as the only dependency. What do I need to do? Any documentation about this? Using Maven to build – sethi Jan 06 '14 at 13:24
2

Another possible cause (though the OP's question doesn't itself suffer from this) is if you create a Configuration instance that does not load the defaults:

Configuration config = new Configuration(false);

If you don't load the defaults, then you won't get the default settings for things like the FileSystem implementations, which leads to identical errors like this one when trying to access HDFS. Switching to the parameterless constructor, or passing in true to load the defaults, may resolve this.
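
For illustration (a minimal sketch, not part of the original answer), the difference between the two constructors:

    import org.apache.hadoop.conf.Configuration;

    public class ConfigDefaultsDemo {
        public static void main(String[] args) {
            // Both of these load the default resources (core-default.xml, core-site.xml).
            Configuration withDefaults = new Configuration(); // same as new Configuration(true)

            // This one skips the defaults and only sees resources you add explicitly,
            // which is one way to end up with "No FileSystem for scheme" errors.
            Configuration bare = new Configuration(false);

            System.out.println("with defaults: " + withDefaults.size() + " properties");
            System.out.println("without defaults: " + bare.size() + " properties");
        }
    }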

Additionally, if you are adding custom configuration locations (e.g. on the file system) to the Configuration object, be careful which overload of addResource() you use. For example, if you use addResource(String), then Hadoop assumes that the string is a classpath resource; if you need to specify a local file, try the following:

File configFile = new File("example/config.xml");
config.addResource(new Path("file://" + configFile.getAbsolutePath()));
RobV
2

This question is not about Flink, but I've run into this issue in Flink as well.

For people using Flink, you need to download the Pre-bundled Hadoop package and put it inside /opt/flink/lib.

David Magalhães
1

It took me some time to figure out the fix from the given answers, due to my newbieness. This is what I came up with, in case anyone else needs help from the very beginning:

import org.apache.spark.SparkContext
import org.apache.spark.SparkConf

object MyObject {
  def main(args: Array[String]): Unit = {

    val mySparkConf = new SparkConf().setAppName("SparkApp").setMaster("local[*]").set("spark.executor.memory","5g");
    val sc = new SparkContext(mySparkConf)

    val conf = sc.hadoopConfiguration

    conf.set("fs.hdfs.impl", classOf[org.apache.hadoop.hdfs.DistributedFileSystem].getName)
    conf.set("fs.file.impl", classOf[org.apache.hadoop.fs.LocalFileSystem].getName)
  }
}

I am using Spark 2.1

And I have this part in my build.sbt

assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case x => MergeStrategy.first
}
Akavall
1
Configuration conf = new Configuration();
conf.set("fs.defaultFS", "hdfs://nameNode:9000");
FileSystem fs = FileSystem.get(conf);

Setting fs.defaultFS works for me! Hadoop 2.8.1

Asran Deng
1

For SBT, use the mergeStrategy below in build.sbt:

mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) => {
    case PathList("META-INF", "services", "org.apache.hadoop.fs.FileSystem") => MergeStrategy.filterDistinctLines
    case s => old(s)
  }
}
Asad Raza
1

This question is old, but I faced the same issue recently, and the origin of the error was different from the ones in the other answers here.

On my side, the root cause was hdfs trying to parse an authority when encountering // at the beginning of a path:

$ hdfs dfs -ls //dev
ls: No FileSystem for scheme: null

So try to look for a double slash or an empty variable in the path building part of your code.
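
As a hypothetical illustration (not from the original answer) of how an empty variable produces such a path:

    import org.apache.hadoop.fs.Path;

    public class DoubleSlashDemo {
        public static void main(String[] args) {
            // Imagine baseDir came from an unset configuration value.
            String baseDir = "";
            Path p = new Path("/" + baseDir + "/dev"); // builds "//dev"
            // The leading "//" makes "dev" parse as an authority rather than a directory,
            // which is what leads to "No FileSystem for scheme: null" when the path is resolved.
            System.out.println(p);
        }
    }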

Related Hadoop ticket: https://issues.apache.org/jira/browse/HADOOP-8087

LuoLeKe
0

Use this plugin

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>1.5</version>
    <executions>
        <execution>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
            <configuration>
                <filters>
                    <filter>
                        <artifact>*:*</artifact>
                        <excludes>
                            <exclude>META-INF/*.SF</exclude>
                            <exclude>META-INF/*.DSA</exclude>
                            <exclude>META-INF/*.RSA</exclude>
                        </excludes>
                    </filter>
                </filters>
                <shadedArtifactAttached>true</shadedArtifactAttached>
                <shadedClassifierName>allinone</shadedClassifierName>
                <artifactSet>
                    <includes>
                        <include>*:*</include>
                    </includes>
                </artifactSet>
                <transformers>
                    <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                        <resource>reference.conf</resource>
                    </transformer>
                    <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                    </transformer>
                    <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer">
                    </transformer>
                </transformers>
            </configuration>
        </execution>
    </executions>
</plugin>
Harish Pathak
0

If you are using sbt:

//hadoop
lazy val HADOOP_VERSION = "2.8.0"

lazy val dependenceList = Seq(

//hadoop
//The order is important: "hadoop-hdfs" and then "hadoop-common"
"org.apache.hadoop" % "hadoop-hdfs" % HADOOP_VERSION

,"org.apache.hadoop" % "hadoop-common" % HADOOP_VERSION
)
Peluo
-1

I also came across a similar issue. I added core-site.xml and hdfs-site.xml as resources of conf (object):

Configuration conf = new Configuration(true);    
conf.addResource(new Path("<path to>/core-site.xml"));
conf.addResource(new Path("<path to>/hdfs-site.xml"));

I also fixed version conflicts in pom.xml (e.g. if the configured version of Hadoop is 2.8.1, but the dependencies in pom.xml have version 2.7.1, then change that to 2.8.1), and ran Maven install again.

This solved the error for me.

Raxit Solanki